Modules

All available extraction modules. Add new ones by creating a folder in src/modules/
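The folder-per-module convention could be picked up by a small discovery routine. The following is a minimal sketch, not the project's actual loader: the function name `discover_modules` is hypothetical, and it assumes that every non-hidden subdirectory of src/modules/ counts as one module.

```python
from pathlib import Path
from typing import List


def discover_modules(root: Path) -> List[str]:
    """Return the names of the non-hidden subdirectories of root, sorted.

    Assumption (not confirmed by the source): each subdirectory of
    src/modules/ is one extraction module, named after its folder.
    """
    if not root.is_dir():
        return []
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_dir() and not entry.name.startswith(".")
    )


if __name__ == "__main__":
    # Lists whatever module folders exist relative to the working directory.
    for name in discover_modules(Path("src/modules")):
        print(name)
```

Plain files and hidden folders are skipped, so a stray README or a `.git` directory would not show up as a module.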

OpenDataLoader
v1.0.0

Uses OpenDataLoader PDF (fast local mode) for structured layout analysis. Extracts headings, paragraphs, tables, and lists with bounding boxes. Requires Java 11+.

by Effara
opendataloader local fast deterministic
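The Java 11+ requirement can be checked before the module runs. This is a minimal sketch under stated assumptions: the helper names `parse_java_major` and `java_meets_minimum` are hypothetical, and the parser only handles the common `version "..."` banner that `java -version` prints (including the legacy `1.8` numbering).

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_java_major(version_output: str) -> Optional[int]:
    """Extract the major Java version from `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    # Legacy numbering: "1.8.0_292" means Java 8.
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def java_meets_minimum(minimum: int = 11) -> bool:
    """Return True if a `java` on PATH reports at least the given major version."""
    java = shutil.which("java")
    if java is None:
        return False
    # `java -version` usually writes to stderr; read both streams to be safe.
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    major = parse_java_major(result.stderr + result.stdout)
    return major is not None and major >= minimum


if __name__ == "__main__":
    print("Java 11+ available:", java_meets_minimum())
```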
Marker
v1.0.0

Uses marker-pdf for deep-learning-based PDF conversion. Accurate layout detection, table recognition, OCR, and equation handling. Requires Python 3.10+ and marker-pdf.

by Effara
marker local deep-learning ocr
MinerU
v1.0.0

Uses the MinerU CLI for structured PDF extraction: layout analysis, formula-to-LaTeX conversion, table-to-HTML conversion, and OCR. Requires Python 3.10+ and an installed mineru.

by Effara
mineru python ocr layout
Docling
v1.0.0

Uses IBM Docling for document conversion with advanced layout analysis, table extraction, and OCR. Requires Python 3.10+ and docling (pip install docling).

by Effara
docling local deep-learning ocr tables
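The Python-based modules above all require Python 3.10+ plus one package each. A quick preflight check might look like the sketch below. The helper names and the `REQUIREMENTS` mapping are hypothetical, and it assumes each tool is installed as a pip distribution under the name given in its description (`marker-pdf`, `mineru`, `docling`).

```python
import sys
from importlib import metadata
from typing import Optional, Sequence, Tuple

# Assumed pip distribution names, taken from the module descriptions.
REQUIREMENTS = {
    "Marker": "marker-pdf",
    "MinerU": "mineru",
    "Docling": "docling",
}


def python_ok(
    version_info: Sequence[int] = sys.version_info,
    minimum: Tuple[int, int] = (3, 10),
) -> bool:
    """Return True if the (major, minor) version is at least the minimum."""
    return tuple(version_info[:2]) >= minimum


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python 3.10+:", python_ok())
    for module, distribution in REQUIREMENTS.items():
        version = installed_version(distribution)
        print(f"{module}: {distribution} -> {version or 'not installed'}")
```

Reading package metadata avoids importing the heavy deep-learning dependencies just to find out whether they are present.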