Modules
All available extraction modules. Add new ones by creating a folder in src/modules/.
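The page does not say how the application discovers modules, so the following is only a minimal sketch. It assumes that each immediate subfolder of src/modules/ corresponds to one module. The function name `list_module_folders` and the rule of skipping hidden folders are hypothetical, not part of the project.

```python
from pathlib import Path
from typing import List


def list_module_folders(modules_root: str = "src/modules") -> List[str]:
    """Return the names of the immediate subfolders of modules_root, sorted.

    Assumption (not confirmed by the source): one folder equals one module.
    """
    root = Path(modules_root)
    if not root.is_dir():
        # No modules directory yet: report an empty list instead of failing.
        return []
    return sorted(
        entry.name
        for entry in root.iterdir()
        # Plain files and hidden folders (e.g. .git) are ignored.
        if entry.is_dir() and not entry.name.startswith(".")
    )


if __name__ == "__main__":
    for name in list_module_folders():
        print(name)
```

Under this assumption, creating a new folder in src/modules/ makes its name appear in the returned list the next time the function runs.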
OpenDataLoader
v1.0.0
Uses OpenDataLoader PDF (fast local mode) for structured layout analysis. Extracts headings, paragraphs, tables, and lists with bounding boxes. Requires Java 11+.
by Effara
opendataloader, local, fast, deterministic
Marker
v1.0.0
Uses marker-pdf for deep-learning-based PDF conversion. Provides layout detection, table recognition, OCR, and equation handling. Requires Python 3.10+ and marker-pdf.
by Effara
marker, local, deep-learning, ocr
MinerU
v1.0.0
Uses the MinerU CLI for structured PDF extraction: layout analysis, formula-to-LaTeX, table-to-HTML, and OCR. Requires Python 3.10+ and an installed mineru.
by Effara
mineru, python, ocr, layout
Docling
v1.0.0
Uses IBM Docling for document conversion with advanced layout analysis, table extraction, and OCR. Requires Python 3.10+ and docling (pip install docling).
by Effara
docling, local, deep-learning, ocr, tables
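The modules above list runtime requirements: Java 11+ for OpenDataLoader, and Python 3.10+ for Marker, MinerU, and Docling. A quick way to check these on a machine might look like the sketch below. The helper names (`python_ok`, `parse_java_major`, `java_major`) are hypothetical and not part of the project. The Java check assumes `java -version` prints a line containing `version "<number>..."`, which is the usual format but is not guaranteed for every distribution.

```python
import re
import shutil
import subprocess
import sys
from typing import Optional, Tuple


def python_ok(minimum: Tuple[int, int] = (3, 10)) -> bool:
    """True if the running interpreter is at least the given version."""
    return sys.version_info[:2] >= minimum


def parse_java_major(version_output: str) -> Optional[int]:
    """Extract the Java major version from `java -version` output, if present."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Legacy scheme: "1.8.0_281" means Java 8.
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def java_major() -> Optional[int]:
    """Return the major version of the `java` on PATH, or None if unavailable."""
    java = shutil.which("java")
    if java is None:
        return None
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    # `java -version` normally writes to stderr; check both streams to be safe.
    return parse_java_major(result.stderr + result.stdout)


if __name__ == "__main__":
    print("Python 3.10+:", python_ok())
    major = java_major()
    print("Java 11+:", major is not None and major >= 11)
```

This only checks interpreter and runtime versions. It does not verify that marker-pdf, mineru, or docling are actually installed.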