
Extraction Speed

Measures average processing time per page

Why Speed Matters

Processing time directly affects cost and user experience. A parser that is 10x slower costs roughly 10x more compute at scale, or imposes unacceptable wait times in interactive applications.

What We Measure

Average seconds per page across the benchmark corpus, covering the full pipeline: PDF parsing, layout analysis, and Markdown generation.
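As a rough illustration of the metric, the sketch below aggregates per-document timings into a single seconds-per-page figure. It assumes the average is total elapsed time divided by total pages; the benchmark may instead average per-document ratios, so treat the function name `average_seconds_per_page` and the sample timings as hypothetical.

```python
from typing import Iterable, Tuple


def average_seconds_per_page(runs: Iterable[Tuple[float, int]]) -> float:
    """Return total elapsed seconds divided by total pages.

    Each item in `runs` is (elapsed_seconds, page_count) for one document.
    Assumption: pages are pooled across the corpus before dividing.
    """
    total_seconds = 0.0
    total_pages = 0
    for elapsed, pages in runs:
        if pages <= 0:
            raise ValueError("page_count must be positive")
        total_seconds += elapsed
        total_pages += pages
    if total_pages == 0:
        raise ValueError("no pages measured")
    return total_seconds / total_pages


# Hypothetical timings for three documents: (seconds, pages)
example_runs = [(0.30, 20), (0.12, 8), (0.03, 2)]
print(f"{average_seconds_per_page(example_runs):.3f} s/page")
```

Pooling pages weights long documents more heavily than short ones. Averaging per-document ratios would weight every document equally, and the two approaches can give different numbers on a corpus with mixed document lengths.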

[Figure: Extraction speed]

Results

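| Engine | Speed (s/page) | Rank |
|---|---|---|
| Nutrient | 0.008 | #1 |
| OpenDataLoader | 0.015 | #2 |
| Edgeparse | 0.036 | #3 |
| Unstructured | 0.077 | #4 |
| PyMuPDF4LLM | 0.091 | #5 |
| MarkItDown | 0.114 | #6 |
| OpenDataLoader [hybrid] | 0.463 | #7 |
| Docling | 0.762 | #8 |
| LiteParse | 1.061 | #9 |
| Unstructured [hi_res] | 3.008 | #10 |
| MinerU | 5.962 | #11 |
| Marker | 53.932 | #12 |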
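To put the seconds-per-page figures in practical terms, the following sketch converts them into throughput, estimated batch time, and slowdown relative to the fastest engine. The speeds are copied from the results above; the helper names and the one-million-page workload are illustrative assumptions, and the estimate ignores startup cost and parallelism.

```python
# Speeds in seconds per page, copied from the results above.
SPEEDS = {
    "Nutrient": 0.008,
    "OpenDataLoader": 0.015,
    "Edgeparse": 0.036,
    "Unstructured": 0.077,
    "PyMuPDF4LLM": 0.091,
    "MarkItDown": 0.114,
    "OpenDataLoader [hybrid]": 0.463,
    "Docling": 0.762,
    "LiteParse": 1.061,
    "Unstructured [hi_res]": 3.008,
    "MinerU": 5.962,
    "Marker": 53.932,
}


def pages_per_second(s_per_page: float) -> float:
    """Throughput is the reciprocal of seconds per page."""
    return 1.0 / s_per_page


def batch_hours(s_per_page: float, pages: int) -> float:
    """Estimated single-threaded wall time, in hours, for `pages` pages."""
    return s_per_page * pages / 3600.0


def slowdown_vs_fastest(speeds: dict) -> dict:
    """Ratio of each engine's s/page to the fastest engine's s/page."""
    fastest = min(speeds.values())
    return {name: s / fastest for name, s in speeds.items()}


PAGES = 1_000_000  # hypothetical workload size
ratios = slowdown_vs_fastest(SPEEDS)
for name, s in sorted(SPEEDS.items(), key=lambda kv: kv[1]):
    print(
        f"{name:<24} {pages_per_second(s):>8.1f} pages/s "
        f"{batch_hours(s, PAGES):>10.1f} h  {ratios[name]:>8.1f}x"
    )
```

Under these assumptions, the gap between ranks is easier to judge as hours of compute for a fixed workload than as fractions of a second per page.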

When to Prioritize Speed

| Use Case | Recommended Engine |
|---|---|
| Batch processing (1000s of docs) | OpenDataLoader |
| Real-time / interactive apps | OpenDataLoader or MarkItDown |
| Cost-sensitive deployments | OpenDataLoader |
| Accuracy-critical, time flexible | Docling |

Notes

  • Measurements are single-threaded on CPU
  • Multi-threading and GPU acceleration can change rankings
  • All engines run locally, so results include no network latency
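For readers who want a quick local comparison under the same conditions (one thread, CPU, no network), a minimal timing sketch might look like the following. It is not the benchmark's actual harness, which lives in the opendataloader-bench repository; `time_conversion`, `convert`, and `fake_convert` are hypothetical names, and `convert` stands in for whichever engine call is being measured.

```python
import time
from typing import Callable


def time_conversion(
    convert: Callable[[str], object], path: str, page_count: int
) -> float:
    """Time one single-threaded call to `convert(path)`.

    Returns elapsed wall-clock seconds divided by `page_count`.
    `convert` is a hypothetical stand-in for an engine's PDF-to-Markdown call.
    """
    if page_count <= 0:
        raise ValueError("page_count must be positive")
    start = time.perf_counter()  # monotonic, high-resolution clock
    convert(path)
    elapsed = time.perf_counter() - start
    return elapsed / page_count


def fake_convert(path: str) -> str:
    """Placeholder engine: sleeps briefly instead of parsing a real PDF."""
    time.sleep(0.02)
    return "# placeholder markdown"


# Hypothetical file name and page count, for illustration only.
print(f"{time_conversion(fake_convert, 'sample.pdf', page_count=4):.4f} s/page")
```

A single run is noisy. Repeating the measurement and discarding a warm-up run would give steadier numbers, and any multi-threaded or GPU configuration should be reported separately because it can change the ranking.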

Learn More

For detailed methodology, raw data, and reproduction scripts, see the opendataloader-bench repository.


