Heading Levels (MHS)

Why Heading Structure Matters for RAG

Headings define a document's hierarchy: chapters, sections, and subsections. RAG systems use this structure to create meaningful chunks and to understand context. If headings are missed or assigned the wrong level, chunks lose their semantic boundaries.

Example problem: A user asks about "Section 3.2" but the parser didn't detect it as a heading, so the RAG system can't locate that section.
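
To make the failure mode concrete, here is a minimal sketch of heading-based chunking in Python. It is illustrative only and is not taken from any of the benchmarked engines: the function name `chunk_by_headings` and the chunk layout are hypothetical, and it assumes the parser output is Markdown with ATX-style (`#`) headings and no fenced code blocks containing `#` lines.

```python
import re

# ATX heading: 1-6 '#' characters, whitespace, then the title text.
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def chunk_by_headings(markdown):
    """Split Markdown into one chunk per heading, recording the heading path."""
    chunks = []
    path = []       # stack of (level, title) for the current ancestors
    current = None  # chunk currently receiving body lines
    for line in markdown.splitlines():
        match = HEADING_RE.match(line)
        if match:
            level = len(match.group(1))
            title = match.group(2)
            # Pop siblings and deeper headings so the stack holds only ancestors.
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, title))
            current = {"path": [t for _, t in path], "level": level, "lines": []}
            chunks.append(current)
        elif current is not None:
            current["lines"].append(line)
        # Text before the first heading is ignored in this sketch.
    return [
        {"path": c["path"], "level": c["level"], "text": "\n".join(c["lines"]).strip()}
        for c in chunks
    ]


good = "# Guide\nintro\n## 3.1 Setup\na\n## 3.2 Usage\nb\n"
missed = "# Guide\nintro\n## 3.1 Setup\na\n3.2 Usage\nb\n"  # heading marker lost

good_chunks = chunk_by_headings(good)
missed_chunks = chunk_by_headings(missed)
```

With the `good` input, "3.2 Usage" gets its own chunk with the path `["Guide", "3.2 Usage"]`. With the `missed` input, the same text is folded into the "3.1 Setup" chunk, so a lookup by section path cannot find it.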

What MHS Measures

MHS (Markdown Heading Similarity) compares detected headings and their levels against ground truth. A score of 1.0 means all headings were correctly identified with proper hierarchy; lower scores indicate missed or incorrectly leveled headings.
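
As a rough intuition for how such a score can behave, the sketch below compares the sequence of (level, title) pairs in a parser's output against the ground truth. This is an assumption-laden stand-in, not the benchmark's actual formula: the helper names `extract_headings` and `heading_similarity` are hypothetical, and the use of `difflib.SequenceMatcher` is a choice made here for illustration. The real methodology is in the opendataloader-bench repository.

```python
import re
from difflib import SequenceMatcher

# ATX heading: 1-6 '#' characters, whitespace, then the title text.
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def extract_headings(markdown):
    """Return the document's headings as a list of (level, lowercased title)."""
    headings = []
    for line in markdown.splitlines():
        match = HEADING_RE.match(line)
        if match:
            headings.append((len(match.group(1)), match.group(2).lower()))
    return headings


def heading_similarity(predicted_md, truth_md):
    """Illustrative score in [0, 1]: 1.0 when headings and levels match exactly.

    A heading only counts as matched when both its text and its level agree,
    so a mis-leveled heading is penalized like a missed one in this sketch.
    """
    predicted = extract_headings(predicted_md)
    truth = extract_headings(truth_md)
    return SequenceMatcher(None, truth, predicted, autojunk=False).ratio()


truth = "# Intro\n## Scope\n## Terms\n"
exact = heading_similarity(truth, truth)
wrong_level = heading_similarity("# Intro\n## Scope\n### Terms\n", truth)
no_headings = heading_similarity("Intro\nScope\nTerms\n", truth)
```

In this sketch an exact match scores 1.0, output with no detected headings scores 0.0, and one mis-leveled heading out of three lands in between.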

Results

Engine	Score	Rank
Docling	0.824	#1
OpenDataLoader [hybrid]	0.821	#2
Nutrient	0.819	#3
Marker	0.796	#4
Unstructured [hi_res]	0.749	#5
MinerU	0.743	#6
OpenDataLoader	0.739	#7
Edgeparse	0.706	#8
PyMuPDF4LLM	0.412	#9
Unstructured	0.388	#10
MarkItDown	0.000	#11
LiteParse	0.000	#11

ML-based engines such as Docling outperform rule-based engines at heading detection.
MarkItDown and LiteParse do not extract heading levels at all, so both score 0.000.

When to Prioritize This Metric

Use Case	Recommended Engine
Long documents with deep hierarchy	Docling
Legal documents, technical manuals	Docling
Semantic chunking by section	Docling or OpenDataLoader
Simple documents, flat structure	Any engine works

Trade-offs

Higher heading accuracy comes at the cost of slower processing. Docling scores 0.824 but takes 16x longer than OpenDataLoader. If your documents have a simple structure, speed may matter more.

Learn More

For detailed methodology, raw data, and reproduction scripts, see the opendataloader-bench repository.