Warn if there is text missing in the ReadingOrder
mikegerber opened this issue · 1 comments
mikegerber commented
For 00451941.gt.xml, dinglehopper-extract does not extract the header's text DE L'ESPRIT DE L'HOMME.
mikegerber commented
The header is in TextRegion r3, but the ReadingOrder only includes the main text in r1, so dinglehopper extracts only the main text. In other words, the file is buggy, not dinglehopper.
However, we can do better by warning whenever a region is not referenced in the ReadingOrder and its text is therefore missing from the extracted text.
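A minimal sketch of such a check, using only the Python standard library. This is not dinglehopper's actual code: the function names `regions_missing_from_reading_order` and `warn_if_text_missing` and the `SAMPLE` document are made up for illustration. It assumes that ReadingOrder entries point at regions through a `regionRef` attribute and that a file without any ReadingOrder should not trigger the warning.

```python
import warnings
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace so the check works across PAGE schema versions."""
    return tag.rsplit("}", 1)[-1]


def regions_missing_from_reading_order(root):
    """Return the ids of TextRegions that no ReadingOrder entry references."""
    region_ids = []
    referenced = set()
    has_reading_order = False
    for elem in root.iter():
        name = local_name(elem.tag)
        if name == "TextRegion" and elem.get("id"):
            region_ids.append(elem.get("id"))
        elif name == "ReadingOrder":
            has_reading_order = True
            # Collect every regionRef below ReadingOrder, however deeply nested.
            for ref in elem.iter():
                if ref.get("regionRef"):
                    referenced.add(ref.get("regionRef"))
    if not has_reading_order:
        # Assumption: without a ReadingOrder there is nothing to be missing from.
        return []
    return [rid for rid in region_ids if rid not in referenced]


def warn_if_text_missing(root):
    """Emit a warning listing the regions whose text would be skipped."""
    missing = regions_missing_from_reading_order(root)
    if missing:
        warnings.warn(
            "TextRegion(s) not in ReadingOrder, text will not be extracted: "
            + ", ".join(missing)
        )
    return missing


# Made-up miniature of the situation described above: r3 holds the header,
# but the ReadingOrder only references r1.
SAMPLE = """
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page>
    <ReadingOrder>
      <OrderedGroup id="ro1">
        <RegionRefIndexed index="0" regionRef="r1"/>
      </OrderedGroup>
    </ReadingOrder>
    <TextRegion id="r1"/>
    <TextRegion id="r3"/>
  </Page>
</PcGts>
"""

if __name__ == "__main__":
    warn_if_text_missing(ET.fromstring(SAMPLE))
```

Matching on local element names rather than a fixed namespace is a deliberate choice here, since ground truth files may use different PAGE schema versions.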