dbmdz/solr-ocrhighlighting

Failed to parse the OCR markup


My name is Michael Hoppe, and I am a backend developer for the newspaper project of deutsche-digitale-bibliothek.de. In this project we load newspaper issues and index their full text (ALTO) into a Solr index with your plugin. We currently use version 0.7.
My question:
At the moment, Solr reports an error for several of our ALTO files, and we cannot index their full text. The ALTO file is valid XML and looks correct to me, at least in the region where the error occurs. I have attached an example ALTO file (test.txt, since XML files cannot be uploaded here), the log file produced when running de.digitalcollections.solrocr.solr.DistributedTest.java (log.txt), and a picture in which I marked the text where I believe the error occurs; it could be the marked element or one of the elements that follow it.
test.txt
log.txt
image