AI text generation should create valid XML even if ALTO has angle brackets
benwbrum commented
One of the USDA's pages generates AI text with invalid XML markup. The cause seems to be Transkribus, which creates strange `<INS>` and `<GAP>` tags in the ALTO during HTR generation. When these are inserted into the transcription field, they result in invalid XML.
We should escape the text taken from the ALTO before inserting it into the transcription field.
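A minimal sketch of the escaping idea, in Python for illustration only (it is not the project's actual code). The function names, the `<p>` wrapper element, and the sample string are all hypothetical. It uses the standard library's `xml.sax.saxutils.escape`, which replaces `&`, `<`, and `>` with entities so that stray tags in the ALTO text become plain character data:

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET


def escape_alto_text(raw: str) -> str:
    """Escape &, <, and > so the text is safe to embed as XML character data."""
    return escape(raw)


def wrap_as_transcription(raw: str) -> str:
    # Hypothetical wrapper: <p> stands in for whatever element the
    # transcription field actually uses.
    return f"<p>{escape_alto_text(raw)}</p>"


# Hypothetical sample resembling ALTO text with stray pseudo-tags.
sample = "received <INS> 12 bushels <GAP> & seed"
fragment = wrap_as_transcription(sample)

# Parsing raises xml.etree.ElementTree.ParseError if the fragment is not
# well-formed; with escaping applied it parses and round-trips the text.
parsed = ET.fromstring(fragment)
```

Without the escaping step, the unclosed `<INS>` and `<GAP>` in the sample would make the fragment fail to parse. The trade-off is that the tags are kept as visible literal text instead of being interpreted or dropped.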