UnorderedGroup
mikegerber opened this issue
@cneud reported problems with the ENP dataset. Example files:
The GT file contains an UnorderedGroup, which triggers a NotImplementedError:
% dinglehopper 00008061.gt.xml 00008061.eng.xml
Traceback (most recent call last):
  File "/home/mike/.virtualenvs/dinglehopper-github/bin/dinglehopper", line 11, in <module>
    load_entry_point('dinglehopper', 'console_scripts', 'dinglehopper')()
  File "/home/mike/.virtualenvs/dinglehopper-github/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/mike/.virtualenvs/dinglehopper-github/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/mike/.virtualenvs/dinglehopper-github/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/mike/.virtualenvs/dinglehopper-github/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/mike/devel/dinglehopper-github/qurator/dinglehopper/cli.py", line 180, in main
    process(gt, ocr, report_prefix, metrics=metrics, textequiv_level=textequiv_level)
  File "/home/mike/devel/dinglehopper-github/qurator/dinglehopper/cli.py", line 93, in process
    gt_text = extract(gt, textequiv_level=textequiv_level)
  File "/home/mike/devel/dinglehopper-github/qurator/dinglehopper/ocr_files.py", line 155, in extract
    return page_extract(tree, textequiv_level=textequiv_level)
  File "/home/mike/devel/dinglehopper-github/qurator/dinglehopper/ocr_files.py", line 79, in page_extract
    raise NotImplementedError
NotImplementedError
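To see how many files in the dataset are affected, a quick scan for UnorderedGroup in the reading order may help. This is a minimal sketch using only Python's standard xml.etree.ElementTree; it matches local element names so it works whichever PAGE namespace version a file declares. The helper names local_name and uses_unordered_group are hypothetical (not dinglehopper API), and the sample assumes the 2019-07-15 PAGE namespace.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def uses_unordered_group(root: ET.Element) -> bool:
    """True if any ReadingOrder contains an UnorderedGroup(Indexed) at any depth."""
    for elem in root.iter():
        if local_name(elem.tag) == "ReadingOrder":
            for child in elem.iter():
                if local_name(child.tag) in ("UnorderedGroup", "UnorderedGroupIndexed"):
                    return True
    return False


SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="page.png" imageWidth="100" imageHeight="100">
    <ReadingOrder>
      <UnorderedGroup id="ro1">
        <RegionRef regionRef="r1"/>
        <RegionRef regionRef="r2"/>
      </UnorderedGroup>
    </ReadingOrder>
  </Page>
</PcGts>"""

print(uses_unordered_group(ET.fromstring(SAMPLE)))  # True
```

To scan a dataset, call it on ET.parse(path).getroot() for each GT file.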
- Make this a warning and read UnorderedGroups in XML order
- Check what other tools do with this
- Find a proper solution (Hard!)
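The first item could look roughly like the sketch below. It assumes standard PAGE reading-order semantics: children of an OrderedGroup (or OrderedGroupIndexed) follow their index attribute, while an UnorderedGroup (or UnorderedGroupIndexed) has no defined order, so its children are taken in XML order and a warning is emitted. The function names are hypothetical and this is not how dinglehopper's page_extract is implemented; it only returns region IDs and leaves text extraction to the caller.

```python
import warnings
import xml.etree.ElementTree as ET
from typing import List

ORDERED = {"OrderedGroup", "OrderedGroupIndexed"}
UNORDERED = {"UnorderedGroup", "UnorderedGroupIndexed"}
REFS = {"RegionRef", "RegionRefIndexed"}


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def flatten_group(group: ET.Element) -> List[str]:
    """Return the region IDs referenced by a reading-order group, recursively."""
    kind = local_name(group.tag)
    # Keep only region references and nested groups (skip Labels, UserDefined, ...).
    children = [c for c in group if local_name(c.tag) in REFS | ORDERED | UNORDERED]
    if kind in ORDERED:
        # Ordered groups: follow the explicit index attribute.
        children.sort(key=lambda c: int(c.get("index", "0")))
    elif kind in UNORDERED:
        # Unordered groups: no defined order, so fall back to XML order and warn.
        warnings.warn("UnorderedGroup %r: reading its regions in XML order" % group.get("id"))
    region_ids = []
    for child in children:
        if local_name(child.tag) in REFS:
            region_ids.append(child.get("regionRef"))
        else:
            region_ids.extend(flatten_group(child))
    return region_ids


def reading_order(root: ET.Element) -> List[str]:
    """Region IDs in reading order, or [] if the page has no ReadingOrder."""
    for elem in root.iter():
        if local_name(elem.tag) == "ReadingOrder":
            groups = [c for c in elem if local_name(c.tag) in ORDERED | UNORDERED]
            return flatten_group(groups[0]) if groups else []
    return []


SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="page.png" imageWidth="100" imageHeight="100">
    <ReadingOrder>
      <OrderedGroup id="ro0">
        <RegionRefIndexed index="1" regionRef="r3"/>
        <UnorderedGroupIndexed id="ug1" index="0">
          <RegionRef regionRef="r2"/>
          <RegionRef regionRef="r1"/>
        </UnorderedGroupIndexed>
      </OrderedGroup>
    </ReadingOrder>
  </Page>
</PcGts>"""

print(reading_order(ET.fromstring(SAMPLE)))  # ['r2', 'r1', 'r3'], plus a UserWarning for ug1
```

Using warnings.warn rather than raising keeps the comparison running on files like the ENP GT; the warning makes it visible that the extracted order for that group is only a guess.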