Regression with newest ocrd version
---------------------------------------- Captured stderr call ----------------------------------------
12:34:41.863 ERROR ocrd.processor.helpers.run_processor - Failure in processor 'ocrd-dinglehopper'
Traceback (most recent call last):
File "/home/b-mg106/.pyenv/versions/3.12.0/envs/tmp.dinglehopper.2023-10-23.issue-88-multimethod-dep/lib/python3.12/site-packages/ocrd/processor/helpers.py", line 131, in run_processor
processor.process()
File "/home/b-mg106/devel/dinglehopper/src/dinglehopper/ocrd_cli.py", line 41, in process
gt_file = self.workspace.download_file(gt_file)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/b-mg106/.pyenv/versions/3.12.0/envs/tmp.dinglehopper.2023-10-23.issue-88-multimethod-dep/lib/python3.12/site-packages/ocrd/workspace.py", line 206, in download_file
raise ValueError("OcrdFile {f} has neither 'url' nor 'local_filename', so cannot be downloaded")
ValueError: OcrdFile {f} has neither 'url' nor 'local_filename', so cannot be downloaded
----------------------------------------- Captured log call ------------------------------------------
ERROR ocrd.processor.helpers.run_processor:helpers.py:133 Failure in processor 'ocrd-dinglehopper'
Traceback (most recent call last):
File "/home/b-mg106/.pyenv/versions/3.12.0/envs/tmp.dinglehopper.2023-10-23.issue-88-multimethod-dep/lib/python3.12/site-packages/ocrd/processor/helpers.py", line 131, in run_processor
processor.process()
File "/home/b-mg106/devel/dinglehopper/src/dinglehopper/ocrd_cli.py", line 41, in process
gt_file = self.workspace.download_file(gt_file)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/b-mg106/.pyenv/versions/3.12.0/envs/tmp.dinglehopper.2023-10-23.issue-88-multimethod-dep/lib/python3.12/site-packages/ocrd/workspace.py", line 206, in download_file
raise ValueError("OcrdFile {f} has neither 'url' nor 'local_filename', so cannot be downloaded")
ValueError: OcrdFile {f} has neither 'url' nor 'local_filename', so cannot be downloaded
Reproduce with: pytest -k integ_ocrd_cli
The METS fileSec of the test data looks like this:
<mets:fileSec>
<mets:fileGrp USE="OCR-D-GT-PAGE">
<mets:file MIMETYPE="application/xml" ID="OCR-D-GT-PAGE_00000024">
<mets:FLocat xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="OCR-D-GT-PAGE/00000024.page.xml"/>
</mets:file>
</mets:fileGrp>
<mets:fileGrp USE="OCR-D-OCR-CALAMARI">
<mets:file MIMETYPE="application/vnd.prima.page+xml" ID="OCR-D-OCR-CALAMARI_0001">
<mets:FLocat xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="OCR-D-OCR-CALAMARI/OCR-D-OCR-CALAMARI_0001.xml"/>
</mets:file>
</mets:fileGrp>
<mets:fileGrp USE="OCR-D-OCR-TESS">
<mets:file MIMETYPE="application/vnd.prima.page+xml" ID="OCR-D-OCR-TESS_0001">
<mets:FLocat xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="OCR-D-OCR-TESS/OCR-D-OCR-TESS_0001.xml"/>
</mets:file>
</mets:fileGrp>
</mets:fileSec>
This used to work.
Maybe it's because xlink:href isn't really a URL? Or is it?
Judging from ocrd_model's ocrd_file.py, the FLocat element is supposed to also have LOCTYPE and OTHERLOCTYPE attributes.
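To check whether a METS file is affected, something like the following could list the FLocat elements that lack LOCTYPE. This is only a sketch using the standard library; the helper name flocats_without_loctype and the SAMPLE document are made up for illustration (SAMPLE is a cut-down version of the fileSec above), and it is not how ocrd itself inspects the file.

```python
import xml.etree.ElementTree as ET

# Standard METS and XLink namespace URIs.
METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"


def flocats_without_loctype(mets_xml: str) -> list[str]:
    """Return the xlink:href of every mets:FLocat that has no LOCTYPE attribute.

    Hypothetical helper for illustration, not part of ocrd.
    """
    root = ET.fromstring(mets_xml)
    return [
        flocat.get(f"{{{XLINK_NS}}}href", "")
        for flocat in root.iter(f"{{{METS_NS}}}FLocat")
        if flocat.get("LOCTYPE") is None
    ]


# Cut-down version of the test data shown above (one fileGrp only).
SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
                       xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="OCR-D-GT-PAGE">
      <mets:file MIMETYPE="application/xml" ID="OCR-D-GT-PAGE_00000024">
        <mets:FLocat xlink:href="OCR-D-GT-PAGE/00000024.page.xml"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>"""

missing = flocats_without_loctype(SAMPLE)
print(missing)
```

An empty list would mean every FLocat already carries a LOCTYPE.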
Our other "standard"/commonly used example files have the LOCTYPE, so I'm trying those. The embedded test data may just be invalid and may have been handled more gracefully in earlier ocrd versions.
https://qurator-data.de/examples/actevedef_718448162.first-page+binarization+segmentation.zip has LOCTYPE
https://qurator-data.de/examples/actevedef_718448162.zip has LOCTYPE
https://qurator-data.de/examples/actevedef_718448162.first-page.zip has LOCTYPE
Adding LOCTYPE/OTHERLOCTYPE to the test data fixes the tests.
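For reference, patching a METS file in this way could look roughly like the sketch below. The attribute values are an assumption on my part and are not stated above: LOCTYPE="OTHER" with OTHERLOCTYPE="FILE" for local paths, and LOCTYPE="URL" for anything containing a scheme. The helper name add_loctype and the UNPATCHED sample are also just illustrative.

```python
import xml.etree.ElementTree as ET

# Standard METS and XLink namespace URIs.
METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"

# Keep the usual prefixes when serializing instead of ns0/ns1.
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def add_loctype(mets_xml: str) -> str:
    """Add LOCTYPE (and OTHERLOCTYPE) to every mets:FLocat that lacks LOCTYPE.

    Hypothetical helper. The values used here are assumptions:
    "URL" for hrefs with a scheme, "OTHER"/"FILE" for local paths.
    """
    root = ET.fromstring(mets_xml)
    for flocat in root.iter(f"{{{METS_NS}}}FLocat"):
        if flocat.get("LOCTYPE") is not None:
            continue  # already annotated, leave untouched
        href = flocat.get(f"{{{XLINK_NS}}}href", "")
        if "://" in href:
            flocat.set("LOCTYPE", "URL")
        else:
            flocat.set("LOCTYPE", "OTHER")
            flocat.set("OTHERLOCTYPE", "FILE")
    return ET.tostring(root, encoding="unicode")


# Cut-down version of the test data shown above (one fileGrp only).
UNPATCHED = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
                          xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="OCR-D-OCR-TESS">
      <mets:file MIMETYPE="application/vnd.prima.page+xml" ID="OCR-D-OCR-TESS_0001">
        <mets:FLocat xlink:href="OCR-D-OCR-TESS/OCR-D-OCR-TESS_0001.xml"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>"""

patched = add_loctype(UNPATCHED)
print(patched)
```

Existing LOCTYPE attributes are left alone, so running this on the already-valid example files should change nothing.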
I'll commit the fix but leave this open until I can discuss it with @kba, as I'm not sure whether it's a regression in core or something that core could conveniently handle.
This was probably encountered elsewhere too. Closing.