ocrd-import does not work
Closed this issue · 4 comments
root@serv50:/home/jb/wm516# docker-ocrd ocrd-import img
09:52:27.585 INFO ocrd.ocrd-import - analysing 'img'
09:52:28.219 INFO ocrd.resolver.workspace_from_nothing - Writing METS to /data/img/mets.xml
09:52:28.343 INFO ocrd.ocrd-import - adding -g p0001 -G OCR-D-IMG -m image/tiff -i OCR-D-IMG_f1 '1.tif'
09:52:28.962 INFO ocrd.cli.workspace.bulk-add - [ 1/1] OCR-D-IMG image/tiff p0001 OCR-D-IMG_f1 1.tif
09:52:29.080 INFO ocrd.ocrd-import - Success on 'img'
root@serv50:/home/jb/wm516# docker-ocrd ocrd-anybaseocr-dewarp -I OCR-D-IMG -O OCR-D-001 -P model_path latest_net_G.pth
Traceback (most recent call last):
  File "/usr/local/bin/ocrd-anybaseocr-dewarp", line 33, in <module>
    sys.exit(load_entry_point('ocrd-anybaseocr', 'console_scripts', 'ocrd-anybaseocr-dewarp')())
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/build/ocrd_anybaseocr/ocrd_anybaseocr/cli/ocrd_anybaseocr_dewarp.py", line 212, in cli
    return ocrd_cli_wrap_processor(OcrdAnybaseocrDewarper, *args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/ocrd/decorators/__init__.py", line 109, in ocrd_cli_wrap_processor
    raise Exception("Invalid input/output file grps:\n\t%s" % '\n\t'.join(report.errors))
Exception: Invalid input/output file grps:
    Input fileGrp[@USE='OCR-D-IMG'] not in METS!
root@serv50:/home/jb/wm516# cat mets.xml
<?xml version="1.0" encoding="UTF-8"?>
<mets:mets xmlns:mets="http://www.loc.gov/METS/" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="info:lc/xmlns/premis-v2 http://www.loc.gov/standards/premis/v2/premis-v2-0.xsd http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-6.xsd http://www.loc.gov/METS/ http://www.loc.gov/standards/mets/mets.xsd http://www.loc.gov/mix/v10 http://www.loc.gov/standards/mix/mix10/mix10.xsd">
<mets:metsHdr CREATEDATE="2024-03-20T09:51:19.438874">
<mets:agent TYPE="OTHER" OTHERTYPE="SOFTWARE" ROLE="CREATOR">
<mets:name>ocrd/core v2.63.3</mets:name>
</mets:agent>
</mets:metsHdr>
<mets:dmdSec ID="DMDLOG_0001">
<mets:mdWrap MDTYPE="MODS">
<mets:xmlData>
<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
</mods:mods>
</mets:xmlData>
</mets:mdWrap>
</mets:dmdSec>
<mets:amdSec ID="AMD">
</mets:amdSec>
<mets:fileSec>
</mets:fileSec>
<mets:structMap TYPE="PHYSICAL">
<mets:div TYPE="physSequence">
</mets:div>
</mets:structMap>
</mets:mets>
That's odd. Just to be sure: you did use the same workspace directory img in both cases (just did not report it that way), right?
Next question would be: which version are you using? (To be safe: does md5sum /usr/local/bin/ocrd-import produce 817a92bcfdbc45c020b47534f2074895, the one from current master?)
Next best thing is diagnostics: could you change your ~/ocrd_logging.conf (or, if it does not exist, /etc/ocrd_logging.conf in the Docker image) such that either the root logger has level=DEBUG, or there is a logger for qualname=ocrd with propagate=1 and level=DEBUG, and then try again (i.e. rm the mets.xml and rerun ocrd-import)?
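For orientation, here is a minimal sketch of the second variant (a logger with qualname=ocrd, propagate=1 and level=DEBUG), assuming the configuration file follows the format read by Python's standard logging.config.fileConfig. The handler and formatter names (console, plain) are made up for illustration, and the file actually shipped in the Docker image will contain more entries than this. The script only loads the sketch and checks that a child logger such as ocrd.ocrd-import would emit DEBUG messages.

```python
import logging
import logging.config
import os
import tempfile
import textwrap

# Hypothetical minimal config in the stdlib fileConfig format.
# The real ocrd_logging.conf has more loggers/handlers; the names
# "console" and "plain" are assumptions made for this sketch.
CONFIG = textwrap.dedent("""
    [loggers]
    keys=root,ocrd

    [handlers]
    keys=console

    [formatters]
    keys=plain

    [logger_root]
    level=INFO
    handlers=console

    [logger_ocrd]
    level=DEBUG
    handlers=
    qualname=ocrd
    propagate=1

    [handler_console]
    class=StreamHandler
    formatter=plain
    args=(sys.stderr,)

    [formatter_plain]
    format=%(asctime)s %(levelname)s %(name)s - %(message)s
""")


def load_and_check(config_text):
    """Load the config from a temporary file and report whether
    loggers below 'ocrd' are enabled for DEBUG."""
    with tempfile.NamedTemporaryFile("w", suffix=".conf", delete=False) as fh:
        fh.write(config_text)
        path = fh.name
    try:
        logging.config.fileConfig(path, disable_existing_loggers=False)
    finally:
        os.unlink(path)
    # 'ocrd.ocrd-import' has no level of its own, so it inherits from 'ocrd'.
    return logging.getLogger("ocrd.ocrd-import").isEnabledFor(logging.DEBUG)


debug_on = load_and_check(CONFIG)
print("DEBUG enabled for ocrd.*:", debug_on)
```

With propagate=1 the ocrd logger needs no handler of its own: its records travel up to the root logger's handler, which is why the handlers entry is left empty here.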
Regarding questions 1 and 2 (I repeated the procedure, without editing anything):
root@serv50:/home/jb/test# mkdir OCR-D-IMG
root@serv50:/home/jb/test# cp ../wm516/OCR-D-IMG/1.tif OCR-D-IMG
root@serv50:/home/jb/test# docker-ocrd ocrd workspace init
11:28:03.566 INFO ocrd.resolver.workspace_from_nothing - Writing METS to /data/mets.xml
/data
root@serv50:/home/jb/test# docker-ocrd ocrd-import OCR-D-IMG/
11:28:17.896 INFO ocrd.ocrd-import - analysing 'OCR-D-IMG/'
11:28:18.530 INFO ocrd.resolver.workspace_from_nothing - Writing METS to /data/OCR-D-IMG/mets.xml
11:28:18.653 INFO ocrd.ocrd-import - adding -g p0001 -G OCR-D-IMG -m image/tiff -i OCR-D-IMG_f1 '1.tif'
11:28:19.279 INFO ocrd.cli.workspace.bulk-add - [ 1/1] OCR-D-IMG image/tiff p0001 OCR-D-IMG_f1 1.tif
11:28:19.395 INFO ocrd.ocrd-import - Success on 'OCR-D-IMG/'
root@serv50:/home/jb/test# cat mets.xml
<?xml version="1.0" encoding="UTF-8"?>
<mets:mets xmlns:mets="http://www.loc.gov/METS/" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="info:lc/xmlns/premis-v2 http://www.loc.gov/standards/premis/v2/premis-v2-0.xsd http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-6.xsd http://www.loc.gov/METS/ http://www.loc.gov/standards/mets/mets.xsd http://www.loc.gov/mix/v10 http://www.loc.gov/standards/mix/mix10/mix10.xsd">
<mets:metsHdr CREATEDATE="2024-03-20T11:28:03.566113">
<mets:agent TYPE="OTHER" OTHERTYPE="SOFTWARE" ROLE="CREATOR">
<mets:name>ocrd/core v2.63.3</mets:name>
</mets:agent>
</mets:metsHdr>
<mets:dmdSec ID="DMDLOG_0001">
<mets:mdWrap MDTYPE="MODS">
<mets:xmlData>
<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
</mods:mods>
</mets:xmlData>
</mets:mdWrap>
</mets:dmdSec>
<mets:amdSec ID="AMD">
</mets:amdSec>
<mets:fileSec>
</mets:fileSec>
<mets:structMap TYPE="PHYSICAL">
<mets:div TYPE="physSequence">
</mets:div>
</mets:structMap>
</mets:mets>
root@serv50:/home/jb/test# docker-ocrd md5sum /usr/local/bin/ocrd-import
817a92bcfdbc45c020b47534f2074895 /usr/local/bin/ocrd-import
mkdir OCR-D-IMG
cp ../wm516/OCR-D-IMG/1.tif OCR-D-IMG
ocrd workspace init
What is that good for?
ocrd-import OCR-D-IMG/
Note: that creates a workspace under OCR-D-IMG, not in the CWD!
cat mets.xml
You do know that's the file created above from workspace init, right?
The result of ocrd-import is now in OCR-D-IMG/mets.xml
In short, I don't understand what you are trying to accomplish. See ocrd-import -h for usage.
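A quick way to tell the two mets.xml files apart is to list the fileGrp entries each one contains. The following is a minimal sketch using only Python's standard library. The helper name file_groups is made up for illustration, the first sample mirrors the empty fileSec shown in this thread, and the second sample is a hypothetical illustration of a populated fileSec (it is not copied from an actual ocrd-import result).

```python
import xml.etree.ElementTree as ET

# Namespace used by the METS files shown in this thread.
METS_NS = {"mets": "http://www.loc.gov/METS/"}


def file_groups(mets_xml):
    """Return the USE attribute of every fileGrp in the METS fileSec.

    Accepts the METS document as text or bytes.
    """
    root = ET.fromstring(mets_xml)
    return [grp.get("USE")
            for grp in root.findall("mets:fileSec/mets:fileGrp", METS_NS)]


# Same shape as the file printed by `cat mets.xml` above: empty fileSec.
EMPTY_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:fileSec>
  </mets:fileSec>
</mets:mets>"""

# Hypothetical shape of a METS file after files were added (illustration only).
IMPORTED_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:fileSec>
    <mets:fileGrp USE="OCR-D-IMG">
      <mets:file ID="OCR-D-IMG_f1" MIMETYPE="image/tiff"/>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>"""

print(file_groups(EMPTY_METS))     # []
print(file_groups(IMPORTED_METS))  # ['OCR-D-IMG']
```

On disk, the same check could be run on both candidates, e.g. file_groups(open("mets.xml", "rb").read()) versus file_groups(open("OCR-D-IMG/mets.xml", "rb").read()). An empty list for the file in the current directory would be consistent with the "Input fileGrp[@USE='OCR-D-IMG'] not in METS!" error.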
Yes... just seen... "If nothing else helps, read the manual":
finally write everything to path/to/your/images/mets.xml