OCR-D/ocrd_all

ocrd-import does not work

Closed this issue · 4 comments

root@serv50:/home/jb/wm516# docker-ocrd ocrd-import img
09:52:27.585 INFO ocrd.ocrd-import - analysing 'img'
09:52:28.219 INFO ocrd.resolver.workspace_from_nothing - Writing METS to /data/img/mets.xml
09:52:28.343 INFO ocrd.ocrd-import - adding -g p0001 -G OCR-D-IMG -m image/tiff -i OCR-D-IMG_f1 '1.tif'
09:52:28.962 INFO ocrd.cli.workspace.bulk-add - [   1/1] OCR-D-IMG image/tiff p0001 OCR-D-IMG_f1 1.tif
09:52:29.080 INFO ocrd.ocrd-import - Success on 'img'

root@serv50:/home/jb/wm516# docker-ocrd ocrd-anybaseocr-dewarp -I OCR-D-IMG -O OCR-D-001 -P model_path latest_net_G.pth
Traceback (most recent call last):
  File "/usr/local/bin/ocrd-anybaseocr-dewarp", line 33, in <module>
    sys.exit(load_entry_point('ocrd-anybaseocr', 'console_scripts', 'ocrd-anybaseocr-dewarp')())
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/build/ocrd_anybaseocr/ocrd_anybaseocr/cli/ocrd_anybaseocr_dewarp.py", line 212, in cli
    return ocrd_cli_wrap_processor(OcrdAnybaseocrDewarper, *args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/ocrd/decorators/__init__.py", line 109, in ocrd_cli_wrap_processor
    raise Exception("Invalid input/output file grps:\n\t%s" % '\n\t'.join(report.errors))
Exception: Invalid input/output file grps:
	Input fileGrp[@USE='OCR-D-IMG'] not in METS!
root@serv50:/home/jb/wm516# cat mets.xml
<?xml version="1.0" encoding="UTF-8"?>
<mets:mets xmlns:mets="http://www.loc.gov/METS/" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="info:lc/xmlns/premis-v2 http://www.loc.gov/standards/premis/v2/premis-v2-0.xsd http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-6.xsd http://www.loc.gov/METS/ http://www.loc.gov/standards/mets/mets.xsd http://www.loc.gov/mix/v10 http://www.loc.gov/standards/mix/mix10/mix10.xsd">
  <mets:metsHdr CREATEDATE="2024-03-20T09:51:19.438874">
    <mets:agent TYPE="OTHER" OTHERTYPE="SOFTWARE" ROLE="CREATOR">
      <mets:name>ocrd/core v2.63.3</mets:name>
    </mets:agent>
  </mets:metsHdr>
  <mets:dmdSec ID="DMDLOG_0001">
    <mets:mdWrap MDTYPE="MODS">
      <mets:xmlData>
        <mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
                </mods:mods>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>
  <mets:amdSec ID="AMD">
    </mets:amdSec>
  <mets:fileSec>
    </mets:fileSec>
  <mets:structMap TYPE="PHYSICAL">
    <mets:div TYPE="physSequence">
        </mets:div>
  </mets:structMap>
</mets:mets>

That's odd. Just to be sure: you did use the same workspace directory img in both cases (just did not report it that way), right?

Next question would be: which version are you using? (To be safe: does md5sum /usr/local/bin/ocrd-import produce 817a92bcfdbc45c020b47534f2074895, the one from current master?)

Next best thing is diagnostics: could you change your ~/ocrd_logging.conf (or, if it does not exist, modify /etc/ocrd_logging.conf in the Docker image) such that either the root logger has level=DEBUG, or there is a logger for qualname=ocrd with propagate=1 and level=DEBUG? Then try again (i.e. rm the mets.xml and rerun ocrd-import).
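
As a rough illustration of the second variant, and assuming ocrd_logging.conf follows the INI-style format read by Python's logging.config.fileConfig, a logger section for qualname=ocrd could look like the CONFIG string below. The handler and formatter names (console, plain) are made up for this sketch and are not taken from the shipped file. The snippet only loads the text in-process to show its effect; it does not touch any real config file.

```python
import io
import logging
import logging.config

# Hypothetical config text, assuming the fileConfig (INI-style) format.
# Only the [logger_ocrd] section reflects the suggestion in the thread;
# the handler/formatter names are placeholders for this sketch.
CONFIG = """
[loggers]
keys=root,ocrd

[handlers]
keys=console

[formatters]
keys=plain

[logger_root]
level=INFO
handlers=console

[logger_ocrd]
level=DEBUG
handlers=
qualname=ocrd
propagate=1

[handler_console]
class=StreamHandler
formatter=plain
args=(sys.stderr,)

[formatter_plain]
format=%(asctime)s %(levelname)s %(name)s - %(message)s
"""

# Load the config from memory instead of a file on disk.
logging.config.fileConfig(io.StringIO(CONFIG), disable_existing_loggers=False)

# Any logger below "ocrd" inherits DEBUG and propagates to the root handler.
log = logging.getLogger("ocrd.ocrd-import")
log.debug("debug messages from the ocrd hierarchy are now emitted")
```

With propagate=1 and no handlers of its own, the ocrd logger relies on the root logger's handler for output, while loggers outside the ocrd hierarchy stay at the root level.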

Regarding questions 1 (I repeated the procedure, with no editing) and 2:

root@serv50:/home/jb/test# mkdir OCR-D-IMG
root@serv50:/home/jb/test# cp ../wm516/OCR-D-IMG/1.tif OCR-D-IMG
root@serv50:/home/jb/test# docker-ocrd ocrd workspace init
11:28:03.566 INFO ocrd.resolver.workspace_from_nothing - Writing METS to /data/mets.xml
/data
root@serv50:/home/jb/test# docker-ocrd ocrd-import OCR-D-IMG/
11:28:17.896 INFO ocrd.ocrd-import - analysing 'OCR-D-IMG/'
11:28:18.530 INFO ocrd.resolver.workspace_from_nothing - Writing METS to /data/OCR-D-IMG/mets.xml
11:28:18.653 INFO ocrd.ocrd-import - adding -g p0001 -G OCR-D-IMG -m image/tiff -i OCR-D-IMG_f1 '1.tif'
11:28:19.279 INFO ocrd.cli.workspace.bulk-add - [   1/1] OCR-D-IMG image/tiff p0001 OCR-D-IMG_f1 1.tif
11:28:19.395 INFO ocrd.ocrd-import - Success on 'OCR-D-IMG/'
root@serv50:/home/jb/test# cat mets.xml
<?xml version="1.0" encoding="UTF-8"?>
<mets:mets xmlns:mets="http://www.loc.gov/METS/" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="info:lc/xmlns/premis-v2 http://www.loc.gov/standards/premis/v2/premis-v2-0.xsd http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-6.xsd http://www.loc.gov/METS/ http://www.loc.gov/standards/mets/mets.xsd http://www.loc.gov/mix/v10 http://www.loc.gov/standards/mix/mix10/mix10.xsd">
  <mets:metsHdr CREATEDATE="2024-03-20T11:28:03.566113">
    <mets:agent TYPE="OTHER" OTHERTYPE="SOFTWARE" ROLE="CREATOR">
      <mets:name>ocrd/core v2.63.3</mets:name>
    </mets:agent>
  </mets:metsHdr>
  <mets:dmdSec ID="DMDLOG_0001">
    <mets:mdWrap MDTYPE="MODS">
      <mets:xmlData>
        <mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
                </mods:mods>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>
  <mets:amdSec ID="AMD">
    </mets:amdSec>
  <mets:fileSec>
    </mets:fileSec>
  <mets:structMap TYPE="PHYSICAL">
    <mets:div TYPE="physSequence">
        </mets:div>
  </mets:structMap>
</mets:mets>
root@serv50:/home/jb/test# docker-ocrd md5sum /usr/local/bin/ocrd-import
817a92bcfdbc45c020b47534f2074895  /usr/local/bin/ocrd-import

mkdir OCR-D-IMG
cp ../wm516/OCR-D-IMG/1.tif OCR-D-IMG
ocrd workspace init

What is that good for?

ocrd-import OCR-D-IMG/

Note: that creates a workspace under OCR-D-IMG, not in the CWD!

cat mets.xml

You do know that's the file created above from workspace init, right?

The result of ocrd-import is now in OCR-D-IMG/mets.xml

In short, I don't understand what you are trying to accomplish. See ocrd-import -h for usage.
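
To see at a glance which mets.xml actually holds the imported file group, something like the following could help. It is a minimal sketch using only the Python standard library rather than the ocrd tooling; the helper names file_groups and report are invented for this example.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# METS namespace as declared in the mets.xml files shown in this thread.
METS_NS = {"mets": "http://www.loc.gov/METS/"}


def file_groups(mets_path):
    """Return the USE attribute of every mets:fileGrp in one METS file."""
    root = ET.parse(mets_path).getroot()
    return [grp.get("USE") for grp in root.iterfind(".//mets:fileGrp", METS_NS)]


def report(start_dir="."):
    """Map each mets.xml found at or below start_dir to its file groups."""
    return {
        str(path): file_groups(path)
        for path in sorted(Path(start_dir).rglob("mets.xml"))
    }
```

Calling report() from the directory where the commands were run would list every mets.xml at or below it. A file whose list is empty (such as one freshly written by ocrd workspace init) cannot satisfy -I OCR-D-IMG, whereas the one written by ocrd-import inside the image directory should list that group.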

Yes... just seen... "If nothing else helps, read the manual":

finally write everything to path/to/your/images/mets.xml