Cannot run
Hi, I am trying to run this library, but no luck so far.
I've followed the README and got the dependencies properly installed.
I am trying to run the model on an image, but I get an error:
ocrd-detectron2-segment -I imgpath -O outpath -P categories '["TableRegion"]' -P model_config TableBank_X152.yaml -P model_weights TableBank_X152.pth -P min_confidence 0.1
Traceback (most recent call last):
  File "/Users/maurosciancalepore/miniconda3/envs/detectron/bin/ocrd-detectron2-segment", line 8, in <module>
    sys.exit(ocrd_detectron2_segment())
  File "/Users/maurosciancalepore/miniconda3/envs/detectron/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/Users/maurosciancalepore/miniconda3/envs/detectron/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/Users/maurosciancalepore/miniconda3/envs/detectron/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/maurosciancalepore/miniconda3/envs/detectron/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/maurosciancalepore/miniconda3/envs/detectron/lib/python3.9/site-packages/ocrd_detectron2/cli.py", line 9, in ocrd_detectron2_segment
    return ocrd_cli_wrap_processor(Detectron2Segment, *args, **kwargs)
  File "/Users/maurosciancalepore/miniconda3/envs/detectron/lib/python3.9/site-packages/ocrd/decorators/__init__.py", line 68, in ocrd_cli_wrap_processor
    workspace = resolver.workspace_from_url(mets, working_dir)
  File "/Users/maurosciancalepore/miniconda3/envs/detectron/lib/python3.9/site-packages/ocrd/resolver.py", line 166, in workspace_from_url
    self.download_to_directory(dst_dir, mets_url, basename=mets_basename, if_exists='overwrite' if clobber_mets else 'skip')
  File "/Users/maurosciancalepore/miniconda3/envs/detectron/lib/python3.9/site-packages/ocrd/resolver.py", line 82, in download_to_directory
    raise FileNotFoundError("File path passed as 'url' to download_to_directory does not exist: %s" % url)
FileNotFoundError: File path passed as 'url' to download_to_directory does not exist: /Users/maurosciancalepore/my_project/saturday/ocrd_detectron2/mets.xml
Can you give me some guidance? An example of actual, working usage would be appreciated too. Nice job, btw.
Hi @masc-it!
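I get the feeling you expect imgpath and outpath to be normal path names (perhaps even file names). But OCR-D uses METS-XML as a container to manage documents (comprising many files referenced as local paths or URLs, but organised in fileGrps), so -I and -O take fileGrp names rather than paths. The processor looks for a mets.xml in the current working directory by default, which is the file your traceback reports as missing. Hence this wrapper. The Readme contains links to what METS is and what an OCR-D processor must do.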
If you just want to process some images, you can install OCR-D and run ocrd-import path/to/directory to get a fresh METS, which can then be used by ocrd-detectron2-segment. Beware, though, that due to the postprocessing, this tool also requires running a binarization processor beforehand.
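As a rough sketch of that sequence (import, binarize, segment), something like the following might work. The fileGrp names OCR-D-IMG, OCR-D-BIN and OCR-D-SEG are assumptions, as is the expectation that ocrd-import writes mets.xml into the imported directory; check your own mets.xml for the actual names. The `run` helper, `pipeline` function and `DRY_RUN` switch are hypothetical conveniences so the commands can be reviewed before anything is executed; the model parameters are copied from the command in the question.

```shell
#!/bin/sh
# Sketch only: import images into a fresh OCR-D workspace, binarize, segment.
# Assumptions (not confirmed in this thread): the fileGrp names OCR-D-IMG,
# OCR-D-BIN and OCR-D-SEG, and that ocrd-import creates mets.xml in WORKDIR.
WORKDIR="${WORKDIR:-path/to/directory}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

pipeline() {
    # 1. Create a METS file that references the images in WORKDIR.
    run ocrd-import "$WORKDIR" &&
    # 2. Processors look for mets.xml in the current directory by default.
    run cd "$WORKDIR" &&
    # 3. Binarize first, since the segmenter's postprocessing needs it.
    run ocrd-skimage-binarize -I OCR-D-IMG -O OCR-D-BIN &&
    # 4. Segment; -I and -O are fileGrp names, not file paths.
    run ocrd-detectron2-segment -I OCR-D-BIN -O OCR-D-SEG \
        -P categories '["TableRegion"]' \
        -P model_config TableBank_X152.yaml \
        -P model_weights TableBank_X152.pth \
        -P min_confidence 0.1
}

pipeline
```

With the default DRY_RUN=1 the script only prints the four commands (the printed form drops shell quoting, so it is for reading rather than copy-pasting). Setting DRY_RUN=0 and WORKDIR to a real image directory executes them, provided the OCR-D tools and the model files are installed and resolvable.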
I'll publish a full CI working example shortly.
See make test in https://github.com/bertsky/ocrd_detectron2/blob/master/Makefile or https://github.com/bertsky/ocrd_detectron2/blob/master/.github/workflows/python-app.yml (test results downloadable as artifact).
This runs a complete command line example on:
- 4 Detectron2 models
- 2 OCR-D workspaces
- Python 3.7, 3.8 and 3.9
In there, I used ocrd-skimage-binarize for binarization – not because it's the best or fastest method, but because it is pure Python and needs no extra model downloads.
For those who don't want OCR-D interfaces, but would like a single tool for multiple models, and perhaps even my postprocessing, I could write a standalone API and CLI that does not depend on METS-XML and PAGE-XML.