cisocrgroup/ocrd_cis

ocrd-cis-align --dump-json does not produce valid JSON

Closed · 3 comments

Calling ocrd-cis-align --dump-json with the Docker image ocrd/all:2020-12-28 gives the following standard output (notice the last three lines):

{
 "executable": "ocrd-cis-align",
 "categories": [
  "Text recognition and optimization"
 ],
 "steps": [
  "recognition/post-correction"
 ],
 "input_file_grp": [
  "OCR-D-OCR-1",
  "OCR-D-OCR-2",
  "OCR-D-OCR-N"
 ],
 "output_file_grp": [
  "OCR-D-ALIGNED"
 ],
 "description": "Align multiple OCRs and/or GTs"
}
11:13:38.440 CRITICAL root - getLogger was called before initLogging. Source of the call:
11:13:38.441 CRITICAL root -   File "/build/ocrd_cis/ocrd_cis/align/cli.py", line 35, in __init__
11:13:38.441 CRITICAL root -     self.log = getLogger('cis.Processor.Aligner')
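
To illustrate why the last three lines matter: any consumer that parses standard output as JSON sees the log text as part of the document. Below is a minimal sketch of the general remedy, assuming only Python's standard `logging` module. It does not use OCR-D's `initLogging`/`getLogger`, and it is not necessarily how the processor itself gets fixed. The child script and the logger name `example.Aligner` are made up for the illustration. The child logs to standard error and prints JSON to standard output, so the parent can parse standard output cleanly.

```python
import json
import subprocess
import sys

# Hypothetical child process: logs a message, then dumps a JSON document.
# The log handler is bound to stderr, so stdout carries only the JSON.
CHILD = r'''
import json, logging, sys
logging.basicConfig(stream=sys.stderr)
logging.getLogger("example.Aligner").critical("logged before the JSON dump")
print(json.dumps({"executable": "ocrd-cis-align"}, indent=1))
'''

result = subprocess.run(
    [sys.executable, "-c", CHILD],
    capture_output=True,  # collect stdout and stderr separately
    text=True,
    check=True,
)

# stdout is a single JSON document, so parsing succeeds.
tool = json.loads(result.stdout)
print(tool["executable"])

# The log message is still available, just on the other stream.
print(result.stderr.strip())
```

If the handler were bound to `sys.stdout` instead, the log line would land in `result.stdout` next to the JSON, which is the situation shown in the output above.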

This crashes OCR-D when calling ocrd process "cis-align …" …:

Traceback (most recent call last):
  File "/usr/bin/ocrd", line 33, in <module>
    sys.exit(load_entry_point('ocrd', 'console_scripts', 'ocrd')())
  …
  File "/build/core/ocrd/ocrd/task_sequence.py", line 72, in validate
    param_validator = ParameterValidator(self.ocrd_tool_json)
  File "/build/core/ocrd/ocrd/task_sequence.py", line 53, in ocrd_tool_json
    self._ocrd_tool_json = json.loads(result.stdout)
  File "/usr/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.6/json/decoder.py", line 342, in decode
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 19 column 1 (char 312)
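
The "Extra data" error can be reproduced without OCR-D. The sketch below is only an illustration: `polluted` is a shortened stand-in for the output above, and `parse_leading_json` is a hypothetical helper, not part of OCR-D or `ocrd_cis`. It shows that `json.loads` rejects any non-whitespace text after the first JSON value, whereas `json.JSONDecoder.raw_decode` parses the leading value and reports where it ended.

```python
import json

# Shortened stand-in for the stdout above: a JSON object followed by a log line.
polluted = (
    '{\n'
    ' "executable": "ocrd-cis-align"\n'
    '}\n'
    '11:13:38.440 CRITICAL root - getLogger was called before initLogging.\n'
)


def parse_leading_json(text):
    """Hypothetical helper: parse the first JSON value, return it and the unparsed rest."""
    stripped = text.lstrip()  # raw_decode does not skip leading whitespace
    value, end = json.JSONDecoder().raw_decode(stripped)
    return value, stripped[end:]


# json.loads insists that nothing but whitespace follows the JSON value.
try:
    json.loads(polluted)
except json.JSONDecodeError as err:
    print(err.msg, "at line", err.lineno, "column", err.colno)

# raw_decode stops after the first value and leaves the trailing text alone.
tool, rest = parse_leading_json(polluted)
print(tool["executable"])
print(repr(rest))
```

Parsing only the leading value would be a workaround on the caller's side. It hides the symptom, but the cleaner remedy is for the processor to keep its standard output free of log messages.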

Thanks @Witiko – this slipped through. If you pull from #77 this should work now. (Don't know yet when I can finish this off though.)

@bertsky No problem. I will likely not be able to test this until there is a Docker image. For the moment, I am using ocrd-cis-align instead of ocrd process "cis-align …" ….

> I will likely not be able to test this until there is a Docker image.

You can pull from Git repositories even inside the Docker images. In this case, the Docker image already contains the PR branch #77, just not its current head, so the following is enough:

docker run -it ocrd/all bash
cd /build
git -C ocrd_cis pull origin pull/77/head

That's it! (No need to re-install via pip, because modules are installed in editable mode now, and the recent changes did not affect anything other than source files.)

You can also make these changes permanent (to your local image) by using docker commit ...

> For the moment, I am using ocrd-cis-align instead of ocrd process "cis-align …" ….

Yes, that would work, but I also added a fix that makes ocrd-cis-align produce valid PAGE-XML again. (Without it, you won't be able to open output files down the pipeline with PageViewer.)