cisocrgroup/ocrd_cis

'GeometryCollection' object has no attribute 'exterior'

Closed this issue · 8 comments

Environment

  • Version: included in Docker Image ocrd/all:maximum from 2020-09-10 (docker image id: 9e71ab5d7d53)

Current behavior

When executing docker run -v 1085:/data -w /data -v calamari_models:/models -- ocrd/all:maximum ocrd process ⟨here omitting the “best results for selected pages” workflow⟩, I receive the following error:

Traceback (most recent call last):
  File "/usr/bin/ocrd", line 33, in <module>
    sys.exit(load_entry_point('ocrd', 'console_scripts', 'ocrd')())
  File "/usr/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3.6/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/build/core/ocrd/ocrd/cli/process.py", line 28, in process_cli
    run_tasks(mets, log_level, page_id, tasks, overwrite)
  File "/build/core/ocrd/ocrd/task_sequence.py", line 149, in run_tasks
    raise Exception("%s exited with non-zero return value %s. STDOUT:\n%s\nSTDERR:\n%s" % (task.executable, returncode, out, err))
Exception: ocrd-cis-ocropy-segment exited with non-zero return value 1. STDOUT:

STDERR:
Traceback (most recent call last):
  File "/usr/bin/ocrd-cis-ocropy-segment", line 33, in <module>
    sys.exit(load_entry_point('ocrd-cis', 'console_scripts', 'ocrd-cis-ocropy-segment')())
  File "/usr/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/build/ocrd_cis/ocrd_cis/ocropy/cli.py", line 53, in ocrd_cis_ocropy_segment
    return ocrd_cli_wrap_processor(OcropySegment, *args, **kwargs)
  File "/build/core/ocrd/ocrd/decorators.py", line 102, in ocrd_cli_wrap_processor
    run_processor(processorClass, ocrd_tool, mets, workspace=workspace, **kwargs)
  File "/build/core/ocrd/ocrd/processor/helpers.py", line 69, in run_processor
    processor.process()
  File "/build/ocrd_cis/ocrd_cis/ocropy/segment.py", line 381, in process
    region.id, file_id + '_' + region.id, zoom)
  File "/build/ocrd_cis/ocrd_cis/ocropy/segment.py", line 648, in _process_element
    line_polygon = polygon_for_parent(line_polygon, element)
  File "/build/ocrd_cis/ocrd_cis/ocropy/segment.py", line 677, in polygon_for_parent
    return interp.exterior.coords[:-1] # keep open
AttributeError: 'GeometryCollection' object has no attribute 'exterior'

I uploaded the contents of the directory 1085 here (removed now) to make reproduction easier. See also an example input image.

I patched the failing line in segment.py as follows:

677     try:
678         return interp.exterior.coords[:-1] # keep open
679     except AttributeError as e:
680         print(interp)
681         raise e

I copied the patch into the Docker container and committed the container to a testing Docker image. Then, instead of running the whole workflow, I executed just the failing step:

$ docker run -v 1085:/data -w /data -- ocrd_test ocrd process \
> "cis-ocropy-segment -I OCR-D-SEG-REG-DESKEW-CLIP -O OCR-D-SEG-LINE -P level-of-operation region"

I attach the output below:

GEOMETRYCOLLECTION (LINESTRING (1902 2230, 1910 2230), POLYGON ((1910 2157, 1910 2229, 1910 2230, 1923 2230, 1926 2223, 1990 2220, 2077 2223, 2083 2220, 2098 2220, 2107 2224, 2189 2224, 2221 2220, 2311 2221, 2344 2217, 2461 2223, 2643 2220, 2650 2218, 2655 2212, 2654 2173, 2648 2168, 2629 2166, 2390 2164, 2378 2167, 2376 2177, 2333 2177, 2331 2170, 2323 2166, 2292 2164, 2123 2164, 2097 2171, 2057 2171, 2047 2164, 2035 2164, 2027 2171, 2018 2172, 1980 2163, 1922 2162, 1910 2157)))

I assume the LINESTRING is what is causing the issues.
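
The printed geometry can be inspected member by member to check that assumption. This is a minimal sketch, assuming Shapely is importable (the attribute names in the traceback suggest it is what segment.py works with) and using a shortened copy of the polygon above; the variable names are illustrative only:

```python
from shapely import wkt

# Shortened copy of the geometry printed above: same structure,
# one two-point LineString plus one Polygon (fewer vertices).
interp = wkt.loads(
    "GEOMETRYCOLLECTION ("
    "LINESTRING (1902 2230, 1910 2230), "
    "POLYGON ((1910 2157, 1910 2230, 1923 2230, "
    "2655 2212, 2654 2173, 1910 2157)))"
)

# A collection has no single outer ring, hence the AttributeError.
print(interp.geom_type, hasattr(interp, "exterior"))

# List each member with its type and area; a LineString has zero area.
members = [(g.geom_type, g.area) for g in interp.geoms]
for geom_type, area in members:
    print(geom_type, area)
```

Only the Polygon member has an `exterior`; the zero-area LineString is a leftover of the intersection that touches the polygon at (1910, 2230).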

I patched segment.py as follows:

675     if interp.type.startswith('Multi') or interp.type == 'GeometryCollection':
676         interp = interp.convex_hull
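
One caveat with the convex hull is that it also spans the stray LineString and fills any concavities, so the resulting outline can be larger than the line polygon itself. As an alternative, here is a sketch that keeps only the largest polygonal member instead. It assumes Shapely is importable, `largest_polygon` is a hypothetical helper (not part of ocrd_cis), and the geometry is a shortened copy of the one printed above:

```python
from shapely import wkt


def largest_polygon(geom):
    """Return the largest Polygon in geom (hypothetical helper)."""
    if geom.geom_type == "Polygon":
        return geom
    # Multi* geometries and GeometryCollection expose their parts as .geoms
    polygons = [g for g in getattr(geom, "geoms", [])
                if g.geom_type == "Polygon"]
    if not polygons:
        raise ValueError("no polygonal part in %s" % geom.geom_type)
    return max(polygons, key=lambda g: g.area)


interp = wkt.loads(
    "GEOMETRYCOLLECTION ("
    "LINESTRING (1902 2230, 1910 2230), "
    "POLYGON ((1910 2157, 1910 2230, 1923 2230, "
    "2655 2212, 2654 2173, 1910 2157)))"
)

hull = interp.convex_hull       # the approach from the patch above
poly = largest_polygon(interp)  # drops the LineString entirely

print(hull.bounds)  # left edge reaches the LineString at x = 1902
print(poly.bounds)  # left edge stays at the polygon, x = 1910
coords = poly.exterior.coords[:-1]  # keep open, as in segment.py
```

Which of the two is preferable depends on whether a slightly enlarged outline is acceptable downstream; the hull always yields a single polygon, while the helper preserves the original shape but discards any smaller polygonal parts.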

This seems to have fixed this issue. Would you like me to open a pull request, or is there an easy way to investigate which one of the 95 files in directory 1085/OCR-D-SEG-REG-DESKEW-CLIP causes the issue, so that we can see where the LineString is coming from?
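
One way to narrow down the offending file would be to run the failing step once per page and record which pages exit with a non-zero status. The sketch below is an assumption-laden outline rather than a tested recipe: it assumes that `ocrd process` accepts `-g` to restrict processing to a single page ID, the page IDs shown are hypothetical (the real ones could be obtained with something like `ocrd workspace list-page`, if available in the installed version), and `find_failing_pages` and `dry_run` are illustrative names:

```python
import subprocess
from types import SimpleNamespace

# The failing step, exactly as passed to `ocrd process` above.
TASK = ("cis-ocropy-segment -I OCR-D-SEG-REG-DESKEW-CLIP "
        "-O OCR-D-SEG-LINE -P level-of-operation region")


def find_failing_pages(page_ids, run=subprocess.run):
    """Run TASK once per page; return the page IDs where it fails."""
    failing = []
    for page_id in page_ids:
        # Assumption: -g restricts `ocrd process` to one page ID.
        cmd = ["ocrd", "process", "-g", page_id, TASK]
        # PIPE/universal_newlines also work on the image's Python 3.6.
        result = run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                     universal_newlines=True)
        if result.returncode != 0:
            failing.append(page_id)
    return failing


def dry_run(cmd, **kwargs):
    """Stand-in runner: print the command instead of executing it."""
    print(" ".join(cmd[:4]), repr(cmd[4]))
    return SimpleNamespace(returncode=0, stdout="", stderr="")


# Hypothetical page IDs; dry_run only prints what would be executed.
print(find_failing_pages(["PHYS_0001", "PHYS_0002"], run=dry_run))
```

Dropping the `run=dry_run` argument would execute the commands for real, so the script would have to run inside the container with the workspace directory as the working directory.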

Dear @Witiko, thanks for the detailed report and the suggestions! This topic has come up across multiple similarly implemented modules in recent days. There is a full-scale discussion at OCR-D/ocrd_segment#43, and comprehensive fixes have already been proposed (OCR-D/ocrd_tesserocr#152, #61) or merged (https://github.com/OCR-D/ocrd_segment/). These will be integrated into a new ocrd_all release shortly.

If you'd like to test these yourself already in a Docker installation, you can follow these steps:

docker run -it ocrd/all:maximum
cd /build/ocrd_segment && git pull && pip install -e .
cd /build/ocrd_tesserocr && git pull origin pull/152/head && pip install -e .
cd /build/ocrd_cis && git pull origin pull/61/head && pip install -e .
# now to make that change permanent in your image, wait here,
# and in *another shell* (outside your running container) do:
docker container commit $(docker container ls --filter status=running --format "{{.ID}}") ocrd/all:maximum

Thanks, @bertsky, I am trying that out now. I will let you know if things are fixed.

Or, if you just wait another hour (or so), the new ocrd/all image will become available.

@bertsky Thank you, I will wait. After the suggested changes, I am receiving the following error:

pkg_resources.DistributionNotFound: The 'ocrd>=2.16.2' distribution was not found and is required by ocrd-segment

Must have been a temporary impasse. I can fetch 2.16.3 from PyPI now.

In the current ocrd/all:maximum, ocrd --version is also 2.16.3.

@bertsky I am experiencing no issues with yesterday's Docker image. Thanks again.