'GeometryCollection' object has no attribute 'exterior'
Closed this issue · 8 comments
Environment
- Version: included in Docker image ocrd/all:maximum from 2020-09-10 (Docker image ID: 9e71ab5d7d53)
Current behavior
When executing docker run -v 1085:/data -w /data -v calamari_models:/models -- ocrd/all:maximum ocrd process ⟨here omitting the “best results for selected pages” workflow⟩, I receive the following error:
Traceback (most recent call last):
File "/usr/bin/ocrd", line 33, in <module>
sys.exit(load_entry_point('ocrd', 'console_scripts', 'ocrd')())
File "/usr/lib/python3.6/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/usr/lib/python3.6/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/usr/lib/python3.6/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/lib/python3.6/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/build/core/ocrd/ocrd/cli/process.py", line 28, in process_cli
run_tasks(mets, log_level, page_id, tasks, overwrite)
File "/build/core/ocrd/ocrd/task_sequence.py", line 149, in run_tasks
raise Exception("%s exited with non-zero return value %s. STDOUT:\n%s\nSTDERR:\n%s" % (task.executable, returncode, out, err))
Exception: ocrd-cis-ocropy-segment exited with non-zero return value 1. STDOUT:
STDERR:
Traceback (most recent call last):
File "/usr/bin/ocrd-cis-ocropy-segment", line 33, in <module>
sys.exit(load_entry_point('ocrd-cis', 'console_scripts', 'ocrd-cis-ocropy-segment')())
File "/usr/lib/python3.6/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/usr/lib/python3.6/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/usr/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/lib/python3.6/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/build/ocrd_cis/ocrd_cis/ocropy/cli.py", line 53, in ocrd_cis_ocropy_segment
return ocrd_cli_wrap_processor(OcropySegment, *args, **kwargs)
File "/build/core/ocrd/ocrd/decorators.py", line 102, in ocrd_cli_wrap_processor
run_processor(processorClass, ocrd_tool, mets, workspace=workspace, **kwargs)
File "/build/core/ocrd/ocrd/processor/helpers.py", line 69, in run_processor
processor.process()
File "/build/ocrd_cis/ocrd_cis/ocropy/segment.py", line 381, in process
region.id, file_id + '_' + region.id, zoom)
File "/build/ocrd_cis/ocrd_cis/ocropy/segment.py", line 648, in _process_element
line_polygon = polygon_for_parent(line_polygon, element)
File "/build/ocrd_cis/ocrd_cis/ocropy/segment.py", line 677, in polygon_for_parent
return interp.exterior.coords[:-1] # keep open
AttributeError: 'GeometryCollection' object has no attribute 'exterior'
I uploaded the contents of directory 1085 (the upload has since been removed) to make reproduction easier. See also an example input image.
I patched the failing line (line 677) in segment.py as follows:
try:
    return interp.exterior.coords[:-1]  # keep open
except AttributeError:
    print(interp)  # show the offending geometry before re-raising
    raise
I copied the patch into the Docker container and committed the container to a test Docker image (ocrd_test). Then, instead of running the whole workflow, I executed just the failing step:
$ docker run -v 1085:/data -w /data -- ocrd_test ocrd process \
> "cis-ocropy-segment -I OCR-D-SEG-REG-DESKEW-CLIP -O OCR-D-SEG-LINE -P level-of-operation region"
I attach the output below:
GEOMETRYCOLLECTION (LINESTRING (1902 2230, 1910 2230), POLYGON ((1910 2157, 1910 2229, 1910 2230, 1923 2230, 1926 2223, 1990 2220, 2077 2223, 2083 2220, 2098 2220, 2107 2224, 2189 2224, 2221 2220, 2311 2221, 2344 2217, 2461 2223, 2643 2220, 2650 2218, 2655 2212, 2654 2173, 2648 2168, 2629 2166, 2390 2164, 2378 2167, 2376 2177, 2333 2177, 2331 2170, 2323 2166, 2292 2164, 2123 2164, 2097 2171, 2057 2171, 2047 2164, 2035 2164, 2027 2171, 2018 2172, 1980 2163, 1922 2162, 1910 2157)))
I assume the LINESTRING is what causes the issue.
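To check that assumption, one can compute the area of each part of the collection: a two-point LINESTRING encloses no area, so it contributes nothing to the region outline. A minimal, dependency-free sketch of one possible way to reduce such a collection to a single polygon follows. It assumes the parts have already been extracted as coordinate lists (with Shapely, e.g. from the collection's .geoms and each part's coordinates); the names part_area and largest_polygon_part are hypothetical and not part of ocrd_cis.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def part_area(coords: Sequence[Point]) -> float:
    """Area enclosed by a coordinate ring, via the shoelace formula.

    Parts with fewer than three points, such as a two-point LINESTRING,
    enclose no area and yield 0.0.
    """
    n = len(coords)
    if n < 3:
        return 0.0
    twice_area = 0.0
    for i in range(n):
        x1, y1 = coords[i]
        x2, y2 = coords[(i + 1) % n]  # wrap around to close the ring
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def largest_polygon_part(parts: Sequence[Sequence[Point]]) -> Optional[List[Point]]:
    """Keep the largest-area part of a collection, dropping degenerate ones.

    Returns the ring without its duplicated closing point ("keep open"),
    or None if there are no parts or every part has zero area.
    """
    best = max(parts, key=part_area, default=None)
    if best is None or part_area(best) == 0.0:
        return None
    ring = list(best)
    if len(ring) > 1 and ring[0] == ring[-1]:
        ring = ring[:-1]  # drop the closing point
    return ring


# Toy example: the stray LINESTRING from the output above plus a unit square
# standing in for the (much longer) real polygon.
line = [(1902, 2230), (1910, 2230)]
square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
print(largest_polygon_part([line, square]))  # [(0, 0), (1, 0), (1, 1), (0, 1)]
```

Compared with taking the convex hull of the whole collection, keeping only the largest part preserves concave outlines, but it silently discards any other non-degenerate fragments; which trade-off is preferable depends on the segmentation.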
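I then patched segment.py as follows, before the failing return statement:
if interp.geom_type.startswith('Multi') or interp.geom_type == 'GeometryCollection':
    # reduce a multi-part intersection result to a single polygon
    interp = interp.convex_hull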
This seems to have fixed the issue. Would you like me to open a pull request? Or is there an easy way to find out which of the 95 files in directory 1085/OCR-D-SEG-REG-DESKEW-CLIP causes the issue, so that we can see where the LineString comes from?
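One way to narrow this down is to run the failing step once per page and record which pages exit with a non-zero status. The following is a minimal sketch under several assumptions: it runs inside the unpatched container from the workspace directory (/data), page IDs are read from the physical structMap of mets.xml, and ocrd process is assumed to accept -g (page ID) and --overwrite, as the page_id and overwrite arguments of run_tasks in the traceback suggest. The helper names and the TASK constant are hypothetical.

```python
import subprocess
import xml.etree.ElementTree as ET
from typing import Callable, Iterable, List, Union

METS_NS = {"mets": "http://www.loc.gov/METS/"}

# Task string copied from the report above; adjust it to the failing step.
TASK = ("cis-ocropy-segment -I OCR-D-SEG-REG-DESKEW-CLIP "
        "-O OCR-D-SEG-LINE -P level-of-operation region")


def page_ids_from_mets(mets_xml: Union[str, bytes]) -> List[str]:
    """Physical page IDs, in document order, from a METS document."""
    root = ET.fromstring(mets_xml)
    pages = root.iterfind(
        ".//mets:structMap[@TYPE='PHYSICAL']//mets:div[@TYPE='page']", METS_NS)
    return [div.get("ID") for div in pages]


def run_task_on_page(page_id: str) -> int:
    """Run TASK for a single page (assumed -g/--overwrite flags); return the exit code."""
    cmd = ["ocrd", "process", "--overwrite", "-g", page_id, TASK]
    return subprocess.run(cmd).returncode


def find_failing_pages(page_ids: Iterable[str],
                       run_page: Callable[[str], int] = run_task_on_page) -> List[str]:
    """Return the IDs of all pages whose run exits with a non-zero code."""
    return [page_id for page_id in page_ids if run_page(page_id) != 0]
```

Usage, inside the container: find_failing_pages(page_ids_from_mets(open("mets.xml", "rb").read())) returns the IDs of the pages on which the step still fails; the corresponding files in OCR-D-SEG-REG-DESKEW-CLIP are then the ones to inspect. The runner is a parameter so that the page loop can be tested without invoking ocrd.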
Dear @Witiko, thanks for the detailed report and the suggestions! This problem has come up recently across several similarly implemented modules. There is a full discussion at OCR-D/ocrd_segment#43, and comprehensive fixes have already been proposed (OCR-D/ocrd_tesserocr#152, OCR-D/ocrd_cis#61) or merged (https://github.com/OCR-D/ocrd_segment/). These will be integrated into a new ocrd_all release shortly.
If you'd like to test these yourself in a Docker installation before then, you can follow these steps:
docker run -it ocrd/all:maximum
cd /build/ocrd_segment && git pull && pip install -e .
cd /build/ocrd_tesserocr && git pull origin pull/152/head && pip install -e .
cd /build/ocrd_cis && git pull origin pull/61/head && pip install -e .
# now to make that change permanent in your image, wait here,
# and in *another shell* (outside your running container) do:
docker container commit $(docker container ls --filter status=running --format "{{.ID}}") ocrd/all:maximum
Thanks, @bertsky, I am trying that out now. I will let you know if things are fixed.
Alternatively, if you wait another hour or so, the new ocrd/all image will be available.
@bertsky Thank you, I will wait. After the suggested changes, I am receiving the following error:
pkg_resources.DistributionNotFound: The 'ocrd>=2.16.2' distribution was not found and is required by ocrd-segment
That must have been a temporary glitch: I can fetch ocrd 2.16.3 from PyPI now.
In the current ocrd/all:maximum image, ocrd --version also reports 2.16.3.