contour extraction: inhomogeneous shape
Closed this issue · 3 comments
bertsky commented
Running on a longer set of images, eynollah stumbles over:
Traceback (most recent call last):
File "/local/ocr-d/ocrd_all/venv/bin/ocrd-eynollah-segment", line 8, in <module>
sys.exit(main())
File "/local/ocr-d/ocrd_all/venv/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/local/ocr-d/ocrd_all/venv/lib/python3.8/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/local/ocr-d/ocrd_all/venv/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/local/ocr-d/ocrd_all/venv/lib/python3.8/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/local/ocr-d/ocrd_all/venv/lib/python3.8/site-packages/qurator/eynollah/ocrd_cli.py", line 8, in main
return ocrd_cli_wrap_processor(EynollahProcessor, *args, **kwargs)
File "/local/ocr-d/ocrd_all/venv/lib/python3.8/site-packages/ocrd/decorators/__init__.py", line 117, in ocrd_cli_wrap_processor
run_processor(processorClass, ocrd_tool, mets, workspace=workspace, **kwargs)
File "/local/ocr-d/ocrd_all/venv/lib/python3.8/site-packages/ocrd/processor/helpers.py", line 107, in run_processor
processor.process()
File "/local/ocr-d/ocrd_all/venv/lib/python3.8/site-packages/qurator/eynollah/processor.py", line 58, in process
Eynollah(**eynollah_kwargs).run()
File "/local/ocr-d/ocrd_all/venv/lib/python3.8/site-packages/qurator/eynollah/eynollah.py", line 2446, in run
contours_only_text_parent = list(np.array(contours_only_text_parent)[index_con_parents])
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (17,) + inhomogeneous part.
This is eynollah a6fe781 / Python 3.8 / TF 2.10 / Numpy 1.24.2 / Shapely 2.0.1.
I'll try to figure out some more about the particular input image.
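For anyone trying to reproduce this without the input image, here is a minimal sketch that assumes the contours are arrays of shape (n_points, 1, 2) with differing point counts. The names `ragged` and `try_implicit` are made up for illustration and are not part of eynollah. Depending on the NumPy version, the implicit conversion either raises the ValueError above or only emits the older deprecation warning, so the sketch reports which of the two occurred.

```python
import warnings

import numpy as np

# Hypothetical stand-ins for two contours with different point counts.
ragged = [np.zeros((4, 1, 2)), np.zeros((7, 1, 2))]
uniform = [np.zeros((4, 1, 2)), np.zeros((4, 1, 2))]


def try_implicit(seq):
    """Return the name of the exception raised by np.array(seq), or None."""
    try:
        with warnings.catch_warnings():
            # Older NumPy releases only warn; escalate that warning to an error.
            warnings.simplefilter("error")
            np.array(seq)
    except (ValueError, Warning) as exc:
        return type(exc).__name__
    return None


ragged_result = try_implicit(ragged)
uniform_result = try_implicit(uniform)
print(ragged_result, uniform_result)
```

Equally shaped inputs convert without complaint, which would explain why only some pages trigger the failure.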
bertsky commented
The simple reason is that NumPy (as of 1.24) no longer allows implicitly creating an array from ragged nested sequences. This is what it used to say:
VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
Obviously, adding dtype=object in all these cases fixes the problem.
Is it ok for me to include the fix in #91?
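As a rough sketch of what the fix looks like at the failing line (not the actual patch): the contour data and the helper names `reorder` and `reorder_plain` below are hypothetical, and `order` stands in for `index_con_parents`. Passing `dtype=object` keeps each contour as a single element of a 1-D array, so fancy indexing still works; a plain list comprehension is shown as an alternative that avoids the array conversion altogether.

```python
import numpy as np

# Hypothetical stand-ins for OpenCV-style contours of shape (n_points, 1, 2).
contours = [
    np.zeros((4, 1, 2), dtype=np.int32),
    np.zeros((7, 1, 2), dtype=np.int32),
    np.zeros((5, 1, 2), dtype=np.int32),
]
order = np.array([2, 0, 1])  # hypothetical stand-in for index_con_parents


def reorder(seq, index):
    """Reorder differently shaped arrays via a 1-D object array."""
    # dtype=object makes every contour one element instead of a sub-array.
    return list(np.array(seq, dtype=object)[index])


def reorder_plain(seq, index):
    """Same reordering without building a NumPy array at all."""
    return [seq[i] for i in index]


reordered = reorder(contours, order)
lengths = [len(c) for c in reordered]
lengths_plain = [len(c) for c in reorder_plain(contours, order)]
print(lengths, lengths_plain)
```

One caveat with the `dtype=object` route: if all contours happen to share the same shape, NumPy builds a multi-dimensional object array rather than a 1-D one, so the list-based variant may be the more predictable choice.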
00sapo commented
Hello, I am still seeing the same issue here:
Traceback (most recent call last):
File "/home/sapo/develop/AutoDocAugment/.venv/bin/eynollah", line 8, in <module>
sys.exit(main())
File "/home/sapo/develop/AutoDocAugment/.venv/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/home/sapo/develop/AutoDocAugment/.venv/lib/python3.10/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/home/sapo/develop/AutoDocAugment/.venv/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/sapo/develop/AutoDocAugment/.venv/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/home/sapo/develop/AutoDocAugment/.venv/lib/python3.10/site-packages/qurator/eynollah/cli.py", line 193, in main
eynollah.run()
File "/home/sapo/develop/AutoDocAugment/.venv/lib/python3.10/site-packages/qurator/eynollah/eynollah.py", line 2904, in run
contours_only_text_parent = list(np.array(contours_only_text_parent)[index_con_parents])
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (7,) + inhomogeneous part.
Sometimes, though, the error is raised at another (identical) line:
File "/home/sapo/develop/AutoDocAugment/.venv/lib/python3.10/site-packages/qurator/eynollah/cli.py", line 193, in main
eynollah.run()
File "/home/sapo/develop/AutoDocAugment/.venv/lib/python3.10/site-packages/qurator/eynollah/eynollah.py", line 2982, in run
contours_only_text_parent = list(np.array(contours_only_text_parent)[index_con_parents])
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (9,) + inhomogeneous part.
My environment:
dependencies = [
"scikit-image[optional]>=0.20.0",
"requests>=2.31.0",
"beautifulsoup4>=4.12.2",
"rich>=13.3.5",
"toml>=0.10.2",
"latex @ git+https://github.com/gvasold/latex.git",
"opencv-python>=4.7.0.72",
"jinja2>=3.1.2",
"pymupdf>=1.22.3",
"augraphy>=8.2.3",
"requests-cache>=1.0.1",
"lxml>=4.9.2",
"numpy>=1.23.5",
"pytesseract>=0.3.10",
"tensorflow>=2.4,<2.12", # constraint due to eynollah
"eynollah>=0.3.0",
]
requires-python = ">=3.10,<3.11" # constraint due to eynollah
Adding dtype=object, as in this, solved the issue for me.
vahidrezanezhad commented