qurator-spk/eynollah

contour extraction: inhomogeneous shape


When running on a larger set of images, eynollah fails with:

Traceback (most recent call last):
  File "/local/ocr-d/ocrd_all/venv/bin/ocrd-eynollah-segment", line 8, in <module>
    sys.exit(main())
  File "/local/ocr-d/ocrd_all/venv/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/local/ocr-d/ocrd_all/venv/lib/python3.8/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/local/ocr-d/ocrd_all/venv/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/local/ocr-d/ocrd_all/venv/lib/python3.8/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/local/ocr-d/ocrd_all/venv/lib/python3.8/site-packages/qurator/eynollah/ocrd_cli.py", line 8, in main
    return ocrd_cli_wrap_processor(EynollahProcessor, *args, **kwargs)
  File "/local/ocr-d/ocrd_all/venv/lib/python3.8/site-packages/ocrd/decorators/__init__.py", line 117, in ocrd_cli_wrap_processor
    run_processor(processorClass, ocrd_tool, mets, workspace=workspace, **kwargs)
  File "/local/ocr-d/ocrd_all/venv/lib/python3.8/site-packages/ocrd/processor/helpers.py", line 107, in run_processor
    processor.process()
  File "/local/ocr-d/ocrd_all/venv/lib/python3.8/site-packages/qurator/eynollah/processor.py", line 58, in process
    Eynollah(**eynollah_kwargs).run()
  File "/local/ocr-d/ocrd_all/venv/lib/python3.8/site-packages/qurator/eynollah/eynollah.py", line 2446, in run
    contours_only_text_parent = list(np.array(contours_only_text_parent)[index_con_parents])
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (17,) + inhomogeneous part.

This is eynollah a6fe781 / Python 3.8 / TF 2.10 / NumPy 1.24.2 / Shapely 2.0.1.

I'll try to figure out some more about the particular input image.

The reason is simple: NumPy (since 1.24) no longer allows implicitly converting ragged nested sequences into an array. Older versions only warned:

VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray

Passing dtype=object explicitly in all these np.array(...) calls fixes the problem.
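To make the failure concrete, here is a minimal sketch that reproduces it outside eynollah. The two toy contours and the `index_con_parents` values are made up for illustration; they only mimic OpenCV-style contours (arrays of shape `(n_points, 1, 2)`) with different point counts, which is what makes the list "ragged".

```python
import numpy as np

# Two toy OpenCV-style contours (n_points x 1 x 2) with different point
# counts, i.e. a "ragged" list like contours_only_text_parent (made-up data).
contours = [
    np.array([[[0, 0]], [[0, 5]], [[5, 5]]], dtype=np.int32),            # 3 points
    np.array([[[1, 1]], [[1, 4]], [[4, 4]], [[4, 1]]], dtype=np.int32),  # 4 points
]
index_con_parents = [1, 0]  # hypothetical ordering of the contours

try:
    # Implicit ragged-array creation: raises ValueError on NumPy >= 1.24
    # (older versions only emitted a VisibleDeprecationWarning).
    np.array(contours)[index_con_parents]
except ValueError as err:
    print("ValueError:", err)

# With an explicit dtype=object, NumPy builds a 1-D array whose elements are
# the contour arrays, which can be fancy-indexed and turned back into a list.
selected = list(np.array(contours, dtype=object)[index_con_parents])
print([c.shape for c in selected])
```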

Is it OK if I include the fix in #91?

Hello, I still get the same error:

Traceback (most recent call last):
  File "/home/sapo/develop/AutoDocAugment/.venv/bin/eynollah", line 8, in <module>
    sys.exit(main())
  File "/home/sapo/develop/AutoDocAugment/.venv/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/sapo/develop/AutoDocAugment/.venv/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/sapo/develop/AutoDocAugment/.venv/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/sapo/develop/AutoDocAugment/.venv/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/sapo/develop/AutoDocAugment/.venv/lib/python3.10/site-packages/qurator/eynollah/cli.py", line 193, in main
    eynollah.run()
  File "/home/sapo/develop/AutoDocAugment/.venv/lib/python3.10/site-packages/qurator/eynollah/eynollah.py", line 2904, in run
    contours_only_text_parent = list(np.array(contours_only_text_parent)[index_con_parents])
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (7,) + inhomogeneous part.

but sometimes the error is raised at a different line containing identical code:

  File "/home/sapo/develop/AutoDocAugment/.venv/lib/python3.10/site-packages/qurator/eynollah/cli.py", line 193, in main
    eynollah.run()
  File "/home/sapo/develop/AutoDocAugment/.venv/lib/python3.10/site-packages/qurator/eynollah/eynollah.py", line 2982, in run
    contours_only_text_parent = list(np.array(contours_only_text_parent)[index_con_parents])
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (9,) + inhomogeneous part.

My environment:

dependencies = [
    "scikit-image[optional]>=0.20.0",
    "requests>=2.31.0",
    "beautifulsoup4>=4.12.2",
    "rich>=13.3.5",
    "toml>=0.10.2",
    "latex @ git+https://github.com/gvasold/latex.git",
    "opencv-python>=4.7.0.72",
    "jinja2>=3.1.2",
    "pymupdf>=1.22.3",
    "augraphy>=8.2.3",
    "requests-cache>=1.0.1",
    "lxml>=4.9.2",
    "numpy>=1.23.5",
    "pytesseract>=0.3.10",
    "tensorflow>=2.4,<2.12", # constraint due to eynollah
    "eynollah>=0.3.0",
]
requires-python = ">=3.10,<3.11"  # constraint due to eynollah

Adding dtype=object, as in the earlier fix, solved the issue for me.
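One caveat with the dtype=object approach, sketched here under the assumption that `index_con_parents` holds integer positions (as returned by e.g. `np.argsort`): if every contour in the list happens to have the same number of points, NumPy does not build a 1-D array of contours but a full 4-D array with dtype=object, so the selected contours come back as object arrays instead of int32. Downstream code such as OpenCV's contour functions generally expects integer or float point arrays. Plain list indexing avoids np.array altogether and behaves the same for ragged and uniform lists. `select_contours` is a hypothetical helper, not part of eynollah.

```python
import numpy as np

def select_contours(contours, indices):
    """Reorder/select contours without building a NumPy array.

    Hypothetical helper. Assumes `indices` is a sequence of integer
    positions (e.g. from np.argsort); boolean masks are not handled.
    """
    return [contours[int(i)] for i in indices]

# Uniform contours: all have 3 points, so dtype=object yields a
# (2, 3, 1, 2) object array and each selected item has dtype object.
same = [np.zeros((3, 1, 2), dtype=np.int32), np.ones((3, 1, 2), dtype=np.int32)]
order = np.argsort([-1.0, -2.0])  # hypothetical negated areas -> array([1, 0])

via_object = list(np.array(same, dtype=object)[order])
print(via_object[0].dtype)  # object

# List indexing returns the original int32 arrays unchanged.
via_list = select_contours(same, order)
print(via_list[0].dtype)  # int32
```

Because the list comprehension only looks up elements by position, it returns references to the original contour arrays and never depends on whether their shapes match.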

Dear @00sapo,

As you pointed out, this issue had previously been resolved in commit a56988a, but for some reason the fix seems to have been lost in the most recent version. I have reapplied the commit to address it once more. Thank you.