selection via OCRD_MODULES fails due to dependent modules
joschrew opened this issue · 4 comments
I tried to install ocrd_all following the setup guide (https://ocr-d.de/en/setup):
make all OCRD_MODULES="core ocrd_tesserocr ocrd_cis"
and got the following error:
Successfully installed numpy-1.24.3 shapely-2.0.1
make[1]: Leaving directory '/home/cloud/repos/ocrd_all/core'
. /home/cloud/repos/ocrd_all/venv/bin/activate && cd tesserocr && sem -q --will-cite --fg --id ocrd_all_pipvenv pip install .
ERROR: Directory '.' is not installable. Neither 'setup.py' nor 'pyproject.toml' found.
make: *** [Makefile:671: /home/cloud/repos/ocrd_all/venv/share/tesserocr] Error 1
The problem seems to be that ocrd_tesserocr needs tesserocr to be installed. The installation of tesserocr is triggered, but its sources are not fetched, which leads to the error. Running
make all OCRD_MODULES="core ocrd_tesserocr ocrd_cis tesseract tesserocr"
resolves the problem, but it would be good if the Makefile could fetch the needed sources itself.
Except for core (which is needed everywhere and can be seen as a special case) and tesseract/tesserocr, we have no dependencies between modules. In the latter case, we have some special rules which should switch over to the PPA installation of Tesseract if it is not selected as a module. So AFAICS the actual issue here is more specific: for some reason, these rules for tesserocr do not work anymore.
Looking closer, we have 3 places where we try to turn some pip dependency into an explicit target via $(SHARE)/%:
- numpy → do we still need this? It gets pulled by core already, and many modules depend on it – all via pip. IIRC the initial reason was that OpenCV build used to have it as a build-time dependency
- opencv-python → only needed on some platforms without prebuilt versions on PyPI; installation via source module (custom rules) is enforced by making core depend on it explicitly (so it gets built before anything else)
- tesserocr → there used to be a time when no usable prebuilt versions were available on PyPI (that would interoperate with recent Tesseract), and this may happen again in the future; for that we included it as a submodule; but when you do not activate it (via git submodule init or OCRD_MODULES), this should just install via pip. Apparently, the latter is not the case (anymore?).
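The intended fallback for tesserocr could be sketched roughly as follows. This is only an illustration, not the actual Makefile rule: the function name pip_install_cmd is hypothetical, the PyPI package name is assumed to match the module name, and the function merely prints the command it would choose instead of running pip.

```shell
# Hypothetical helper (not part of ocrd_all): print the pip command that
# would be used for a module, depending on whether its sources are present.
pip_install_cmd() {
    srcdir=$1   # directory where the submodule would be checked out
    pkg=$2      # package name on PyPI (assumed to equal the module name)
    if [ -f "$srcdir/setup.py" ] || [ -f "$srcdir/pyproject.toml" ]; then
        # Sources were fetched: install from the local checkout.
        echo "pip install $srcdir"
    else
        # Submodule not initialised: fall back to the prebuilt PyPI package.
        echo "pip install $pkg"
    fi
}

# Example: run from the ocrd_all checkout, where tesserocr/ may be empty.
pip_install_cmd tesserocr tesserocr
```

With an empty tesserocr directory, as in the log above, this would select the PyPI install rather than attempting pip install in a directory without setup.py or pyproject.toml.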