Got exception using ocrd_detectron2 with ocrd_all Release v2022-11-10
Hi,
I have installed ocrd_all v2022-11-10 (which has core version 2.41.0) on Ubuntu 22.04.
I simply tried out ocrd-detectron2-segment --help
And I get the following exception:
(ocrd-3.7) ocrdadmin@ocrd-03:~$ ocrd-detectron2-segment --help
Traceback (most recent call last):
File "/home/ocrdadmin/ocrd-3.7/sub-venv/headless-tf1/bin/ocrd-detectron2-segment", line 5, in <module>
from ocrd_detectron2.cli import ocrd_detectron2_segment
File "/home/ocrdadmin/ocrd-3.7/sub-venv/headless-tf1/lib/python3.7/site-packages/ocrd_detectron2/cli.py", line 4, in <module>
from .segment import Detectron2Segment
File "/home/ocrdadmin/ocrd-3.7/sub-venv/headless-tf1/lib/python3.7/site-packages/ocrd_detectron2/segment.py", line 18, in <module>
from detectron2.engine import DefaultPredictor
File "/home/ocrdadmin/ocrd-3.7/sub-venv/headless-tf1/lib/python3.7/site-packages/detectron2/engine/__init__.py", line 11, in <module>
from .hooks import *
File "/home/ocrdadmin/ocrd-3.7/sub-venv/headless-tf1/lib/python3.7/site-packages/detectron2/engine/hooks.py", line 22, in <module>
from detectron2.evaluation.testing import flatten_results_dict
File "/home/ocrdadmin/ocrd-3.7/sub-venv/headless-tf1/lib/python3.7/site-packages/detectron2/evaluation/__init__.py", line 2, in <module>
from .cityscapes_evaluation import CityscapesInstanceEvaluator, CityscapesSemSegEvaluator
File "/home/ocrdadmin/ocrd-3.7/sub-venv/headless-tf1/lib/python3.7/site-packages/detectron2/evaluation/cityscapes_evaluation.py", line 11, in <module>
from detectron2.data import MetadataCatalog
File "/home/ocrdadmin/ocrd-3.7/sub-venv/headless-tf1/lib/python3.7/site-packages/detectron2/data/__init__.py", line 4, in <module>
from .build import (
File "/home/ocrdadmin/ocrd-3.7/sub-venv/headless-tf1/lib/python3.7/site-packages/detectron2/data/build.py", line 13, in <module>
from detectron2.structures import BoxMode
File "/home/ocrdadmin/ocrd-3.7/sub-venv/headless-tf1/lib/python3.7/site-packages/detectron2/structures/__init__.py", line 3, in <module>
from .image_list import ImageList
File "/home/ocrdadmin/ocrd-3.7/sub-venv/headless-tf1/lib/python3.7/site-packages/detectron2/structures/image_list.py", line 8, in <module>
from detectron2.layers.wrappers import shapes_to_tensor
File "/home/ocrdadmin/ocrd-3.7/sub-venv/headless-tf1/lib/python3.7/site-packages/detectron2/layers/__init__.py", line 3, in <module>
from .deform_conv import DeformConv, ModulatedDeformConv
File "/home/ocrdadmin/ocrd-3.7/sub-venv/headless-tf1/lib/python3.7/site-packages/detectron2/layers/deform_conv.py", line 11, in <module>
from detectron2 import _C
ImportError: /home/ocrdadmin/ocrd-3.7/sub-venv/headless-tf1/lib/python3.7/site-packages/detectron2/_C.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZNK3c1010TensorImpl36is_contiguous_nondefault_policy_implENS_12MemoryFormatE
--> please clarify.
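For readers trying to interpret the ImportError above: the undefined symbol is a mangled C++ name, and its leading components can be read off without any extra tools. The following is a minimal sketch, not part of ocrd_detectron2; the helper name nested_name_parts is made up for illustration, and it only handles the simple length-prefixed nested-name form seen in this traceback.

```python
import re

def nested_name_parts(mangled):
    """Split the leading nested name of an Itanium-ABI mangled symbol.

    Hypothetical helper: only covers the plain '_ZN[K]<len><name>...' form.
    """
    prefix = re.match(r"_ZNK?", mangled)
    if prefix is None:
        return []
    parts, pos = [], prefix.end()
    # Each component is encoded as a decimal length followed by that many characters.
    while pos < len(mangled) and mangled[pos].isdigit():
        digits = re.match(r"\d+", mangled[pos:]).group()
        start = pos + len(digits)
        parts.append(mangled[start:start + int(digits)])
        pos = start + int(digits)
    return parts

symbol = "_ZNK3c1010TensorImpl36is_contiguous_nondefault_policy_implENS_12MemoryFormatE"
print("::".join(nested_name_parts(symbol)))
```

The first component is c10, the namespace of Pytorch's core C++ library. A missing c10 symbol in detectron2/_C...so therefore suggests that the compiled detectron2 extension and the installed Pytorch do not belong together.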
I assume this is a native installation (make all or make all -j)?
Do you still have the build log? (It would help seeing how ocrd_detectron2 and Pytorch got installed.)
Or can you try…
make -W ocrd_detectron2 ocrd-detectron2-segment
…and show me the output?
Or equivalently:
. /home/ocrdadmin/ocrd-3.7/sub-venv/headless-tf1/bin/activate
make -C /path/to/ocrd_detectron2 deps
Output from make -C ocrd_detectron2/ deps: detectron2-deps.txt
And yes, it is native.
Hard to tell from that snippet. It says all packages are already installed. I'd need the log from that first run. (Is this a fresh install, or might the venv be from an older version?)
But essentially, it should suffice to redo:
pip uninstall torch torchvision
make -C ocrd_detectron2 deps
- I have updated to the newest ocrd_all (release from 2022-11-24), which includes core release 2.42.0. The update consisted of:
  - git pull (in the ocrd_all directory)
  - creating a new venv
  - make all
--> Trying out make -C ocrd_detectron2 deps afterwards still produces the same exception as above.
- I have followed your instructions above:
(ocrd-3.7) ocrdadmin@ocrd-03:~$ pip uninstall torch torchvision
Found existing installation: torch 1.11.0
Uninstalling torch-1.11.0:
Would remove:
/home/ocrdadmin/ocrd-3.7/bin/convert-caffe2-to-onnx
/home/ocrdadmin/ocrd-3.7/bin/convert-onnx-to-caffe2
/home/ocrdadmin/ocrd-3.7/bin/torchrun
/home/ocrdadmin/ocrd-3.7/lib/python3.7/site-packages/caffe2/*
/home/ocrdadmin/ocrd-3.7/lib/python3.7/site-packages/torch-1.11.0.dist-info/*
/home/ocrdadmin/ocrd-3.7/lib/python3.7/site-packages/torch/*
Proceed (Y/n)? Y
Successfully uninstalled torch-1.11.0
Found existing installation: torchvision 0.12.0
Uninstalling torchvision-0.12.0:
Would remove:
/home/ocrdadmin/ocrd-3.7/lib/python3.7/site-packages/torchvision-0.12.0.dist-info/*
/home/ocrdadmin/ocrd-3.7/lib/python3.7/site-packages/torchvision.libs/libcudart.d9bbffd7.so.10.2
/home/ocrdadmin/ocrd-3.7/lib/python3.7/site-packages/torchvision.libs/libjpeg.ceea7512.so.62
/home/ocrdadmin/ocrd-3.7/lib/python3.7/site-packages/torchvision.libs/libnvjpeg.23816019.so.10
/home/ocrdadmin/ocrd-3.7/lib/python3.7/site-packages/torchvision.libs/libpng16.7f72a3c5.so.16
/home/ocrdadmin/ocrd-3.7/lib/python3.7/site-packages/torchvision.libs/libz.1328edc3.so.1
/home/ocrdadmin/ocrd-3.7/lib/python3.7/site-packages/torchvision/*
Proceed (Y/n)? Y
Successfully uninstalled torchvision-0.12.0
and
make -C ocrd_detectron2 deps
has resulted in this error:
Requirement already satisfied: charset-normalizer<3,>=2 in /home/ocrdadmin/ocrd-3.7/lib/python3.7/site-packages (from requests->torchvision>=0.11.2->-r /dev/fd/63 (line 2)) (2.1.1)
Installing collected packages: torch, torchvision
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
kraken 4.2.0 requires torch<=1.11,>=1.7.1, but you have torch 1.13.0+cpu which is incompatible.
kraken 4.2.0 requires torch<=1.11
Oh, but now you were in the top-level venv again. I made that suggestion assuming you still had the sub-venv activated.
Thus:
. /home/ocrdadmin/ocrd-3.7/bin/activate
pip uninstall torch torchvision
make -C ocrd_kraken ocrd-kraken-segment
. /home/ocrdadmin/ocrd-3.7/sub-venv/headless-tf1/bin/activate
pip uninstall torch torchvision
make -C /path/to/ocrd_detectron2 deps
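Since the two sequences above differ only in which venv is active, it can help to check that before running pip uninstall. This is a rough sketch under the assumption that sub-venvs live in a sub-venv/ directory below the top-level venv, as the paths in this thread suggest; in_sub_venv is a hypothetical helper, not something ocrd_all provides.

```python
import sys
from pathlib import PurePosixPath

def in_sub_venv(prefix, expected="headless-tf1"):
    """Return True if prefix looks like .../sub-venv/<expected>."""
    path = PurePosixPath(prefix)
    return path.name == expected and path.parent.name == "sub-venv"

# Inside an activated venv, sys.prefix points at the venv directory.
print(sys.prefix, "-> sub-venv" if in_sub_venv(sys.prefix) else "-> not the headless-tf1 sub-venv")
```

Run it with the python of the currently activated environment; the shell prompt prefix, (ocrd-3.7) versus (headless-tf1), gives the same hint.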
Hmm, for make -C ocrd_kraken ocrd-kraken-segment I get:
(ocrd-3.7) ocrdadmin@ocrd-03:~/ocrd_all$ make -C ocrd_kraken ocrd-kraken-segment
make: Entering directory '/home/ocrdadmin/ocrd_all/ocrd_kraken'
make: *** No rule to make target 'ocrd-kraken-segment'. Stop.
make: Leaving directory '/home/ocrdadmin/ocrd_all/ocrd_kraken'
I'm sorry, that should have read:
make -W ocrd_kraken ocrd-kraken-segment
--> ok, this has worked
But, now - after
pip uninstall torch torchvision
make -C /path/to/ocrd_detectron2 deps
I get:
(headless-tf1) ocrdadmin@ocrd-03:~/ocrd_all$ ocrd-detectron2-segment --help
Traceback (most recent call last):
File "/home/ocrdadmin/ocrd-3.7/sub-venv/headless-tf1/bin/ocrd-detectron2-segment", line 5, in <module>
from ocrd_detectron2.cli import ocrd_detectron2_segment
File "/home/ocrdadmin/ocrd-3.7/sub-venv/headless-tf1/lib/python3.7/site-packages/ocrd_detectron2/cli.py", line 4, in <module>
from .segment import Detectron2Segment
File "/home/ocrdadmin/ocrd-3.7/sub-venv/headless-tf1/lib/python3.7/site-packages/ocrd_detectron2/segment.py", line 18, in <module>
from detectron2.engine import DefaultPredictor
File "/home/ocrdadmin/ocrd-3.7/sub-venv/headless-tf1/lib/python3.7/site-packages/detectron2/engine/__init__.py", line 11, in <module>
from .hooks import *
File "/home/ocrdadmin/ocrd-3.7/sub-venv/headless-tf1/lib/python3.7/site-packages/detectron2/engine/hooks.py", line 22, in <module>
from detectron2.evaluation.testing import flatten_results_dict
File "/home/ocrdadmin/ocrd-3.7/sub-venv/headless-tf1/lib/python3.7/site-packages/detectron2/evaluation/__init__.py", line 2, in <module>
from .cityscapes_evaluation import CityscapesInstanceEvaluator, CityscapesSemSegEvaluator
File "/home/ocrdadmin/ocrd-3.7/sub-venv/headless-tf1/lib/python3.7/site-packages/detectron2/evaluation/cityscapes_evaluation.py", line 11, in <module>
from detectron2.data import MetadataCatalog
File "/home/ocrdadmin/ocrd-3.7/sub-venv/headless-tf1/lib/python3.7/site-packages/detectron2/data/__init__.py", line 4, in <module>
from .build import (
File "/home/ocrdadmin/ocrd-3.7/sub-venv/headless-tf1/lib/python3.7/site-packages/detectron2/data/build.py", line 13, in <module>
from detectron2.structures import BoxMode
File "/home/ocrdadmin/ocrd-3.7/sub-venv/headless-tf1/lib/python3.7/site-packages/detectron2/structures/__init__.py", line 3, in <module>
from .image_list import ImageList
File "/home/ocrdadmin/ocrd-3.7/sub-venv/headless-tf1/lib/python3.7/site-packages/detectron2/structures/image_list.py", line 8, in <module>
from detectron2.layers.wrappers import shapes_to_tensor
File "/home/ocrdadmin/ocrd-3.7/sub-venv/headless-tf1/lib/python3.7/site-packages/detectron2/layers/__init__.py", line 3, in <module>
from .deform_conv import DeformConv, ModulatedDeformConv
File "/home/ocrdadmin/ocrd-3.7/sub-venv/headless-tf1/lib/python3.7/site-packages/detectron2/layers/deform_conv.py", line 11, in <module>
from detectron2 import _C
ImportError: /home/ocrdadmin/ocrd-3.7/sub-venv/headless-tf1/lib/python3.7/site-packages/detectron2/_C.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZNK3c1010TensorImpl36is_contiguous_nondefault_policy_implENS_12MemoryFormatE
Ok, but what did make -C /path/to/ocrd_detectron2 deps itself say? (I still need to know how you got the wrong version of Pytorch / Detectron.)
see here:
make-result.txt
Thank you. Pytorch itself seems to be the correct version now, but Detectron2 should also be reinstalled. And I still don't know why detectron2 was built against the wrong Pytorch in the first place...
Please try:
. /home/ocrdadmin/ocrd-3.7/sub-venv/headless-tf1/bin/activate
pip uninstall detectron2
make -C /path/to/ocrd_detectron2 deps
Another make_result.txt, and I still get the exception with (headless-tf1) ocrdadmin@ocrd-03:~/ocrd_all$ ocrd-detectron2-segment --help
Thanks again. I can now reproduce. The detectron2 wheels somehow are not compatible with current Pytorch releases. Using the older torch==1.10.* works, as does recompiling detectron2.
I wonder how this could have slipped past me. I've done so many tests last time. Plus the ocrd_all build includes some checks of its own...
Ah, I think I understand how. The ocrd_all build still uses Python 3.6, for which there simply is no newer Pytorch release. And most of my experiments were on systems with CUDA, which needs recompiling anyway.
So, as a workaround, do:
. /home/ocrdadmin/ocrd-3.7/sub-venv/headless-tf1/bin/activate
pip install -f https://download.pytorch.org/whl/cpu torch==1.10.1 torchvision==0.11.2
I'll reintroduce the torch<1.11 clause, so at least the prebuilt detectron2 wheels can still be used on the platforms which are supported.
Again, shame on the detectron2 devs for not making this dependency explicit.
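To verify the workaround took effect, the installed Pytorch version can be compared against the release series that worked here. A minimal sketch, assuming (as observed in this thread, not documented by detectron2) that the prebuilt CPU wheel only works with torch 1.10.*; both function names are made up for illustration.

```python
def parse_release(version):
    """Return (major, minor) from a version string such as '1.13.0+cpu'."""
    public = version.split("+", 1)[0]  # drop a local label like '+cpu' or '+cu117'
    major, minor = public.split(".")[:2]
    return int(major), int(minor)

def matches_prebuilt_wheel(torch_version, built_for=(1, 10)):
    """Assumption: the prebuilt detectron2 wheel targets the torch 1.10 series."""
    return parse_release(torch_version) == built_for

for candidate in ("1.10.1", "1.11.0", "1.13.0+cpu"):
    print(candidate, matches_prebuilt_wheel(candidate))
```

In the sub-venv, call matches_prebuilt_wheel(torch.__version__) after import torch. If it returns False, either pin torch as in the workaround or rebuild detectron2 against the installed Pytorch.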
Alas,
ERROR: Could not find a version that satisfies the requirement torch<1.11,>=1.10.0 (from versions: 1.13.0+cu117)
IOW, I cannot simply require <1.11, because for newer CUDA versions, there simply are no older Pytorch releases, and recompiling Pytorch is out of the question here.
So I guess I'll have to force recompiling detectron2 anyway.