bertsky/ocrd_detectron2

detectron2 vs CUDA dependency


It seems these are the latest versions for which we can get any detectron2 at all:

What a mess! So, as with TensorFlow, older CUDA versions quickly tend to lose support. It's not as bad regarding Python version ranges, but CUDA 10.0 – which is still OCR-D's main target platform for CUDA builds – is out.
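For concreteness, installing one of the published builds means matching both wheel indexes to the locally installed CUDA version – a sketch, assuming CUDA 11.3 and torch 1.10 (one of the combinations Facebook actually publishes):

    # sketch: the PyTorch index and the Detectron2 wheel index must both match
    # the local CUDA version (here 11.3) and the torch version (here 1.10)
    pip install torch==1.10.1 torchvision==0.11.2 --extra-index-url https://download.pytorch.org/whl/cu113
    pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu113/torch1.10/index.html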

CUDA version 11.2 (default for Debian stable) is also unsupported.

OCR-D CUDA docker images use CUDA version 11.3.

> but CUDA 10.0 – which is still OCR-D's main target platform for CUDA builds – is out.

> OCR-D CUDA docker images use CUDA version 11.3.

That's due to the changes I introduced since I wrote this. They support all CUDA versions.

> CUDA version 11.2 (default for Debian stable) is also unsupported.

Yes, so it seems. I have two options now: add that case to the CPU fallbacks, or go with the PyTorch build for that platform and introduce a fallback source build for Detectron2. I tend towards the latter, as it would also cover other platforms.
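Roughly, the fallback idea would look like this (only a sketch – the index URLs and the cu113/torch1.10 combination below are placeholders, not what this repo actually pins):

    # 1. install the PyTorch build matching the local CUDA version
    pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu113
    # 2. try a prebuilt Detectron2 wheel for that torch/CUDA combination,
    #    and fall back to building from source if no wheel exists
    pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu113/torch1.10/index.html \
        || pip install 'git+https://github.com/facebookresearch/detectron2.git'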

I just had a failing build on Debian stable with Python 3.7 and CUDA version 11.4. What about building from source as a fallback, as you already suggested above? pip install 'git+https://github.com/facebookresearch/detectron2.git' works fine for me and does not take excessive time.

> pip install 'git+https://github.com/facebookresearch/detectron2.git' works fine for me and does not take excessive time.

If it worked for you, that's sheer luck I'm afraid. I get:

Collecting scikit-image>=0.17.2
  Downloading scikit_image-0.19.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (14.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.0/14.0 MB 82.5 MB/s eta 0:00:00
Collecting torch>=1.10.1
  Downloading https://download.pytorch.org/whl/cu117/torch-1.13.0%2Bcu117-cp38-cp38-linux_x86_64.whl (1806.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.8/1.8 GB 96.7 MB/s eta 0:00:00
Collecting torchvision>=0.11.2
  Downloading https://download.pytorch.org/whl/cu117/torchvision-0.14.0%2Bcu117-cp38-cp38-linux_x86_64.whl (24.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 24.3/24.3 MB 82.7 MB/s eta 0:00:00
ERROR: Could not find a version that satisfies the requirement detectron2>=0.6 (from versions: none)
ERROR: No matching distribution found for detectron2>=0.6
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Looking in links: https://dl.fbaipublicfiles.com/detectron2/wheels/cu117/torch1.10/index.html, https://download.pytorch.org/whl/cu117/torch_stable.html
Collecting detectron2==0.6
  Cloning https://github.com/facebookresearch/detectron2 (to revision v0.6) to /tmp/pip-install-onelriyc/detectron2_aa9a864f0fa24ef58c4a7ee45be7edd2
  Running command git clone --filter=blob:none --quiet https://github.com/facebookresearch/detectron2 /tmp/pip-install-onelriyc/detectron2_aa9a864f0fa24ef58c4a7ee45be7edd2
  Running command git checkout -q d1e04565d3bec8719335b88be9e9b961bf3ec464
  Resolved https://github.com/facebookresearch/detectron2 to commit d1e04565d3bec8719335b88be9e9b961bf3ec464
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [6 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-onelriyc/detectron2_aa9a864f0fa24ef58c4a7ee45be7edd2/setup.py", line 10, in <module>
          import torch
      ModuleNotFoundError: No module named 'torch'
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

Facebook Research has obviously failed to set up the package in such a way that torch either becomes a build-time dependency or is not needed by setuptools:

https://github.com/facebookresearch/detectron2/blob/32bd159d7263683e39bf4e87e5c4ac88bad2fd73/setup.py#L10

EDIT: just found this issue describing the problem

I am at a loss what we should do going forward TBH.
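For the record, the only workaround I am aware of is installing torch in a separate, earlier pip invocation, so that it is already importable when Detectron2's setup.py runs – but that ordering cannot be expressed in a plain requirements file:

    # install torch (and torchvision) first, in their own pip run
    pip install torch torchvision
    # only afterwards build Detectron2 from source; its setup.py can now import torch
    pip install 'git+https://github.com/facebookresearch/detectron2.git'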

a82513e should be sufficient. So #11 is now the fix – I hope 🤞