Enable sm_35 support in pytorch
CRCinAU opened this issue · 2 comments
I'm trying to get GPU processing working on my older 2GB GeForce GT 710, which I believe should be roughly as fast as a Jetson Nano.
When I try to run DeepStack with GPU enabled, I get:
/usr/local/lib/python3.7/dist-packages/torch/cuda/__init__.py:125: UserWarning:
NVIDIA GeForce GT 710 with CUDA capability sm_35 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70 sm_75.
If you want to use the NVIDIA GeForce GT 710 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
Process Process-1:
Traceback (most recent call last):
File "/usr/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/usr/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "/app/intelligencelayer/shared/detection.py", line 69, in objectdetection
detector = YOLODetector(model_path, reso, cuda=CUDA_MODE)
File "/app/intelligencelayer/shared/./process.py", line 36, in __init__
self.model = attempt_load(model_path, map_location=self.device)
File "/app/intelligencelayer/shared/./models/experimental.py", line 159, in attempt_load
torch.load(w, map_location=map_location)["model"].float().fuse().eval()
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 485, in float
return self._apply(lambda t: t.float() if t.is_floating_point() else t)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 354, in _apply
module._apply(fn)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 354, in _apply
module._apply(fn)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 354, in _apply
module._apply(fn)
[Previous line repeated 1 more time]
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 376, in _apply
param_applied = fn(param)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 485, in <lambda>
return self._apply(lambda t: t.float() if t.is_floating_point() else t)
RuntimeError: CUDA error: no kernel image is available for execution on the device
Is there a way to enable sm_35 support in the PyTorch build used in these containers? I can't quite see where it gets set.
Here's the nvidia-smi output from within the container:
root@7ea2aa82ebb1:/app/logs# nvidia-smi
Wed Dec 15 09:44:56 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.86 Driver Version: 470.86 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:01:00.0 N/A | N/A |
| 33% 37C P0 N/A / N/A | 0MiB / 2002MiB | N/A Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
The supported CUDA architectures seem to be:
root@7ea2aa82ebb1:~# python3 -c "import torch; print(torch.cuda.get_arch_list())"
['sm_37', 'sm_50', 'sm_60', 'sm_70', 'sm_75']
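For reference, the card's own compute capability can be queried the same way - this should report (3, 5) for the GT 710, matching the sm_35 in the warning:
root@7ea2aa82ebb1:~# python3 -c "import torch; print(torch.cuda.get_device_capability(0))"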
I've been testing more and more - and with this docker-compose.yaml file, TensorFlow detects the GPU OK:
services:
  test:
    image: tensorflow/tensorflow:latest-gpu
    command: python -c "import tensorflow as tf;tf.test.gpu_device_name()"
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
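For what it's worth, the same check can be done without compose - something like this should be equivalent, assuming the NVIDIA container toolkit is installed:
# docker run --rm --gpus all tensorflow/tensorflow:latest-gpu python -c "import tensorflow as tf; print(tf.test.gpu_device_name())"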
Output:
Creating tensor_test_1 ... done
Attaching to tensor_test_1
test_1 | 2021-12-15 12:53:15.611667: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
test_1 | To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
test_1 | 2021-12-15 12:53:15.632027: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
test_1 | 2021-12-15 12:53:15.651586: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
test_1 | 2021-12-15 12:53:15.651933: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
test_1 | 2021-12-15 12:53:16.198873: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
test_1 | 2021-12-15 12:53:16.199203: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
test_1 | 2021-12-15 12:53:16.199453: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
test_1 | 2021-12-15 12:53:16.201678: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /device:GPU:0 with 1672 MB memory: -> device: 0, name: NVIDIA GeForce GT 710, pci bus id: 0000:01:00.0, compute capability: 3.5
tensor_test_1 exited with code 0
I think I managed to get this working!
First, this is on Ubuntu 20.04 - we need to install Python 3.7, and do all of this as the root user:
# add-apt-repository ppa:deadsnakes/ppa
# apt-get update
# apt-get install python3.7
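Depending on what's already on the box, the venv module may need its own package - this is my assumption from how deadsnakes splits things up, so skip it if the next step works without it:
# apt-get install python3.7-venv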
Create a Python 3.7 venv and activate it:
# python3.7 -m venv /root/python-3.7
# cd /root/python-3.7
# source bin/activate
Now we want to install a torch build that includes the sm_35 kernels we need. I chose the same torch version that's used in the DeepStack install:
# pip install torch==1.6.0+cu101 -f https://nelsonliu.me/files/pytorch/whl/torch_stable.html
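Before mapping it in, it's worth sanity-checking what the wheel was compiled for - torch.__config__.show() prints the NVCC architecture flags (newer builds also have torch.cuda.get_arch_list() for the same information):
# python -c "import torch; print(torch.__config__.show())"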
Run DeepStack and map in the alternative torch package:
# docker run --gpus all -e VISION-DETECTION=True -e VISION-FACE=True -v /root/python-3.7/lib/python3.7/site-packages/torch:/usr/local/lib/python3.7/dist-packages/torch -v localstorage:/datastore -p 5000:5000 deepquestai/deepstack:gpu
This maps the alternative torch package into the container, which in my case supports sm_35.
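To exercise the detection endpoint I just posted images at it - a request along these lines should work (the image form field name is what I recall from the DeepStack docs, so double-check it there):
# curl -X POST -F image=@test.jpg http://localhost:5000/v1/vision/detection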
Results:
[GIN] 2021/12/16 - 02:20:54 | 200 | 328.902025ms | 172.31.1.89 | POST "/v1/vision/detection"
[GIN] 2021/12/16 - 02:21:06 | 200 | 225.5783ms | 172.31.1.89 | POST "/v1/vision/detection"
[GIN] 2021/12/16 - 02:21:09 | 200 | 233.602927ms | 172.31.1.89 | POST "/v1/vision/detection"
Compared to running on a Jetson Nano 4GB:
[GIN] 2021/12/16 - 02:14:04 | 200 | 278.531116ms | 172.31.1.89 | POST /v1/vision/detection
[GIN] 2021/12/16 - 02:14:06 | 200 | 292.32564ms | 172.31.1.89 | POST /v1/vision/detection
[GIN] 2021/12/16 - 02:14:07 | 200 | 270.695522ms | 172.31.1.89 | POST /v1/vision/detection
Output of nvidia-smi from within the DeepStack container:
# docker exec -ti admiring_germain nvidia-smi
Thu Dec 16 02:33:00 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.82.01 Driver Version: 470.82.01 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:01:00.0 N/A | N/A |
| 33% 34C P8 N/A / N/A | 1178MiB / 2002MiB | N/A Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+