johnolafenwa/DeepStack

Deepstack gpu docker image timeout

System:
Debian Buster 10
Backports buster linux drivers
GTX 960 GPU

Hey there, I'm using the latest image from Docker Hub, deepquestai/deepstack:gpu.
After following the guide here, I managed to launch the deepstack:gpu container, but every time I send an image for detection I get a timeout error.

{'success': False, 'error': 'failed to process request before timeout', 'duration': 0}

Steps I took:

  • Installed latest docker, nvidia-docker2 and deepstack:gpu docker image
  • Started container with sudo docker run --gpus all -e VISION-DETECTION=True -v localstorage:/datastore -p 5000:5000 deepquestai/deepstack:gpu
  • Sent the image from here, using the same Python code, to the running container on localhost
  • Got timeout error after 1 min
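
For anyone reproducing the steps above, here is a minimal client sketch using only the Python standard library. It is not the script from the linked guide. The form field name `image` is an assumption based on DeepStack's published samples, and `build_multipart`, `detect`, and the 90-second client timeout are illustrative choices. The timeout is set longer than the 1-minute server limit so that the server's own error reply is returned instead of a client-side timeout.

```python
import json
import urllib.error
import urllib.request
import uuid
from typing import Tuple


def build_multipart(field: str, filename: str, payload: bytes) -> Tuple[bytes, str]:
    """Encode a single file as multipart/form-data; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def detect(image_path: str,
           url: str = "http://localhost:5000/v1/vision/detection",
           timeout: float = 90.0) -> dict:
    """POST an image to the detection endpoint and return the decoded reply."""
    with open(image_path, "rb") as fh:
        # "image" as the field name is an assumption, see the note above.
        body, content_type = build_multipart("image", "image.jpg", fh.read())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            raw = response.read()
    except urllib.error.HTTPError as err:
        # A 500 reply still carries a body; read it instead of raising.
        raw = err.read()
    try:
        return json.loads(raw)
    except ValueError:
        # The body was not JSON; hand it back in a dict of the same shape.
        return {"success": False, "error": raw.decode(errors="replace")}
```

Calling `detect("test.jpg")` (a hypothetical file name) against the container started above should return a dict like the one shown in this report, with `success` set to `False` when the backend fails.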

More info:

sudo nvidia-docker run --name=deepstack --gpus all -e MODE=High -e VISION-DETECTION=True -v deepstack:/datastore -p 5000:5000 deepquestai/deepstack:gpu
DeepStack: Version 2021.02.1
/v1/vision/detection
---------------------------------------
---------------------------------------
v1/backup
---------------------------------------
v1/restore
[GIN] 2021/04/02 - 22:05:09 | 500 |          1m0s |      172.17.0.1 | POST     /v1/vision/detection

Host Nvidia SMI

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.39       Driver Version: 460.39       CUDA Version: N/A      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 960     On   | 00000000:07:00.0 Off |                  N/A |
|  7%   45C    P8    14W / 130W |      1MiB /  2000MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

root@21951afe7542:/app/server# cat logs/stderr.txt
exit status 1chdir intelligencelayer\shared: The system cannot find the path specified

root@21951afe7542:/app/server# cat ../logs/stderr.txt 
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/app/intelligencelayer/shared/detection.py", line 69, in objectdetection
    detector = YOLODetector(model_path, reso, cuda=CUDA_MODE)
  File "/app/intelligencelayer/shared/./process.py", line 36, in __init__
    self.model = attempt_load(model_path, map_location=self.device)
  File "/app/intelligencelayer/shared/./models/experimental.py", line 159, in attempt_load
    torch.load(w, map_location=map_location)["model"].float().fuse().eval()
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 584, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 842, in _load
    result = unpickler.load()
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 834, in persistent_load
    load_tensor(data_type, size, key, _maybe_decode_ascii(location))
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 823, in load_tensor
    loaded_storages[key] = restore_location(storage, location)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 803, in restore_location
    return default_restore_location(storage, str(map_location))
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 174, in default_restore_location
    result = fn(storage, location)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 150, in _cuda_deserialize
    device = validate_cuda_device(location)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 134, in validate_cuda_device
    raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
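
The traceback comes down to `torch.cuda.is_available()` returning `False` inside the container. A quick way to confirm that is to run a small check with the container's own Python. The sketch below is a diagnostic only, not DeepStack's code: `pick_device` and `load_checkpoint` are hypothetical helper names, and `load_checkpoint` simply applies the `map_location` advice from the error message. Falling back to the CPU hides the GPU problem rather than fixing it.

```python
def pick_device() -> str:
    """Return "cuda" only when PyTorch is importable and can see a CUDA device."""
    try:
        import torch  # imported lazily so the check also runs where torch is absent
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def load_checkpoint(path: str):
    """Load a checkpoint onto whichever device pick_device() reports."""
    import torch

    # map_location avoids the RuntimeError above when no CUDA device is visible.
    return torch.load(path, map_location=torch.device(pick_device()))


if __name__ == "__main__":
    print("device:", pick_device())
```

If this prints `device: cpu` inside a container started with `--gpus all`, the container cannot see the GPU, which matches the error above.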

Same. Shame as it has been solid for months.

I wonder when there is going to be an update? It has been two months since a checkin of this project. It is a great project and it would be a shame to see it abandoned!

Hello @rickx34 @Kosh42 @gillonba

Thanks for reporting this. Sorry we have not been able to attend to issues for a while now. We have an update to DeepStack coming this month.

On this issue, it appears DeepStack is unable to detect the GPU. Also, I notice from the nvidia-smi output above that the CUDA version is N/A (CUDA Version: N/A).

Did you install CUDA, and if so, which version?

@johnolafenwa I think the nvidia-smi output is from the host. I presume the Docker image has CUDA installed; I can run nvidia-smi inside the container and get a CUDA version there.
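
To compare what the host and the container report, the header line of `nvidia-smi` can be parsed the same way on both sides. This is a minimal sketch: `parse_header` is a hypothetical helper, and the regular expression assumes the header layout shown in the outputs in this thread.

```python
import re
from typing import Optional

# Matches the first line of the nvidia-smi banner as it appears in this thread.
HEADER = re.compile(
    r"NVIDIA-SMI\s+(?P<smi>\S+)\s+"
    r"Driver Version:\s+(?P<driver>\S+)\s+"
    r"CUDA Version:\s+(?P<cuda>\S+)"
)


def parse_header(text: str) -> Optional[dict]:
    """Extract the nvidia-smi, driver, and CUDA versions from nvidia-smi output."""
    match = HEADER.search(text)
    if match is None:
        return None
    info = match.groupdict()
    if info["cuda"] == "N/A":
        info["cuda"] = None  # the tool reported no CUDA version in this environment
    return info
```

Feed it the text of `nvidia-smi` run on the host and again inside the container (for example via `docker exec deepstack nvidia-smi`, using the container name from the run command above). A `None` CUDA version on the host alone does not show that the container lacks CUDA.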

Folks, it's a shame, but we have to update the docs for Docker on Linux.
When you run a container with GPU on Linux, you have to pass the --privileged parameter so the container can access the NVIDIA devices on the host. You can also mess with the --device param, but the quickest way is just --privileged.
docker run --gpus all --privileged ...<rest of the parameters>

I'm having this exact problem and the same error on Debian 11 and haven't been able to get past it. I tried --privileged as well.

I have the CPU image working for all three: VISION-SCENE, VISION-DETECTION, and VISION-FACE. Really nice work!

With the GPU image, only VISION-SCENE and VISION-DETECTION work. VISION-FACE times out:

[GIN] 2022/03/18 - 22:53:21 | 500 | 1m0s | 172.17.0.1 | POST "/v1/vision/face/"

docker run --gpus all --privileged -e VISION-FACE=True -v /mnt/user/security/datastore:/datastore -p 5000:5000 deepquestai/deepstack:gpu-2022.01.1

Also tried deepquestai/deepstack:gpu-x5-beta with the same result.

Running Intel Core i5-6500 and GeForce GTX 1050 on Ubuntu 20.04 LTS, downloaded today, fresh install.

CUDA works inside Docker. Test command and output:

docker run --gpus all nvidia/cuda:11.0-base nvidia-smi
NVIDIA-SMI 510.54
Driver Version: 510.54
CUDA Version: 11.6

Install notes to make this quick to replicate, plus Python test scripts:

install-notes.txt
python.zip

I also installed NVIDIA cuDNN 8, but the DeepStack GPU face endpoint still times out. Steps taken:

OS="ubuntu2004"
sudo apt-get update
wget https://developer.download.nvidia.com/compute/cuda/repos/${OS}/x86_64/cuda-${OS}.pin
sudo mv cuda-${OS}.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/${OS}/x86_64/7fa2af80.pub
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/${OS}/x86_64/ /"
apt search libcudnn
sudo apt-get install libcudnn8 libcudnn8-dev

It must have been the graphics card's memory: everything works on the 1080 Ti, which has 11 GB.

Hi,

gpu-2022.01.1 does not work for me on any endpoint. I get a timeout after 1m.

gpu-2021.09.1 works for me on every endpoint, even without --privileged.

I have a GeForce GTX 1650 4G.

Here's my nvidia-smi on gpu-2022.01.1:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.91.03    Driver Version: 460.91.03    CUDA Version: 11.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 1650    Off  | 00000000:01:00.0 Off |                  N/A |
| 36%   40C    P8     7W /  75W |      0MiB /  3909MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Here's my nvidia-smi on gpu-2021.09.1:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.91.03    Driver Version: 460.91.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 1650    Off  | 00000000:01:00.0 Off |                  N/A |
| 37%   43C    P0    21W /  75W |   3072MiB /  3909MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

EDIT: notice the memory usage is 0MiB / 3909MiB on gpu-2022.01.1, so I guess nothing has been loaded.
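
The memory comparison above can be automated by pulling the used/total figures out of the `nvidia-smi` table. This is a rough sketch: `memory_usage` and `models_loaded` are hypothetical helpers, and the 100 MiB threshold is an arbitrary assumption chosen only to separate "nothing loaded" from "something loaded".

```python
import re
from typing import List, Tuple

# Matches the "used / total" memory cell, e.g. "3072MiB /  3909MiB".
USAGE = re.compile(r"(\d+)MiB\s*/\s*(\d+)MiB")


def memory_usage(smi_output: str) -> List[Tuple[int, int]]:
    """Return (used_mib, total_mib) for every GPU row found in the output."""
    return [(int(used), int(total)) for used, total in USAGE.findall(smi_output)]


def models_loaded(smi_output: str, min_used_mib: int = 100) -> bool:
    """Guess whether anything was loaded onto a GPU (threshold is an assumption)."""
    return any(used >= min_used_mib for used, _ in memory_usage(smi_output))
```

With the two rows posted above, the gpu-2022.01.1 output yields `(0, 3909)` and the gpu-2021.09.1 output yields `(3072, 3909)`, so only the second would count as loaded.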

Same issue for me with a GTX 750Ti, works perfectly with gpu-2021.09.1 but not with gpu-2022.01.1

Any update to this?

Just an FYI, I ran across the same issue after redis crashed on the docker system I was running on. Probably not the most common cause, but the timeouts happened the same and tailing /app/logs/stderr.txt in the container revealed the issue.

So a year and a half later, there is still no news on GPU support in an image recognition project.

Any news on this subject?

The project has been dead for over two years. You can switch to CodeProject AI. https://www.codeproject.com/AI/docs/

Oh, thank you for pointing me to the successor.
Just a quick question. I see:

The Docker GPU version is specific to nVidia's CUDA enabled cards with compute capability >= 6.0

So my graphics card, with compute capability 3.0, is useless with this project, I guess? Is there any way to use it anyway?
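
The requirement quoted above is a simple version comparison. Here is a minimal sketch, assuming PyTorch is available for the optional local lookup. `meets_minimum` and `local_capability` are hypothetical helpers, and the `(6, 0)` threshold is taken only from the quoted documentation line.

```python
from typing import Optional, Tuple

MINIMUM = (6, 0)  # from the quoted docs: compute capability >= 6.0


def meets_minimum(capability: Tuple[int, int],
                  minimum: Tuple[int, int] = MINIMUM) -> bool:
    """Compare (major, minor) compute capability tuples."""
    return tuple(capability) >= tuple(minimum)


def local_capability() -> Optional[Tuple[int, int]]:
    """Return the first GPU's compute capability, or None if it cannot be read."""
    try:
        import torch  # optional; only needed for the local lookup
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)
```

For the card in question, `meets_minimum((3, 0))` is `False`, so it falls below the stated requirement for the Docker GPU version.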

I use the Windows version, so I can't speak to specifics. But I'd grab the Docker CPU version; once it's installed, there are multiple modules you can install that will run on older GPUs. CPAI offers several types of processing modules for image processing (and a few for sound, facial recognition, text processing such as license plates, etc.).