[BUG] Cuda not working
kanjieater opened this issue · 11 comments
Is there an existing issue for this?
- I have searched the existing issues
Current Behavior
I'm using gpu-2.0.0-ls31, and just doing a basic call to the whisper library like I did on an older version (I'm still digging through to see which version I was using 3 months ago). It seems the Python 3.12 update has caused some issues.
docker run -it --rm --name subplz --gpus all -v /mnt/d/sync:/sync -v /home/ke/code/subplz/SyncCache:/app/SyncCache subplz:latest sync -d "/sync/test"
/lsiopy/lib/python3.12/site-packages/torch/cuda/__init__.py:118: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 500: named symbol not found (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
return torch._C._cuda_getDeviceCount() > 0
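The warning above comes straight from torch's device probe, so it can be reproduced inside the container without going through subplz at all. A minimal sketch, assuming only that torch is importable there:

import torch

# which torch wheel is installed and which CUDA toolkit it was built against
print(torch.__version__)
print(torch.version.cuda)

# prints False (and emits the same UserWarning) when cudaGetDeviceCount() fails as above
print(torch.cuda.is_available())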
Expected Behavior
faster-whisper calls should be able to use the GPU on the gpu images
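For reference, the kind of call that should work on the gpu images is something like the sketch below; the model size, compute type, and audio path are just illustrative, not what subplz actually uses:

from faster_whisper import WhisperModel

# loading with device="cuda" is what fails when the container cannot see the GPU
model = WhisperModel("base", device="cuda", compute_type="float16")

segments, info = model.transcribe("/sync/test/audio.mp3")
for segment in segments:
    print(segment.start, segment.end, segment.text)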
Steps To Reproduce
- docker pull kanjieater/subplz
- docker run -it --rm --name subplz --gpus all -v /mnt/d/sync:/sync -v /home/ke/code/subplz/SyncCache:/app/SyncCache subplz:latest sync -d "/sync/test"
Environment
- OS: Ubuntu 24.04
- How docker service was installed: WSL2 on Win 11
CPU architecture
x86-64
Docker creation
1. docker pull kanjieater/subplz
2. docker run -it --rm --name subplz --gpus all -v /mnt/d/sync:/sync -v /home/ke/code/subplz/SyncCache:/app/SyncCache subplz:latest sync -d "/sync/test"
Container logs
➜ nvidia-smi
Sun Aug 11 19:49:48 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.27                 Driver Version: 560.70         CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4070 Ti     On  |   00000000:01:00.0  On |                  N/A |
|  0%   61C    P0             47W / 285W  |     4981MiB / 12282MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
➜ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:18:24_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0
Uploading soon after a build...
Thanks for opening your first issue here! Be sure to follow the relevant issue templates, or risk having this issue marked as invalid.
This issue has been automatically marked as stale because it has not had recent activity. This might be due to missing feedback from OP. It will be closed if no further activity occurs. Thank you for your contributions.
Can we reopen this?
The container you've reported an issue with has nothing to do with us.
You can reproduce this on gpu-2.0.0-ls31.
Again, kanjieater/subplz has nothing to do with us.
- Then show your tests using our actual container, not what I'm assuming is a case of it being used as a base image.
- It could well be a WSL2 issue, which we don't test our containers on.
OK, sounds good. I'll get more logs from a fresh version, but this is the error from the container:
/lsiopy/lib/python3.12/site-packages/torch/cuda/__init__.py:118: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 500: named symbol not found (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
return torch._C._cuda_getDeviceCount() > 0
This looks like your custom code is installing Python dependencies built for CUDA 10.
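If that is the case it should be easy to confirm from inside the image. A quick sketch, assuming torch and ctranslate2 are both importable there (faster-whisper does its inference through CTranslate2, not torch, so checking both separates a wrong torch wheel from a genuine GPU passthrough problem):

import torch
import ctranslate2

# CUDA version the installed torch wheel was built against (e.g. '10.2' vs '12.x', or None for a CPU-only wheel)
print(torch.version.cuda)

# CTranslate2's own view of the GPU; 0 here points at passthrough/driver rather than the torch install
print(ctranslate2.get_cuda_device_count())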
This issue is locked due to inactivity