[BUG] Cuda not working
kanjieater opened this issue · 11 comments
Is there an existing issue for this?
- I have searched the existing issues
Current Behavior
I'm using gpu-2.0.0-ls31, and just doing a basic call to the whisper library like I did on an older version (I'm still digging through to see which version I was using 3 months ago). It seems the Python 3.12 update has caused some issues.
docker run -it --rm --name subplz --gpus all -v /mnt/d/sync:/sync -v /home/ke/code/subplz/SyncCache:/app/SyncCache subplz:latest sync -d "/sync/test"
/lsiopy/lib/python3.12/site-packages/torch/cuda/__init__.py:118: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 500: named symbol not found (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
return torch._C._cuda_getDeviceCount() > 0
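The warning above comes straight from torch's device probe, so it can be reproduced inside the container without going through subplz at all. A minimal sketch, assuming only that torch is importable there:

import torch

# which torch wheel is installed and which CUDA toolkit it was built against
print(torch.__version__)
print(torch.version.cuda)

# prints False (and emits the same UserWarning) when cudaGetDeviceCount() fails as above
print(torch.cuda.is_available())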
Expected Behavior
faster-whisper calls should be able to use the GPU on the gpu images
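For reference, the kind of call that should work on the gpu images is something like the sketch below; the model size, compute type, and audio path are just illustrative, not what subplz actually uses:

from faster_whisper import WhisperModel

# loading with device="cuda" is what fails when the container cannot see the GPU
model = WhisperModel("base", device="cuda", compute_type="float16")

segments, info = model.transcribe("/sync/test/audio.mp3")
for segment in segments:
    print(segment.start, segment.end, segment.text)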
Steps To Reproduce
- docker pull kanjieater/subplz
- docker run -it --rm --name subplz --gpus all -v /mnt/d/sync:/sync -v /home/ke/code/subplz/SyncCache:/app/SyncCache subplz:latest sync -d "/sync/test"
Environment
- OS: Ubuntu 24.04
- How docker service was installed: WSL2 on Win 11
CPU architecture
x86-64
Docker creation
1. docker pull kanjieater/subplz
2. docker run -it --rm --name subplz --gpus all -v /mnt/d/sync:/sync -v /home/ke/code/subplz/SyncCache:/app/SyncCache subplz:latest sync -d "/sync/test"
Container logs
➜ nvidia-smi
Sun Aug 11 19:49:48 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.27                 Driver Version: 560.70         CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4070 Ti     On  |   00000000:01:00.0  On |                  N/A |
|  0%   61C    P0             47W / 285W  |     4981MiB / 12282MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
➜ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:18:24_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0
Uploading soon after a build...
Thanks for opening your first issue here! Be sure to follow the relevant issue templates, or risk having this issue marked as invalid.
This issue has been automatically marked as stale because it has not had recent activity. This might be due to missing feedback from OP. It will be closed if no further activity occurs. Thank you for your contributions.
Can we reopen this?
The container you've reported an issue with has nothing to do with us.
You can reproduce this on gpu-2.0.0-ls31.
Again, kanjieater/subplz has nothing to do with us.
- Then show your tests using our actual container, not what I'm assuming is a case of it being used as a base image.
- It could well be a WSL2 issue, which we don't test our containers on.
OK, sounds good. I'll get more logs from a fresh version, but this is the error from the container:
/lsiopy/lib/python3.12/site-packages/torch/cuda/__init__.py:118: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 500: named symbol not found (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
return torch._C._cuda_getDeviceCount() > 0
This looks like your custom code is installing Python dependencies built for CUDA 10.
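If that is the case it should be easy to confirm from inside the image. A quick sketch, assuming torch and ctranslate2 are both importable there (faster-whisper does its inference through CTranslate2, not torch, so checking both separates a wrong torch wheel from a genuine GPU passthrough problem):

import torch
import ctranslate2

# CUDA version the installed torch wheel was built against (e.g. '10.2' vs '12.x', or None for a CPU-only wheel)
print(torch.version.cuda)

# CTranslate2's own view of the GPU; 0 here points at passthrough/driver rather than the torch install
print(ctranslate2.get_cuda_device_count())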
This issue is locked due to inactivity