NVIDIA GPU not detected, driver not found?

Question

sistemasITI opened this issue 9 months ago · 13 comments

Hi, I've tested in two different environments and neither of them detects the GPU. The first environment has an NVIDIA A100 and the second an NVIDIA RTX 3060 (both on Windows 11). I have the latest NVIDIA drivers installed. What am I missing, or what am I doing wrong?

This is the error output:

[2024-01-15T14:51:35.519Z] [INFO] Extenension: Invoking validateEnvironement for: nvidia-driver
Debug: validate-env[0]
03:51:35.60 0 ExecuteAsync Started
Information: validate-env[0]
03:51:35.67 0 IsNvidiaDiverAvailable Execution
Information: validate-env[0]
03:51:35.73 0 IsNvidiaDiverAvailable : False
Debug: validate-env[0]
03:51:35.73 0 ExecuteAsync Completed Elapsed:00:00:00.1431411

Answer 1 · 2024-01-16T18:41:45.000Z

Hi @sistemasITI, did you click the "Setup WSL Environment" button? It installs CUDA and Conda for WSL and then re-detects the environment.

Answer 2 · 2024-01-16T23:50:54.000Z

@sistemasITI, at what stage is the GPU not being detected? Here's a screenshot after invoking the plugin:

Are you not getting the NVIDIA GPU detected in the prerequisites output?

Since your setup is failing with IsNvidiaDiverAvailable : False, are you using a driver with WSL support? What's your NVIDIA driver version?

Answer 3 · 2024-01-17T07:47:23.000Z

Hi, here is a screenshot of the status:

This is a screenshot from the server with the NVIDIA A100 (the PC with the NVIDIA RTX 3060 shows the same result).
As you can see, the "Setup WSL Environment" button is disabled.
Conda is also not detected (but it is installed).

The drivers are the latest available on the NVIDIA website.

Answer 4 · 2024-01-18T01:32:18.000Z

Could you share the most recent *.cli.log in %USERPROFILE%.wais on Windows, for example 20240104-260872-cli.log?

These logs will provide additional information on the check.
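If picking the most recent log by hand is tedious, the lookup can be scripted. The following is only a minimal sketch: the function name `newest_cli_log` is hypothetical, the log folder is passed in by the caller rather than assumed, and the pattern `*cli.log` is a guess chosen to match both the `*.cli.log` wording and the `20240104-260872-cli.log` example above.

```python
from pathlib import Path
from typing import Optional, Union


def newest_cli_log(log_dir: Union[str, Path]) -> Optional[Path]:
    """Return the most recently modified file ending in 'cli.log', or None.

    The '*cli.log' pattern is an assumption; adjust it if the real log
    names differ from the example given in the thread.
    """
    candidates = [p for p in Path(log_dir).glob("*cli.log") if p.is_file()]
    if not candidates:
        return None
    # Pick the file with the latest modification time.
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

Call it with the log folder mentioned above and attach the returned file to the issue.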

Answer 5 · 2024-01-18T08:15:50.000Z

> Could you share the most recent *.cli.log in %USERPROFILE%.wais on Windows, for example 20240104-260872-cli.log?
>
> These logs will provide additional information on the check.

Here it is:

`Debug: validate-env[0]
03:45:40.98 0 ExecuteAsync Started
Information: validate-env[0]
03:45:41.46 0 IsWSLDetected Execution
Error: validate-env[0]
03:45:54.27 0 Error: No LSB modules are available.

Information: validate-env[0]
03:45:54.27 0 The default WSL distribution is Ubuntu 18.04 or greater.
Information: validate-env[0]
03:45:54.27 0 IsNvidiaDiverAvailable Execution
Information: validate-env[0]
03:45:54.78 0 IsNvidiaDiverAvailable : False
Debug: validate-env[0]
03:45:54.78 0 ExecuteAsync Completed Elapsed:00:00:13.7974470
Debug: validate-env[0]
03:45:55.47 0 ExecuteAsync Started
Information: validate-env[0]
03:45:55.72 0 IsCondaInstalled Execution
Information: validate-env[0]
03:45:56.14 0 IsCondaInstalled : False
Debug: validate-env[0]
03:45:56.14 0 ExecuteAsync Completed Elapsed:00:00:00.6724627
Debug: validate-env[0]
03:45:56.25 0 ExecuteAsync Started
Information: validate-env[0]
03:45:56.49 0 IsCudaRuntimeInstalled Execution
Information: validate-env[0]
03:45:56.86 0 IsCudaRuntimeInstalled : True
Debug: validate-env[0]
03:45:56.86 0 ExecuteAsync Completed Elapsed:00:00:00.6119740
Debug: validate-env[0]
03:45:56.96 0 ExecuteAsync Started
Information: validate-env[0]
03:45:57.19 0 IsNvidiaDiverAvailable Execution
Information: validate-env[0]
03:45:57.57 0 IsNvidiaDiverAvailable : False
Debug: validate-env[0]
03:45:57.57 0 ExecuteAsync Completed Elapsed:00:00:00.6160153
Debug: validate-env[0]
03:45:57.68 0 ExecuteAsync Started
Information: validate-env[0]
03:45:57.91 0 IsWSLDetected Execution
Debug: validate-env[0]
03:45:58.08 0 ExecuteAsync Completed Elapsed:00:00:00.3982375
Debug: validate-env[0]
03:45:58.17 0 ExecuteAsync Started
Error: validate-env[0]
03:45:59.00 0 Error: No LSB modules are available.

Debug: validate-env[0]
03:45:59.00 0 ExecuteAsync Completed Elapsed:00:00:00.8291484
Debug: validate-env[0]
03:45:59.09 0 ExecuteAsync Started
Error: validate-env[0]
03:45:59.91 0 Error: No LSB modules are available.

Information: validate-env[0]
03:45:59.91 0 The default WSL distribution is Ubuntu 18.04 or greater.
Debug: validate-env[0]
03:45:59.91 0 ExecuteAsync Completed Elapsed:00:00:00.8202500`
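A log like this is easier to read once the check results are pulled out of it. The sketch below is not part of the extension; it assumes result lines always have the shape `<CheckName> : True|False` seen above, and the helper names `summarize_checks` and `count_lsb_errors` are made up for illustration.

```python
import re
from typing import Dict, List

# Matches result lines such as "03:45:54.78 0 IsNvidiaDiverAvailable : False".
# Lines ending in "Execution" carry no result and are skipped.
RESULT_LINE = re.compile(r"\b(Is\w+)\s*:\s*(True|False)\b")


def summarize_checks(log_text: str) -> Dict[str, List[bool]]:
    """Collect every True/False result reported for each check, in log order."""
    results: Dict[str, List[bool]] = {}
    for name, value in RESULT_LINE.findall(log_text):
        results.setdefault(name, []).append(value == "True")
    return results


def count_lsb_errors(log_text: str) -> int:
    """Count how often the 'No LSB modules are available.' error appears."""
    return log_text.count("No LSB modules are available.")
```

Applied to the log above, this would show IsNvidiaDiverAvailable and IsCondaInstalled reporting False while IsCudaRuntimeInstalled reports True, which narrows the problem to the driver and Conda checks.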

Answer 6 · 2024-01-29T07:35:07.000Z

Hello, any suggestions?

Answer 7 · 2024-01-29T17:24:41.000Z

Hi @sistemasITI, could you try installing the NVIDIA driver for Windows from the official download page below?

https://www.nvidia.com/Download/index.aspx?lang=en-us

Answer 8 · 2024-01-30T08:38:40.000Z

> Hi @sistemasITI, could you try installing the NVIDIA driver for Windows from the official download page below?
>
> https://www.nvidia.com/Download/index.aspx?lang=en-us

Hello, I already had them installed, but I have reinstalled them on all 3 computers and the card is still not detected on any of them. What is happening? Why is the card not detected on any computer?

Answer 9 · 2024-01-30T16:41:24.000Z

The extension runs nvidia-smi.exe to detect the NVIDIA GPU. Could you run nvidia-smi in a Windows console and see if it outputs something like below?

It could be that nvidia-smi.exe is not on the system PATH.
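One quick way to test the PATH theory is to ask Python where it would find the executable. This is only a sketch using the standard library's `shutil.which`; the wrapper name `locate_tool` and its `search_path` parameter are hypothetical, and whether the extension resolves nvidia-smi.exe the same way is an assumption.

```python
import shutil
from typing import Optional


def locate_tool(name: str = "nvidia-smi", search_path: Optional[str] = None) -> Optional[str]:
    """Return the full path of `name` if it is found, else None.

    With search_path=None the current PATH environment variable is used.
    On Windows, shutil.which also tries the PATHEXT extensions, so
    "nvidia-smi" can resolve to nvidia-smi.exe.
    """
    return shutil.which(name, path=search_path)


if __name__ == "__main__":
    found = locate_tool()
    print(found if found else "nvidia-smi was not found on PATH")
```

If this prints a path in a normal console but the extension still reports no driver, the PATH seen by the extension's process may differ from the one in the console.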

Answer 10 · 2024-01-30T17:29:55.000Z

@ningx-ms, how is the NVIDIA GPU detected/selected when the main display is driven by the iGPU?

On a notebook with the iGPU as the main display, the NVIDIA GPU is not seen by the plugin:

nvidia-smi reports one GPU and no excluded devices:

PS C:\Users\elsaco> nvidia-smi -L
GPU 0: Quadro P1000 (UUID: GPU-1bfef509-e89e-9fef-e986-8979dab8e22a)
PS C:\Users\elsaco> nvidia-smi -B
No excluded devices found.

Answer 11 · 2024-01-30T17:57:32.000Z

> The extension runs nvidia-smi.exe to detect the NVIDIA GPU. Could you run nvidia-smi in a Windows console and see if it outputs something like below?
>
> It could be that nvidia-smi.exe is not on the system PATH.

nvidia-smi works fine. I also ran the two commands @elsaco mentioned:

I don't know what is happening :(

Answer 12 · 2024-06-04T15:13:33.000Z

Hi, I have seen this issue before.

Check cuDNN installation:

First, update all packages:

sudo apt update

sudo apt upgrade

Ensure that you have installed cuDNN correctly. You can download the cuDNN library from the NVIDIA website and follow the installation guide.

Install the CUDA drivers and ONNX Runtime:

pip install onnxruntime

pip install onnxruntime-gpu

Make sure the library is in the expected location (usually /usr/local/cuda/lib64).

Check LD_LIBRARY_PATH: Set the LD_LIBRARY_PATH environment variable to include the directory containing libcudnn.so.8. For example:

export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
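To confirm that the export had the intended effect, you can check which LD_LIBRARY_PATH entries actually contain the library. The sketch below makes some assumptions: the helper name `dirs_containing` is made up, and it only inspects the directories listed in the variable, not the other locations the dynamic loader searches (such as the ldconfig cache).

```python
import os
from pathlib import Path
from typing import List


def dirs_containing(lib_name: str, search_path: str) -> List[str]:
    """Return the entries of a PATH-style string that contain lib_name."""
    hits = []
    for entry in search_path.split(os.pathsep):
        # Skip empty entries produced by leading/trailing separators.
        if entry and (Path(entry) / lib_name).exists():
            hits.append(entry)
    return hits


if __name__ == "__main__":
    current = os.environ.get("LD_LIBRARY_PATH", "")
    print(dirs_containing("libcudnn.so.8", current))
```

An empty list means none of the listed directories holds libcudnn.so.8, so either the path is wrong or the library is installed somewhere else.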

Verify CUDA Toolkit Version: Confirm that your installed CUDA Toolkit version matches the version expected by TensorFlow. You might need to adjust the CUDA version in your TensorFlow code or install a compatible version of cuDNN.

If you then get an error saying that a specific version is missing (e.g. libcudnn8), I recommend installing it manually.

Check whether the library is installed:

find / -type f -name "libcudnn.so.8" 2>/dev/null

Failed loading model mistral-7b-v02-int4-gpu: /onnxruntime_src/onnxruntime/core/session/provider_bridge_ort.cc:1426 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcudnn.so.8: cannot open shared object file: No such file or directory

You can then reinstall the specific version:

sudo apt-get install libcudnn8

Answer 13 · 2024-06-20T21:37:43.000Z

I'm having this same issue and I'm a bit confused by this last instruction. You are listing a bunch of Linux commands, but the issue is on a Windows machine. Do all of these NVIDIA packages need to be installed on both Windows and WSL? When I run nvidia-smi in cmd it works fine, but when I run it in WSL I get an error that it cannot communicate with the NVIDIA driver. When the extension checks for a valid GPU, does it run nvidia-smi in cmd/PowerShell or in WSL?
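Until someone confirms which side the extension checks, it may help to record the result from both sides in one run. This is a rough sketch meant to be run with Python on the Windows host; the `probe` helper is hypothetical, and it assumes that `wsl nvidia-smi -L` runs the command inside the default WSL distribution.

```python
import subprocess
from typing import List, Tuple


def probe(command: List[str], timeout: float = 30.0) -> Tuple[bool, str]:
    """Run a command and return (succeeded, combined output or error text)."""
    try:
        done = subprocess.run(
            command,
            capture_output=True,
            text=True,
            errors="replace",  # tolerate unexpected output encodings
            timeout=timeout,
        )
    except OSError as exc:
        # Raised, for example, when the executable is not on PATH.
        return False, f"could not start {command[0]}: {exc}"
    except subprocess.TimeoutExpired:
        return False, f"{command[0]} timed out after {timeout} s"
    output = (done.stdout or done.stderr).strip()
    return done.returncode == 0, output


if __name__ == "__main__":
    checks = (
        ("Windows", ["nvidia-smi", "-L"]),
        ("WSL", ["wsl", "nvidia-smi", "-L"]),
    )
    for label, cmd in checks:
        ok, detail = probe(cmd)
        print(f"{label}: {'ok' if ok else 'FAILED'}\n{detail}\n")
```

If the Windows line succeeds and the WSL line fails, that matches the behaviour described above and points at the driver's visibility inside WSL rather than at the Windows installation.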