microsoft/vscode-ai-toolkit

NVIDIA GPU not detected, driver not found?

sistemasITI opened this issue · 13 comments

Hi, I've tested in two different environments and neither of them detects the GPU. The first environment has an NVIDIA A100 and the second an NVIDIA RTX 3060 (both on Windows 11). I have the latest NVIDIA drivers installed. What am I missing, or what am I doing wrong?

This is the log output:

[2024-01-15T14:51:35.519Z] [INFO] Extenension: Invoking validateEnvironement for: nvidia-driver
Debug: validate-env[0]
03:51:35.60 0 ExecuteAsync Started
Information: validate-env[0]
03:51:35.67 0 IsNvidiaDiverAvailable Execution
Information: validate-env[0]
03:51:35.73 0 IsNvidiaDiverAvailable : False
Debug: validate-env[0]
03:51:35.73 0 ExecuteAsync Completed Elapsed:00:00:00.1431411

Hi @sistemasITI , did you click the button "Setup WSL Environment"? It will install CUDA and Conda for WSL then re-detect the environment.

@sistemasITI at what stage is the GPU not being detected? Here's a screenshot after invoking the plugin:

[Screenshot: win-ai-validate-env]

Are you not getting the NVIDIA GPU detected in the prerequisites output?

Since your setup is failing with IsNvidiaDiverAvailable : False, are you using a driver with WSL support? What's your NVIDIA driver version?

Hi, here is a screenshot of the status:
[Screenshot from 2024-01-17 08-40-53]

This screenshot is from the server with an NVIDIA A100 (the result is the same on the PC with the NVIDIA RTX 3060).
As you can see, the "Setup WSL environment" button is disabled.
Conda is also not detected, although it is installed.

The drivers are the latest available on the NVIDIA website.

Could you share the most recent *.cli.log in %USERPROFILE%.wais on Windows, for example 20240104-260872-cli.log?

These logs will provide additional information on the check.

Here it is:

`Debug: validate-env[0]
03:45:40.98 0 ExecuteAsync Started
Information: validate-env[0]
03:45:41.46 0 IsWSLDetected Execution
Error: validate-env[0]
03:45:54.27 0 Error: No LSB modules are available.

Information: validate-env[0]
03:45:54.27 0 The default WSL distribution is Ubuntu 18.04 or greater.
Information: validate-env[0]
03:45:54.27 0 IsNvidiaDiverAvailable Execution
Information: validate-env[0]
03:45:54.78 0 IsNvidiaDiverAvailable : False
Debug: validate-env[0]
03:45:54.78 0 ExecuteAsync Completed Elapsed:00:00:13.7974470
Debug: validate-env[0]
03:45:55.47 0 ExecuteAsync Started
Information: validate-env[0]
03:45:55.72 0 IsCondaInstalled Execution
Information: validate-env[0]
03:45:56.14 0 IsCondaInstalled : False
Debug: validate-env[0]
03:45:56.14 0 ExecuteAsync Completed Elapsed:00:00:00.6724627
Debug: validate-env[0]
03:45:56.25 0 ExecuteAsync Started
Information: validate-env[0]
03:45:56.49 0 IsCudaRuntimeInstalled Execution
Information: validate-env[0]
03:45:56.86 0 IsCudaRuntimeInstalled : True
Debug: validate-env[0]
03:45:56.86 0 ExecuteAsync Completed Elapsed:00:00:00.6119740
Debug: validate-env[0]
03:45:56.96 0 ExecuteAsync Started
Information: validate-env[0]
03:45:57.19 0 IsNvidiaDiverAvailable Execution
Information: validate-env[0]
03:45:57.57 0 IsNvidiaDiverAvailable : False
Debug: validate-env[0]
03:45:57.57 0 ExecuteAsync Completed Elapsed:00:00:00.6160153
Debug: validate-env[0]
03:45:57.68 0 ExecuteAsync Started
Information: validate-env[0]
03:45:57.91 0 IsWSLDetected Execution
Debug: validate-env[0]
03:45:58.08 0 ExecuteAsync Completed Elapsed:00:00:00.3982375
Debug: validate-env[0]
03:45:58.17 0 ExecuteAsync Started
Error: validate-env[0]
03:45:59.00 0 Error: No LSB modules are available.

Debug: validate-env[0]
03:45:59.00 0 ExecuteAsync Completed Elapsed:00:00:00.8291484
Debug: validate-env[0]
03:45:59.09 0 ExecuteAsync Started
Error: validate-env[0]
03:45:59.91 0 Error: No LSB modules are available.

Information: validate-env[0]
03:45:59.91 0 The default WSL distribution is Ubuntu 18.04 or greater.
Debug: validate-env[0]
03:45:59.91 0 ExecuteAsync Completed Elapsed:00:00:00.8202500`
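
For a long log like the one above, a short script can pull out the final result of each check. This is a minimal sketch in Python, assuming the result lines keep the `Is<Name> : True|False` shape seen in this log. `summarize_checks` is a hypothetical helper, not part of the extension, and the check names are matched exactly as the log spells them.

```python
import re
from typing import Dict

# Matches result lines such as "03:45:54.78 0 IsNvidiaDiverAvailable : False".
# Lines ending in "Execution" carry no result and are skipped by the pattern.
RESULT_RE = re.compile(r"\b(Is\w+)\s*:\s*(True|False)\b")


def summarize_checks(log_text: str) -> Dict[str, bool]:
    """Return the last reported True/False value for each Is* check."""
    results: Dict[str, bool] = {}
    for match in RESULT_RE.finditer(log_text):
        name, value = match.groups()
        results[name] = value == "True"  # later lines overwrite earlier ones
    return results


# A few lines copied from the log in this thread.
sample = """Information: validate-env[0]
03:45:54.27 0 IsNvidiaDiverAvailable Execution
Information: validate-env[0]
03:45:54.78 0 IsNvidiaDiverAvailable : False
Information: validate-env[0]
03:45:56.14 0 IsCondaInstalled : False
Information: validate-env[0]
03:45:56.86 0 IsCudaRuntimeInstalled : True
"""

print(summarize_checks(sample))
# {'IsNvidiaDiverAvailable': False, 'IsCondaInstalled': False, 'IsCudaRuntimeInstalled': True}
```

In the log above, this would show the driver and Conda checks failing while the CUDA runtime check passes.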

Hello, any suggestions?

Hi @sistemasITI, could you try installing the NVIDIA driver for Windows from the official download page below?

https://www.nvidia.com/Download/index.aspx?lang=en-us

Hello, I already had them installed, but I have reinstalled them on all 3 computers and the card is still not detected on any of them. What is happening? Why is the card not detected on any computer?

The extension runs nvidia-smi.exe to detect the NVIDIA GPU. Could you run nvidia-smi in a Windows console and see if it outputs something like the screenshot below?

[image]

It could be that nvidia-smi.exe is not on the system PATH.
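
The PATH question can also be checked from a script. The sketch below is an illustration only, not the extension's detection code: it uses Python's `shutil.which` to see whether `nvidia-smi` resolves on PATH and, if it does, runs `nvidia-smi -L` to list GPUs. `find_on_path` and `list_gpus` are hypothetical helper names.

```python
import shutil
import subprocess
from typing import List, Optional


def find_on_path(command: str = "nvidia-smi") -> Optional[str]:
    """Return the full path of `command` if it resolves on PATH, else None.

    On Windows, shutil.which also tries the PATHEXT extensions, so
    "nvidia-smi" can resolve to nvidia-smi.exe.
    """
    return shutil.which(command)


def list_gpus(command: str = "nvidia-smi") -> List[str]:
    """Run `<command> -L` and return the lines that describe a GPU.

    Returns an empty list if the command is missing, fails, or times out.
    """
    exe = find_on_path(command)
    if exe is None:
        return []
    try:
        proc = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if proc.returncode != 0:
        return []
    return [
        line.strip()
        for line in proc.stdout.splitlines()
        if line.strip().startswith("GPU ")
    ]


if __name__ == "__main__":
    print("nvidia-smi resolved to:", find_on_path())
    print("GPUs reported:", list_gpus())
```

If `find_on_path()` prints `None` while nvidia-smi works when called by its full path, the PATH is the likely culprit.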

@ningx-ms how is the NVIDIA GPU detected/selected when the main display is driven by the iGPU?

On a notebook with the iGPU as the main display, the NVIDIA GPU is not seen by the plugin:

[Screenshot: wsl_no_nvidia_gpu]

nvidia-smi reports one GPU and no excluded devices:

PS C:\Users\elsaco> nvidia-smi -L
GPU 0: Quadro P1000 (UUID: GPU-1bfef509-e89e-9fef-e986-8979dab8e22a)
PS C:\Users\elsaco> nvidia-smi -B
No excluded devices found.

nvidia-smi works fine. I also ran the two commands @elsaco mentioned:

[Screenshot from 2024-01-30 18-56-11]

I don't know what is happening :(

Hi, I have seen this issue before.

Check cuDNN Installation:

First run updates on all packages

sudo apt update
sudo apt upgrade 

Ensure that you have installed cuDNN correctly. You can download the cuDNN library from the NVIDIA website and follow the installation guide.

Install the CUDA drivers and onnxruntime:

pip install onnxruntime 
pip install onnxruntime-gpu 
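
After installing, it may help to check which execution providers the installed onnxruntime package reports. The sketch below uses `onnxruntime.get_available_providers()`; the helper names are hypothetical. Note that this list reflects what the package was built with, so a listed `CUDAExecutionProvider` can still fail to load if cuDNN or CUDA libraries are missing.

```python
from typing import List, Optional, Sequence


def installed_providers() -> Optional[List[str]]:
    """Return onnxruntime's provider list, or None if it is not installed."""
    try:
        import onnxruntime as ort
    except ImportError:
        return None
    return list(ort.get_available_providers())


def has_cuda_provider(providers: Sequence[str]) -> bool:
    """True if the CUDA execution provider appears in the given list."""
    return "CUDAExecutionProvider" in providers


if __name__ == "__main__":
    providers = installed_providers()
    if providers is None:
        print("onnxruntime is not installed in this environment")
    else:
        print("providers:", providers)
        print("CUDA provider listed:", has_cuda_provider(providers))
```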

Make sure the library is in the expected location (usually /usr/local/cuda/lib64).

Check LD_LIBRARY_PATH: Set the LD_LIBRARY_PATH environment variable to include the directory containing libcudnn.so.8. For example:

export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
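
To confirm that the export took effect, the directories in LD_LIBRARY_PATH can be scanned for the library. This is a minimal sketch meant to be run inside WSL/Linux; `find_in_library_path` is a hypothetical helper. An empty result is not proof that the library is missing, since the dynamic loader also searches its cache and default system directories.

```python
import os
from typing import List, Optional


def find_in_library_path(
    lib_name: str = "libcudnn.so.8", search_path: Optional[str] = None
) -> List[str]:
    """Return each directory in the search path that contains `lib_name`.

    `search_path` defaults to the current LD_LIBRARY_PATH value.
    """
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    hits = []
    for directory in search_path.split(os.pathsep):
        # Skip empty entries produced by a leading or trailing separator.
        if directory and os.path.exists(os.path.join(directory, lib_name)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    found = find_in_library_path()
    if found:
        print("libcudnn.so.8 found in:", found)
    else:
        print("libcudnn.so.8 not found in any LD_LIBRARY_PATH directory")
```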

Verify CUDA Toolkit Version: Confirm that your installed CUDA Toolkit version matches the version expected by TensorFlow. You might need to adjust the CUDA version in your TensorFlow code or install a compatible version of cuDNN.

If you then get an error saying a specific version is missing, e.g. libcudnn8, I recommend installing it manually.

Check whether the library is installed:

find / -type f -name "libcudnn.so.8" 2>/dev/null

Failed loading model mistral-7b-v02-int4-gpu: /onnxruntime_src/onnxruntime/core/session/provider_bridge_ort.cc:1426 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcudnn.so.8: cannot open shared object file: No such file or directory

You can then reinstall the specific version

sudo apt-get install libcudnn8

I'm having this same issue and I'm a bit confused by this last instruction. You are listing a bunch of Linux commands but the issue is on a Windows machine. Do all of these NVIDIA packages need to be installed on both Windows and WSL? When I run nvidia-smi on cmd it works fine, but when I run it in WSL I get an error that it cannot communicate with the NVIDIA driver. When it is checking for a valid GPU, does it check by running nvidia-smi in cmd/ps or wsl?
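
One way to compare the two sides is to run the same probe on Windows and inside the default WSL distribution. The sketch below is meant to be run from Windows and uses `wsl.exe -e` to execute the command in WSL. It only reports what each side returns and makes no claim about where the extension itself runs the check, which this thread does not confirm. `run_probe` is a hypothetical helper.

```python
import subprocess
from typing import Dict, List


def run_probe(argv: List[str], timeout: float = 60.0) -> Dict[str, object]:
    """Run a command and report whether it started, its exit code and output."""
    try:
        proc = subprocess.run(
            argv,
            capture_output=True,
            text=True,
            errors="replace",  # tolerate output in an unexpected encoding
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # The command was not found, could not be started, or hung.
        return {"ran": False, "returncode": None, "output": str(exc)}
    return {
        "ran": True,
        "returncode": proc.returncode,
        "output": (proc.stdout or proc.stderr).strip(),
    }


if __name__ == "__main__":
    probes = {
        "windows": ["nvidia-smi", "-L"],
        "wsl": ["wsl.exe", "-e", "nvidia-smi", "-L"],
    }
    for side, argv in probes.items():
        result = run_probe(argv)
        print(side, "->", result)
```

If the Windows probe lists a GPU and the WSL probe fails, the driver is visible to Windows but not to WSL, which matches the symptom described in the comment above.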