MohamadZeina/Disco_Diffusion_Local

FileNotFoundError: [Errno 2] No such file or directory: 'nvidia-smi'

Closed this issue · 6 comments

Good morning. I'm having trouble running the notebook. I have a feeling it is something simple but my Google-fu is not helping in this case:

[image]

Hi, do you have nvidia-smi installed?
Just try this on a new line:
!nvidia-smi

If it is installed, you should see a table with your graphics card information.

Maybe this will be helpful:
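
The same check can also be done from Python inside the notebook. This is only a rough sketch, not the notebook's own code: `gpu_report` is a made-up helper name, and it assumes the `FileNotFoundError` comes from launching `nvidia-smi` as a subprocess when the binary is not on PATH.

```python
import shutil
import subprocess


def gpu_report(tool="nvidia-smi"):
    """Return the tool's output, or None if the binary is not on PATH.

    `gpu_report` is a hypothetical helper for illustration only.
    """
    path = shutil.which(tool)  # None when the executable cannot be found
    if path is None:
        return None
    result = subprocess.run([path], capture_output=True, text=True)
    # Keep both streams so a driver error message is not lost.
    return result.stdout + result.stderr


if __name__ == "__main__":
    report = gpu_report()
    print(report if report is not None else "nvidia-smi is not on PATH")
```

If this prints "nvidia-smi is not on PATH", the driver tools are either not installed or not visible to the environment the notebook runs in.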

https://www.cyberciti.biz/faq/ubuntu-linux-install-nvidia-driver-latest-proprietary-driver/

BLUF: Going to look over the install directions and see if I missed anything.

!nvidia-smi
returns:
-bash: !nvidia: event not found

Followed the link and ran:
sudo apt install nvidia-driver-510 nvidia-dkms-510

That failed but suggested I run apt-get update.
apt-get update failed due to permissions.

I used this article to switch my user to the root user and retried the update, success.

Still as the root user, I retried apt install nvidia-driver-510 nvidia-dkms-510, success.
Then hwinfo --gfxcard --short failed because hwinfo was not installed, so I installed it with apt-get install hwinfo.
Retrying hwinfo --gfxcard --short works now but returns no output.

Moved back into the notebook by running:
conda activate pytorch_110
and then:
jupyter notebook --no-browser

nvidia-smi is found now but gives a new error:
"NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running."

Possible explanation:
"I have no solid evidence to base this on, but it's possible that WSL does not actually expose all of the hardware claimed by the Host to the WSL Ubuntu environment. This may be why you can't work with it, because you don't have direct PCI access like you would in an Ubuntu installation directly on-system, but are instead basically 'containerized' within Windows. The underlying WSL abstraction library for syscalls also may not be permitted to have that access either."
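
For anyone unsure whether their shell is actually running under WSL, here is a small sketch of a common heuristic: the WSL kernel release string normally contains "microsoft". `looks_like_wsl` is a made-up name and the check is a heuristic, not an official API. As far as I know, NVIDIA's guidance for WSL 2 is to install the GPU driver on the Windows side only and not install a Linux driver inside the WSL distribution.

```python
import platform


def looks_like_wsl(kernel_release):
    """Heuristic: WSL kernels normally carry 'microsoft' in the release string.

    `looks_like_wsl` is a hypothetical helper for illustration only.
    """
    return "microsoft" in kernel_release.lower()


if __name__ == "__main__":
    release = platform.uname().release
    verdict = "looks like WSL" if looks_like_wsl(release) else "does not look like WSL"
    print(f"{release}: {verdict}")
```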

Specs:
Intel(R) Core(TM) i9-10900KF CPU @ 3.70GHz 3.70 GHz
32.0 GB (RAM)
Windows 10 Pro 21H2
NVIDIA GeForce RTX 3090

Looks like I went after WSL/Ubuntu from the wrong direction. I already had it installed from an earlier Docker Desktop installation. Tearing that down and starting over.

I have completely removed Ubuntu/WSL and started over.
nvidia-smi works now:
[image]

Unfortunately I'm running into a new issue:

RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
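
One common cause of this error is a PyTorch build that was not compiled for the GPU's compute capability (an RTX 3090 is 8.6, i.e. sm_86). I can't confirm that is what is happening here, but the sketch below shows one way to compare the two. `build_supports_device` and `parse_arch` are made-up helper names. The check only looks at `sm_` entries and ignores PTX (`compute_`) entries, which is a simplification.

```python
def parse_arch(arch):
    """Split an entry such as 'sm_86' into ('sm', 8, 6)."""
    kind, _, rest = arch.partition("_")
    digits = "".join(ch for ch in rest if ch.isdigit())
    return kind, int(digits[:-1]), int(digits[-1])


def build_supports_device(capability, arch_list):
    """Rough check: is there a compiled sm_ target usable on this device?

    Assumes binaries run on a device with the same major version and an
    equal or higher minor version. PTX ('compute_') entries are ignored.
    Hypothetical helper for illustration only.
    """
    major, minor = capability
    for arch in arch_list:
        kind, a_major, a_minor = parse_arch(arch)
        if kind == "sm" and a_major == major and a_minor <= minor:
            return True
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        if torch.cuda.is_available():
            capability = torch.cuda.get_device_capability(0)
            archs = torch.cuda.get_arch_list()
            print("device capability:", capability)
            print("build targets:", archs)
            print("supported:", build_supports_device(capability, archs))
        else:
            print("CUDA is not available to PyTorch")
```

If the last line reports `False`, installing a PyTorch build made for a newer CUDA toolkit may be worth trying.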

[image]

Error persists with different drivers:
[image]

Making a new post since this one was solved by me uninstalling/reinstalling WSL.