[ChatLLaMA] No GPU Detected issue

Question


MuffinC opened this issue 2 years ago · 2 comments

When I try to run the training step with

python artifacts/main.py artifacts/config/config.yaml --type ALL

it raises ValueError("No Gpu available"). Does anyone have advice on this? I am running on an Azure cloud GPU VM, and the GPU is an NVIDIA Corporation GP100GL [Tesla P100 PCIe 16GB]. If any more information is required, please reach out. Thanks!
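The thread does not show the ChatLLaMA code that raises this error. A guard like this usually wraps torch.cuda.is_available(). The sketch below assumes that is the case. The names cuda_available and require_gpu are hypothetical and are not ChatLLaMA's API. Under that assumption, the error means PyTorch could not see any CUDA device, which is what the checks in the first reply test.

```python
import importlib.util


def cuda_available() -> bool:
    """Return torch.cuda.is_available(), or False when PyTorch is not installed."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch

    return torch.cuda.is_available()


def require_gpu() -> None:
    """Hypothetical guard mirroring the reported error; not ChatLLaMA's actual code."""
    if not cuda_available():
        raise ValueError("No Gpu available")


if __name__ == "__main__":
    try:
        require_gpu()
        print("CUDA device visible to PyTorch")
    except ValueError as err:
        print(f"Guard failed: {err}")
```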

Answer 1 · 2023-03-27T11:54:39.000Z

Hi @MuffinC, thank you for reaching out. It looks like something is missing in the GPU setup. Could you please share the output of the following two commands?

nvidia-smi

and

python -c "import torch; print(torch.cuda.is_available())"
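You can also run the two checks together from one script. Below is a minimal sketch that uses only the standard library plus an optional PyTorch import. The function names and the 60-second timeout are illustrative choices, not part of the original reply.

```python
import shutil
import subprocess
from typing import Tuple


def check_nvidia_smi(timeout: float = 60.0) -> Tuple[bool, str]:
    """Run nvidia-smi and report whether it exited successfully."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False, "nvidia-smi not found on PATH"
    try:
        result = subprocess.run([exe], capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False, f"nvidia-smi did not finish within {timeout} s"
    output = (result.stdout + result.stderr).strip()
    return result.returncode == 0, output


def check_torch_cuda() -> Tuple[bool, str]:
    """Equivalent of: python -c "import torch; print(torch.cuda.is_available())"."""
    try:
        import torch
    except ImportError:
        return False, "PyTorch is not installed"
    available = torch.cuda.is_available()
    detail = f"torch {torch.__version__}, built for CUDA {torch.version.cuda}, is_available={available}"
    return available, detail


if __name__ == "__main__":
    for name, check in (("nvidia-smi", check_nvidia_smi), ("torch.cuda", check_torch_cuda)):
        ok, detail = check()
        print(f"[{'OK' if ok else 'FAIL'}] {name}: {detail}")
```

If nvidia-smi fails, the NVIDIA driver itself is not working. If nvidia-smi succeeds but PyTorch still reports False, there are two common causes. One is a CPU-only PyTorch build, where torch.version.cuda is None. The other is a driver older than the CUDA version PyTorch was built against.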

Answer 2 · 2023-03-28T11:18:37.000Z

Hi Diego, thanks for pointing me in the right direction. I managed to solve the issue by running the following commands:
sudo apt-get remove --purge '^nvidia-.*'
sudo apt-get install ubuntu-desktop
sudo apt-get --purge remove "*cublas*" "cuda*"
sudo apt-get --purge remove "*nvidia*"
sudo rm /etc/X11/xorg.conf
sudo apt autoremove
sudo reboot
ubuntu-drivers devices
sudo ubuntu-drivers autoinstall
sudo reboot
nvidia-smi
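After the final reboot, nvidia-smi is the main check. A second, lower-level signal is also available. While the NVIDIA kernel module is loaded, Linux exposes /proc/driver/nvidia/version, and its first line reports the driver version. The sketch below reads that file and returns None when the module does not appear to be loaded. The helper name loaded_nvidia_driver is hypothetical.

```python
from pathlib import Path
from typing import Optional

# Present on Linux only while the NVIDIA kernel module is loaded.
NVIDIA_VERSION_FILE = Path("/proc/driver/nvidia/version")


def loaded_nvidia_driver(path: Path = NVIDIA_VERSION_FILE) -> Optional[str]:
    """Return the first line of the driver version file, or None if it cannot be read."""
    try:
        lines = path.read_text().splitlines()
    except OSError:
        return None
    return lines[0].strip() if lines else None


if __name__ == "__main__":
    version = loaded_nvidia_driver()
    print(version if version else "NVIDIA kernel module does not appear to be loaded")
```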

Initially, nvidia-smi was showing an error and torch.cuda.is_available() was returning False. Now it returns True.
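A True result from torch.cuda.is_available() shows that PyTorch found a driver and a device, but it does not run any GPU kernels. The smoke test below goes one step further. It assumes PyTorch with CUDA support is installed, multiplies two random matrices on the GPU, and compares the result with the CPU. The function name, matrix size, and tolerances are illustrative choices.

```python
def gpu_smoke_test(size: int = 1024) -> bool:
    """Multiply random matrices on the GPU and compare with the CPU result.

    Returns False if PyTorch is missing, no CUDA device is visible, or the
    GPU computation raises a RuntimeError (e.g. an unsupported GPU architecture).
    """
    try:
        import torch
    except ImportError:
        return False
    if not torch.cuda.is_available():
        return False
    try:
        a = torch.randn(size, size)
        b = torch.randn(size, size)
        gpu_result = (a.cuda() @ b.cuda()).cpu()  # .cpu() waits for the GPU kernel to finish
        cpu_result = a @ b
    except RuntimeError:
        return False
    return torch.allclose(gpu_result, cpu_result, rtol=1e-3, atol=1e-3)


if __name__ == "__main__":
    print("GPU smoke test passed" if gpu_smoke_test() else "GPU smoke test failed")
```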