Xirider/finetune-gpt2xl

Unable to proceed, no GPU resources available

bpm246 opened this issue · 3 comments

We are trying to run the model on our own server, and we got this error:
RuntimeError: Unable to proceed, no GPU resources available

Hi!
This means DeepSpeed is unable to find your GPU. Maybe your CUDA version doesn't work with your PyTorch build, or your drivers are not installed correctly. You should also have PyTorch version 1.7.

Run these commands in a Python shell to check whether PyTorch recognizes your GPU correctly:
import torch
torch.cuda.is_available()
torch.cuda.current_device()
torch.cuda.get_device_name(0)
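
If it helps, the same checks can be wrapped in a small script that is safe to run when no GPU is found (`torch.cuda.current_device()` and `torch.cuda.get_device_name(0)` can raise in that case). This is only a sketch: the function name `cuda_report` and the dictionary keys are made up for illustration and are not part of PyTorch or DeepSpeed. It also prints `torch.version.cuda`, which is `None` for a CPU-only PyTorch build.

```python
import importlib


def cuda_report(torch_module=None):
    """Collect the facts relevant to a 'no GPU resources available' error.

    `cuda_report` is a hypothetical helper, not a PyTorch API.
    """
    if torch_module is None:
        try:
            torch_module = importlib.import_module("torch")
        except ImportError:
            return {"torch_installed": False}

    report = {
        "torch_installed": True,
        "torch_version": torch_module.__version__,
        # CUDA version this PyTorch build was compiled against;
        # None means a CPU-only build.
        "built_with_cuda": torch_module.version.cuda,
        "cuda_available": torch_module.cuda.is_available(),
    }
    # Only query device details when CUDA is usable, since these
    # calls can raise otherwise.
    if report["cuda_available"]:
        report["device_count"] = torch_module.cuda.device_count()
        report["device_name"] = torch_module.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If `built_with_cuda` is `None`, the installed PyTorch has no CUDA support at all. If it shows a version but `cuda_available` is `False`, the driver side is the more likely place to look.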

We ran your commands:
import torch
torch.cuda.is_available()
and the result is False

We have PyTorch 1.8.1 and CUDA 10.2. We tried to install PyTorch 1.7.0, but we got this error:
ERROR: Could not find a version that satisfies the requirement torch==1.7.0
ERROR: No matching distribution found for torch==1.7.0

It should also work with PyTorch 1.8, so you don't need to install PyTorch 1.7.
If torch.cuda.is_available() is False, it means that PyTorch can't detect your GPU. This probably means that your GPU or your graphics driver doesn't support the CUDA version that you are using. Here is a relevant thread:
https://stackoverflow.com/questions/60987997/why-torch-cuda-is-available-returns-false-even-after-installing-pytorch-with
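
To check the driver side independently of PyTorch, something like the following could help. It is a rough sketch that assumes the NVIDIA driver's `nvidia-smi` tool is on the PATH when the driver is installed; `query_driver` is a hypothetical helper name. It returns `None` when `nvidia-smi` is missing or fails, which would point to a driver installation problem rather than a PyTorch one.

```python
import shutil
import subprocess


def query_driver():
    """Return 'name, driver_version' lines reported by nvidia-smi,
    or None if nvidia-smi is not found or exits with an error.

    `query_driver` is a hypothetical helper for illustration.
    """
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    gpus = query_driver()
    if gpus is None:
        print("nvidia-smi not available or failed; check the driver installation")
    else:
        for gpu in gpus:
            print(gpu)
```

The driver version printed here can then be compared against the minimum driver version NVIDIA lists for your CUDA version.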