akaashdash/kansformers

Why is the inference so slow?

I used the code below, but I'm not sure whether it's actually using my GPU.

import torch
from transformers import GPT2TokenizerFast

# GPT, C (the model config), and model_path are defined in the notebook linked below.
device = torch.device('cpu')

model = GPT(C.model)
model.load_state_dict(torch.load(model_path))
model.to(device)

tokenizer = GPT2TokenizerFast.from_pretrained('gpt2')
model.eval()

input_phrase = "Yesterday"
input_ids = tokenizer.encode(input_phrase, return_tensors='pt')

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=10, temperature=0.75, do_sample=True)

generated_text = tokenizer.decode(output[0])
print(f"Generated text: {generated_text}")

https://colab.research.google.com/drive/1I5n0SDrggPA8AnpucHuiI3jAEAQ4_KBh#scrollTo=XXwwVMgdW9-2

As far as I can see, you are not using the GPU because of device = torch.device('cpu').
Try setting the device to
device = torch.device('cuda')
or, even better,
device = 'cuda' if torch.cuda.is_available() else 'cpu'
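
For reference, here is a minimal sketch of the snippet with that fix applied. It assumes GPT, C, and model_path come from the linked notebook, and it also moves input_ids onto the same device as the model, since otherwise the inputs stay on the CPU while the model is on the GPU.

import torch
from transformers import GPT2TokenizerFast

# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# GPT, C, and model_path are assumed to come from the linked notebook.
model = GPT(C.model)
model.load_state_dict(torch.load(model_path, map_location=device))
model.to(device)
model.eval()

tokenizer = GPT2TokenizerFast.from_pretrained('gpt2')
input_phrase = "Yesterday"
# Put the input ids on the same device as the model to avoid a device-mismatch error.
input_ids = tokenizer.encode(input_phrase, return_tensors='pt').to(device)

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=10, temperature=0.75, do_sample=True)

print(f"Generated text: {tokenizer.decode(output[0])}")

You can also confirm where the model actually lives with print(next(model.parameters()).device).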