Why is the inference so slow?
gyunggyung commented
Why is the inference so slow?
I used the code below, but I'm not sure whether it is actually using my GPU.
device = torch.device('cpu')
model = GPT(C.model)
model.load_state_dict(torch.load(model_path))
model.to(device)
tokenizer = GPT2TokenizerFast.from_pretrained('gpt2')
model.eval()
input_phrase = "Yesterday"
input_ids = tokenizer.encode(input_phrase, return_tensors='pt')
with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=10, temperature=0.75, do_sample=True)
generated_text = tokenizer.decode(output[0])
print(f"Generated text: {generated_text}")
https://colab.research.google.com/drive/1I5n0SDrggPA8AnpucHuiI3jAEAQ4_KBh#scrollTo=XXwwVMgdW9-2
SoloWayG commented
Why is the inference so slow?
I used the code below, but I don't know if I'm using my GPU well.
As I can see, you are not using the GPU: the code sets device = torch.device('cpu').
Try setting the device to
device = torch.device('cuda')
or, even better,
device = 'cuda' if torch.cuda.is_available() else 'cpu'
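Note that setting the device only moves the model; the input tensors also have to live on the same device, or generation will still run on (or fail to transfer from) the CPU. A minimal sketch of the same generation step with that change, assuming the GPT class, C config, and model_path from the snippet above:

import torch
from transformers import GPT2TokenizerFast

# Use the GPU when one is available, otherwise fall back to CPU.
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# GPT, C.model and model_path are taken from the original snippet (not defined here).
model = GPT(C.model)
model.load_state_dict(torch.load(model_path, map_location=device))
model.to(device)
model.eval()

tokenizer = GPT2TokenizerFast.from_pretrained('gpt2')
# Move the encoded prompt to the same device as the model.
input_ids = tokenizer.encode("Yesterday", return_tensors='pt').to(device)

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=10, temperature=0.75, do_sample=True)

print(f"Generated text: {tokenizer.decode(output[0])}")

Even with this, the first call can be slow because of CUDA initialization; timing a second generate call gives a better picture of the actual inference speed.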