Prediction example?
glicerico opened this issue · 14 comments
Hi @macabdul9, after training a model that reached 75% on the test set, I'd like to use it to predict Speech Acts for other data. Could you kindly offer an example of how to run inference with the stored checkpoint? So far I haven't been able to process the new data (from a CSV file) accordingly. I tried using a dataloader, but it doesn't seem to work.
Hi @glicerico,
Good to hear that you're getting reasonable performance.
For inference, here are the simple steps you can follow (a consolidated sketch appears after the list):

- Create the Lightning model:

  ```python
  model = LM()  # this will be your Lightning model class
  ```

- Load the state dictionary from the checkpoint:

  ```python
  model.load_state_dict(torch.load(checkpoint_path, map_location=your_device)["state_dict"])
  ```

- Set `requires_grad=False` and start making your predictions:

  ```python
  outputs = model(inputs).argmax(dim=-1)
  ```
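Putting those steps together, here is a minimal sketch, assuming `LM` is your `LightningModule` class, `checkpoint_path` points at your saved checkpoint, and `inputs` is an already-tokenized batch (all three are placeholders, not names from this repo):

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = LM()  # placeholder: your LightningModule class
model.load_state_dict(torch.load(checkpoint_path, map_location=device)["state_dict"])
model.to(device)
model.eval()  # disable dropout and other train-time behavior

with torch.no_grad():  # no gradients needed at inference time
    outputs = model(inputs).argmax(dim=-1)
```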
For the second part of your issue regarding the dataloader, can you post the script here? The Dataset class is generic: you just have to give it a tokenizer and the data, along with the fields to read. Also note that test data may not have a label field, so disable it; a minimal sketch of such a label-free dataset follows below.
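A hedged sketch of a label-free test dataset, assuming a CSV with a "text" column and a Hugging Face tokenizer; the repo's `DADataset` may use different argument names, so treat this as an illustrative stand-in:

```python
import pandas as pd
from torch.utils.data import Dataset

class TestDataset(Dataset):
    """Reads utterances from a CSV and tokenizes them; no label field."""

    def __init__(self, csv_path, tokenizer, text_field="text", max_len=128):
        self.data = pd.read_csv(csv_path)
        self.tokenizer = tokenizer
        self.text_field = text_field
        self.max_len = max_len

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        text = str(self.data.iloc[idx][self.text_field])
        encoding = self.tokenizer(
            text,
            padding="max_length",
            truncation=True,
            max_length=self.max_len,
            return_tensors="pt",
        )
        # Test data may be unlabeled, so no label/target key here.
        return {
            "text": text,
            "input_ids": encoding["input_ids"].squeeze(0),
            "attention_mask": encoding["attention_mask"].squeeze(0),
        }
```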
Let me know if it helps.
Thanks for the answer.
My problem continues to be the input data; I cannot make it work.
I am now even using the same dataloaders that you set up to get the data splits, but when I try to run prediction on one of the batches, it fails.
My code is:
```python
from config import config
import torch
from Trainer import LightningModel
import pytorch_lightning as pl

checkpoint_path = 'checkpoints/epoch=29-val_accuracy=0.748834.ckpt'
my_device = torch.device('cuda')

model = LightningModel(config=config)
model.load_state_dict(torch.load(checkpoint_path, map_location=my_device)['state_dict'])

test_dataloader = model.test_dataloader()
for batch in test_dataloader:
    one_batch = batch

# one_batch.cuda()
with torch.no_grad():
    outputs = model(one_batch).argmax(dim=-1)
print(outputs)
```
And it errors with:
```
Traceback (most recent call last):
  File "predict.py", line 21, in <module>
    outputs = model(one_batch).argmax(dim=-1)
  File "/root/CASA-Dialogue-Act-Classifier/venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/root/CASA-Dialogue-Act-Classifier/Trainer.py", line 44, in forward
    logits = self.model(batch)
  File "/root/CASA-Dialogue-Act-Classifier/venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/root/CASA-Dialogue-Act-Classifier/models/ContextAwareDAC.py", line 63, in forward
    m = self.context_aware_attention(hidden_states=x, h_forward=hx[0].detach())
  File "/root/CASA-Dialogue-Act-Classifier/venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/root/CASA-Dialogue-Act-Classifier/models/ContextAwareAttention.py", line 27, in forward
    S = self.fc_2(torch.tanh(self.fc_1(hidden_states) + self.fc_3(h_forward.unsqueeze(1))))
  File "/root/CASA-Dialogue-Act-Classifier/venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/root/CASA-Dialogue-Act-Classifier/venv/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 93, in forward
    return F.linear(input, self.weight, self.bias)
  File "/root/CASA-Dialogue-Act-Classifier/venv/lib/python3.7/site-packages/torch/nn/functional.py", line 1692, in linear
    output = input.matmul(weight.t())
RuntimeError: Tensor for argument #3 'mat2' is on CPU, but expected it to be on GPU (while checking arguments for addmm)
```
If I uncomment `# one_batch.cuda()`, then the error says `one_batch` doesn't have a `cuda()` method.
Perhaps I should get a batch differently? But if I feed the whole dataloader, it also fails, saying that the `DataLoader` object is not subscriptable.
You're getting this error because a batch also contains `list` and `str` data, so you can't ship it to CUDA: only tensors can be moved, and the batch is a dictionary.
Inside the `LightningModel` there will be your pure model. Ship `input_ids` and `attention_mask` from the batch to CUDA and then feed them to the pure model using the following steps:
- `batch = next(iter(test_loader))`
- `input_ids = batch['input_ids'].to(your_device)`
- `attention_mask = batch['attention_mask'].to(your_device)`
- `outputs = model.model(input_ids=input_ids, attention_mask=attention_mask)`
This should work if your main model takes `input_ids` and `attention_mask`; if it fails, you can try configuring the device as CPU, and your code will work then.
Note: Make sure your main model takes only `input_ids` and `attention_mask`. If it doesn't, update the main model's `forward` function: it should take only the params it uses, not the whole batch. A sketch of that signature change follows below.
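For illustration, a hedged sketch of the `forward` signature change; the layer names here are placeholders, not the actual modules in `ContextAwareDAC`:

```python
# Before: forward unpacks the whole batch dict, so the caller cannot
# control which tensors live on which device.
#
# def forward(self, batch):
#     input_ids = batch["input_ids"]
#     attention_mask = batch["attention_mask"]
#     ...

# After: forward takes only the tensors it actually uses, so the caller
# can move them to the right device before the call.
def forward(self, input_ids, attention_mask):
    hidden_states = self.encoder(input_ids=input_ids,
                                 attention_mask=attention_mask)[0]
    logits = self.classifier(hidden_states[:, 0])  # classify on first token
    return logits
```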
Thanks for the advice; it worked after a few more tweaks: I had to send the whole model to CUDA and indeed change the model definition to accept `input_ids` and mask separately.
For future reference, here's how the code ended up:
```python
from config import config
import torch
from Trainer import LightningModel
import pytorch_lightning as pl

checkpoint_path = 'checkpoints/epoch=29-val_accuracy=0.748834.ckpt'
my_device = torch.device('cuda')

model = LightningModel(config=config)
model = model.to(my_device)
model.load_state_dict(torch.load(checkpoint_path, map_location=my_device)['state_dict'])

test_dataloader = model.test_dataloader()
batch = next(iter(test_dataloader))
input_ids = batch['input_ids'].to(my_device)
attention_mask = batch['attention_mask'].to(my_device)
seq_len = batch['seq_len'].to(my_device)

with torch.no_grad():
    outputs = model.model(input_ids=input_ids, attention_mask=attention_mask, seq_len=seq_len).argmax(dim=-1)

print(batch['text'])
print(batch['target'])
print(outputs)
```
Now, there's still the problem that the labels in the prediction don't seem to match the labels in the batch.
I see a similar pattern, so they are probably alright, but the classes seem to follow a different numbering. I bet there must be a straightforward way to convert the classes to labels. Do you know it?
I've noticed the `DADataset` constructor builds the class label dictionary from a set of values, which is unordered. Sorting the values would be a reproducible/safer alternative; a sketch follows below.
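For illustration, a minimal sketch of a deterministic label dictionary plus the inverse mapping for decoding predictions (the `labels` list here is a made-up example; the actual field in `DADataset` may be named differently):

```python
# Build the label dictionary from sorted unique values so the
# label-to-index mapping is identical across runs.
labels = ["aa", "b", "qy", "sd", "sv"]  # example dialogue-act tags
label_dict = {label: idx for idx, label in enumerate(sorted(set(labels)))}

# Invert it to decode model predictions back into label names.
idx_to_label = {idx: label for label, idx in label_dict.items()}

predictions = [3, 1, 0]  # e.g. outputs.tolist() from the snippet above
print([idx_to_label[p] for p in predictions])
```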
On another note, would anyone be willing to share their trained model/checkpoint file and link it to this repository? The training process is computationally heavy and I've only seen accuracy scores up to 65% on the test set.
Oh, that makes sense, and that is probably why the labels are different when I run prediction.
I can share the model I trained with 75% accuracy; I'll upload and share the link here.
@Christopher-Thornton, are you familiar with how to predict on new data once the model is trained? I am sure there must be some simpler methodology with PyTorch Lightning than what I have been doing.
@glicerico I'm not too familiar with this framework; the standard way to run inference is to convert the model to ONNX and use onnxruntime (from the docs).
After tinkering with the code for a bit on my own, I had issues passing multiple tensors to the `input_sample` parameter, as mentioned in pytorch/issues/22488. It's possible to work around this, but it requires changes to the model definitions. A sketch of the export-and-run flow follows below.
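For reference, a minimal sketch of that export-and-run flow using plain `torch.onnx.export` rather than Lightning's `input_sample` route. It assumes the inner model's `forward` takes `input_ids` and `attention_mask`; the shapes and the `model.onnx` path are illustrative:

```python
import torch
import onnxruntime as ort

# Dummy inputs that match the model's expected shapes (batch, seq_len).
dummy_input_ids = torch.ones(1, 128, dtype=torch.long)
dummy_attention_mask = torch.ones(1, 128, dtype=torch.long)

torch.onnx.export(
    model.model,                              # the pure model inside the LightningModule
    (dummy_input_ids, dummy_attention_mask),  # multiple tensors go in as a tuple
    "model.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["logits"],
    dynamic_axes={"input_ids": {0: "batch"}, "attention_mask": {0: "batch"}},
)

# Run inference with onnxruntime.
session = ort.InferenceSession("model.onnx")
(logits,) = session.run(
    None,
    {
        "input_ids": dummy_input_ids.numpy(),
        "attention_mask": dummy_attention_mask.numpy(),
    },
)
predictions = logits.argmax(axis=-1)
```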
@Christopher-Thornton here's the checkpoint for my trained model: https://www.dropbox.com/s/y42bw6qmoa9b8k2/epoch%3D29-val_accuracy%3D0.748834.ckpt?dl=0
Let me know how it works for you
@Christopher-Thornton, I just saw that @macabdul9 committed the change you suggested to fix the class order.
I am not sure if this will make the above checkpoint unusable.
@glicerico Yes, you are right: if you make predictions after training by creating a new LM object and loading from a checkpoint, they will be similar to random predictions, because the order of `label_dict` would be different each time, so you should have saved the `label_dict`. Now I've fixed this, so it should be the same. A sketch of persisting the mapping follows below.
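For illustration, one hedged way to persist the label mapping alongside the checkpoint (the file path and the `label_dict` variable are assumptions, not the repo's actual layout):

```python
import json

# After training: save the label mapping next to the checkpoint.
with open("checkpoints/label_dict.json", "w") as f:
    json.dump(label_dict, f)

# At inference time: reload it so predicted indices decode consistently.
with open("checkpoints/label_dict.json") as f:
    label_dict = json.load(f)
idx_to_label = {idx: label for label, idx in label_dict.items()}
```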
@glicerico, I'm trying to run some inference code:

```python
...
outputs = model.model(input_ids=input_ids, attention_mask=attention_mask, seq_len=seq_len).argmax(dim=-1)
...
```

but I get the following error:

```
TypeError: forward() got an unexpected keyword argument 'input_ids'
```
Do you know what could be wrong? It would be nice to add a small test/inference script to the repo. Thanks
> @Christopher-Thornton here's the checkpoint for my trained model: https://www.dropbox.com/s/y42bw6qmoa9b8k2/epoch%3D29-val_accuracy%3D0.748834.ckpt?dl=0 Let me know how it works for you
Hello @glicerico, it looks like the link is not available now. Can you please share the checkpoint again? :)