Prediction example?
glicerico opened this issue · 14 comments
Hi @macabdul9, after training a model that reached 75% on the test set, I'd like to use it to predict Speech Acts for other data. Could you kindly offer an example of how to run inference with the stored checkpoint? So far I haven't been able to process the new data (from a CSV file) accordingly. I tried using a dataloader, but it doesn't seem to work.
Hi @glicerico,
Good to hear that you're getting reasonable performance.
For inference, here are the simple steps you can follow (a consolidated sketch appears after the list):

- Create the Lightning model:

  ```python
  model = LM()  # this will be your Lightning model class
  ```

- Load the state dictionary from the checkpoint:

  ```python
  model.load_state_dict(torch.load(checkpoint_path, map_location=your_device)["state_dict"])
  ```

- Set `requires_grad=False` and start making your predictions:

  ```python
  outputs = model(inputs).argmax(dim=-1)
  ```
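Putting those steps together, here is a minimal sketch, assuming `LM` is your `LightningModule` class, `checkpoint_path` points at your saved checkpoint, and `inputs` is an already-tokenized batch (all three are placeholders, not names from this repo):

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = LM()  # placeholder: your LightningModule class
model.load_state_dict(torch.load(checkpoint_path, map_location=device)["state_dict"])
model.to(device)
model.eval()  # disable dropout and other train-time behavior

with torch.no_grad():  # no gradients needed at inference time
    outputs = model(inputs).argmax(dim=-1)
```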
For the second part of your issue regarding the dataloader, can you post the script here? The Dataset class is generic: you just have to give it a tokenizer and the data, along with the fields to read. Also note that test data may not have a label field, so disable it; a minimal sketch of such a label-free dataset follows below.
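A hedged sketch of a label-free test dataset, assuming a CSV with a "text" column and a Hugging Face tokenizer; the repo's `DADataset` may use different argument names, so treat this as an illustrative stand-in:

```python
import pandas as pd
from torch.utils.data import Dataset

class TestDataset(Dataset):
    """Reads utterances from a CSV and tokenizes them; no label field."""

    def __init__(self, csv_path, tokenizer, text_field="text", max_len=128):
        self.data = pd.read_csv(csv_path)
        self.tokenizer = tokenizer
        self.text_field = text_field
        self.max_len = max_len

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        text = str(self.data.iloc[idx][self.text_field])
        encoding = self.tokenizer(
            text,
            padding="max_length",
            truncation=True,
            max_length=self.max_len,
            return_tensors="pt",
        )
        # Test data may be unlabeled, so no label/target key here.
        return {
            "text": text,
            "input_ids": encoding["input_ids"].squeeze(0),
            "attention_mask": encoding["attention_mask"].squeeze(0),
        }
```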
Let me know if it helps.
Thanks for the answer.
My problem continues to be the input data; I cannot make it work.
I am now even using the same dataloaders that you set up to get the data splits, but when I try to run prediction on one of the batches, it fails.
My code is:
```python
from config import config
import torch
from Trainer import LightningModel
import pytorch_lightning as pl

checkpoint_path = 'checkpoints/epoch=29-val_accuracy=0.748834.ckpt'
my_device = torch.device('cuda')

model = LightningModel(config=config)
model.load_state_dict(torch.load(checkpoint_path, map_location=my_device)['state_dict'])

test_dataloader = model.test_dataloader()
for batch in test_dataloader:
    one_batch = batch

# one_batch.cuda()
with torch.no_grad():
    outputs = model(one_batch).argmax(dim=-1)
print(outputs)
```
And it errors with:
```
Traceback (most recent call last):
  File "predict.py", line 21, in <module>
    outputs = model(one_batch).argmax(dim=-1)
  File "/root/CASA-Dialogue-Act-Classifier/venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/root/CASA-Dialogue-Act-Classifier/Trainer.py", line 44, in forward
    logits = self.model(batch)
  File "/root/CASA-Dialogue-Act-Classifier/venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/root/CASA-Dialogue-Act-Classifier/models/ContextAwareDAC.py", line 63, in forward
    m = self.context_aware_attention(hidden_states=x, h_forward=hx[0].detach())
  File "/root/CASA-Dialogue-Act-Classifier/venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/root/CASA-Dialogue-Act-Classifier/models/ContextAwareAttention.py", line 27, in forward
    S = self.fc_2(torch.tanh(self.fc_1(hidden_states) + self.fc_3(h_forward.unsqueeze(1))))
  File "/root/CASA-Dialogue-Act-Classifier/venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/root/CASA-Dialogue-Act-Classifier/venv/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 93, in forward
    return F.linear(input, self.weight, self.bias)
  File "/root/CASA-Dialogue-Act-Classifier/venv/lib/python3.7/site-packages/torch/nn/functional.py", line 1692, in linear
    output = input.matmul(weight.t())
RuntimeError: Tensor for argument #3 'mat2' is on CPU, but expected it to be on GPU (while checking arguments for addmm)
```
If I uncomment `# one_batch.cuda()`, then the error says `one_batch` doesn't have a `cuda()` method.
Perhaps I should get a batch differently? But if I feed the whole dataloader, it also fails, saying that the `DataLoader` object is not subscriptable.
You're getting this error because a batch also contains `list` and `str` data, so you can't ship it to CUDA: only tensors can be moved, and the batch is a dictionary.
Inside the `LightningModel` there will be your pure model. Ship `input_ids` and `attention_mask` from the batch to CUDA and then feed them to the pure model using the following steps:
- `batch = next(iter(test_loader))`
- `input_ids = batch['input_ids'].to(your_device)`
- `attention_mask = batch['attention_mask'].to(your_device)`
- `outputs = model.model(input_ids=input_ids, attention_mask=attention_mask)`
This should work if your main model takes `input_ids` and `attention_mask`; if it fails, you can try configuring the device as CPU, and your code will work then.
Note: Make sure your main model takes only `input_ids` and `attention_mask`. If it doesn't, update the main model's `forward` function: it should take only the params it uses, not the whole batch. A sketch of that signature change follows below.
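For illustration, a hedged sketch of the `forward` signature change; the layer names here are placeholders, not the actual modules in `ContextAwareDAC`:

```python
# Before: forward unpacks the whole batch dict, so the caller cannot
# control which tensors live on which device.
#
# def forward(self, batch):
#     input_ids = batch["input_ids"]
#     attention_mask = batch["attention_mask"]
#     ...

# After: forward takes only the tensors it actually uses, so the caller
# can move them to the right device before the call.
def forward(self, input_ids, attention_mask):
    hidden_states = self.encoder(input_ids=input_ids,
                                 attention_mask=attention_mask)[0]
    logits = self.classifier(hidden_states[:, 0])  # classify on first token
    return logits
```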
Thanks for the advice; it worked after a few more tweaks: I had to send the whole model to CUDA and indeed change the model definition to accept `input_ids` and mask separately.
For future reference, here's how the code ended up:
```python
from config import config
import torch
from Trainer import LightningModel
import pytorch_lightning as pl

checkpoint_path = 'checkpoints/epoch=29-val_accuracy=0.748834.ckpt'
my_device = torch.device('cuda')

model = LightningModel(config=config)
model = model.to(my_device)
model.load_state_dict(torch.load(checkpoint_path, map_location=my_device)['state_dict'])

test_dataloader = model.test_dataloader()
batch = next(iter(test_dataloader))
input_ids = batch['input_ids'].to(my_device)
attention_mask = batch['attention_mask'].to(my_device)
seq_len = batch['seq_len'].to(my_device)

with torch.no_grad():
    outputs = model.model(input_ids=input_ids, attention_mask=attention_mask, seq_len=seq_len).argmax(dim=-1)

print(batch['text'])
print(batch['target'])
print(outputs)
```
Now, there's still the problem that the labels in the prediction don't seem to match the labels in the batch.
I see a similar pattern, so they are probably alright, but the classes seem to follow a different numbering. I bet there must be a straightforward way to convert the classes to labels. Do you know it?
I've noticed the `DADataset` constructor builds the class label dictionary from a set of values, which is unordered. Sorting the values would be a reproducible/safer alternative; a sketch follows below.
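For illustration, a minimal sketch of a deterministic label dictionary plus the inverse mapping for decoding predictions (the `labels` list here is a made-up example; the actual field in `DADataset` may be named differently):

```python
# Build the label dictionary from sorted unique values so the
# label-to-index mapping is identical across runs.
labels = ["aa", "b", "qy", "sd", "sv"]  # example dialogue-act tags
label_dict = {label: idx for idx, label in enumerate(sorted(set(labels)))}

# Invert it to decode model predictions back into label names.
idx_to_label = {idx: label for label, idx in label_dict.items()}

predictions = [3, 1, 0]  # e.g. outputs.tolist() from the snippet above
print([idx_to_label[p] for p in predictions])
```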
On another note, would anyone be willing to share their trained model/checkpoint file and link it to this repository? The training process is computationally heavy and I've only seen accuracy scores up to 65% on the test set.
Oh, that makes sense, and that is probably why the labels are different when I run prediction.
I can share the model I trained with 75% accuracy; I'll upload and share the link here.
@Christopher-Thornton, are you familiar with how to predict on new data once the model is trained? I am sure there must be some simpler methodology with PyTorch Lightning than what I have been doing.
@glicerico I'm not too familiar with this framework; the standard way to run inference is to convert the model to ONNX and use onnxruntime (from the docs).
After tinkering with the code for a bit on my own, I had issues passing multiple tensors to the `input_sample` parameter, as mentioned in pytorch/issues/22488. It's possible to work around this, but it requires changes to the model definitions. A sketch of the export-and-run flow follows below.
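For reference, a minimal sketch of that export-and-run flow using plain `torch.onnx.export` rather than Lightning's `input_sample` route. It assumes the inner model's `forward` takes `input_ids` and `attention_mask`; the shapes and the `model.onnx` path are illustrative:

```python
import torch
import onnxruntime as ort

# Dummy inputs that match the model's expected shapes (batch, seq_len).
dummy_input_ids = torch.ones(1, 128, dtype=torch.long)
dummy_attention_mask = torch.ones(1, 128, dtype=torch.long)

torch.onnx.export(
    model.model,                              # the pure model inside the LightningModule
    (dummy_input_ids, dummy_attention_mask),  # multiple tensors go in as a tuple
    "model.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["logits"],
    dynamic_axes={"input_ids": {0: "batch"}, "attention_mask": {0: "batch"}},
)

# Run inference with onnxruntime.
session = ort.InferenceSession("model.onnx")
(logits,) = session.run(
    None,
    {
        "input_ids": dummy_input_ids.numpy(),
        "attention_mask": dummy_attention_mask.numpy(),
    },
)
predictions = logits.argmax(axis=-1)
```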
@Christopher-Thornton here's the checkpoint for my trained model: https://www.dropbox.com/s/y42bw6qmoa9b8k2/epoch%3D29-val_accuracy%3D0.748834.ckpt?dl=0
Let me know how it works for you
@Christopher-Thornton, I just saw that @macabdul9 committed the change you suggested to fix the class order.
I am not sure if this will make the above checkpoint unusable.
@glicerico Yes, you are right: if you make predictions after training by creating a new LM object and loading from a checkpoint, they will be similar to random predictions, because the order of `label_dict` would be different each time, so you should have saved the `label_dict`. Now I've fixed this, so it should be the same. A sketch of persisting the mapping follows below.
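For illustration, one hedged way to persist the label mapping alongside the checkpoint (the file path and the `label_dict` variable are assumptions, not the repo's actual layout):

```python
import json

# After training: save the label mapping next to the checkpoint.
with open("checkpoints/label_dict.json", "w") as f:
    json.dump(label_dict, f)

# At inference time: reload it so predicted indices decode consistently.
with open("checkpoints/label_dict.json") as f:
    label_dict = json.load(f)
idx_to_label = {idx: label for label, idx in label_dict.items()}
```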
@glicerico, I'm trying to run some inference code:

```python
...
outputs = model.model(input_ids=input_ids, attention_mask=attention_mask, seq_len=seq_len).argmax(dim=-1)
...
```

but I get the following error:

```
TypeError: forward() got an unexpected keyword argument 'input_ids'
```
Do you know what could be wrong? It would be nice to add a small test/inference script to the repo. Thanks
> @Christopher-Thornton here's the checkpoint for my trained model: https://www.dropbox.com/s/y42bw6qmoa9b8k2/epoch%3D29-val_accuracy%3D0.748834.ckpt?dl=0 Let me know how it works for you
Hello @glicerico, it looks like the link is not available now. Can you please share the checkpoint again? :)