lucidrains/vit-pytorch

Saving and loading the model seems to regress to lower performance

aperiamegh opened this issue · 1 comment

Hi, working with your library has been a great experience so far. I wanted to ask what the best way is to save and load the model.

I save:

import torch
from vit_pytorch import ViT

model = ViT(hyperparams)  # shorthand for the actual ViT constructor arguments
train(model)
torch.save(model.state_dict(), save_loc + f"model_e{epoch+1}.pth")

I load:

model = ViT(hyperparams) # exact same
model.load_state_dict(torch.load("models/VITM/model_e13.pth"))
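
To rule out the serialization path itself, here is a sanity check that can be run right after training, while the trained model is still in memory (a minimal sketch; the constructor arguments are illustrative and must exactly match the ones used for training):

import torch
from vit_pytorch import ViT

# fresh instance; these arguments are placeholders and must match training exactly
reloaded = ViT(image_size=256, patch_size=32, num_classes=1000,
               dim=1024, depth=6, heads=16, mlp_dim=2048,
               dropout=0.1, emb_dropout=0.1)
reloaded.load_state_dict(torch.load("models/VITM/model_e13.pth"))

# `model` is the trained instance still in memory from the training run above;
# every tensor should round-trip bit for bit
reloaded_sd = reloaded.state_dict()
for name, tensor in model.state_dict().items():
    assert torch.equal(tensor, reloaded_sd[name]), f"mismatch in {name}"
print("weights round-trip exactly")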

However, the loaded model gets lower train and test scores on the exact same dataset. Are there other things that I need to save that I am missing? What could be the reason for this?
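
For completeness, the only other state I can think of is the optimizer (and the epoch counter), though as far as I know that only matters for resuming training, not for inference. A sketch of that fuller checkpoint (the optimizer name is a placeholder for whatever optimizer the training loop uses):

# fuller checkpoint: also keeps optimizer state and epoch, needed only to resume training
checkpoint = {
    "epoch": epoch + 1,
    "model_state_dict": model.state_dict(),
    "optimizer_state_dict": optimizer.state_dict(),
}
torch.save(checkpoint, save_loc + f"checkpoint_e{epoch+1}.pth")

# resuming later
checkpoint = torch.load(save_loc + f"checkpoint_e{epoch+1}.pth")
model.load_state_dict(checkpoint["model_state_dict"])
optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
start_epoch = checkpoint["epoch"]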

I am using 0.1 dropout, but I do not expect that to cause this discrepancy.

My eval setup just re-ran the training part of the code with model.eval(). The loss.backward() and optimizer.step() calls weren't commented out because I assumed they would have no effect in eval mode. That turned out to be the problem: model.eval() only switches layers like dropout to their inference behaviour; it does not stop gradients or optimizer updates, so the "evaluation" run was still modifying the weights. When I evaluated only on the test set, things worked as expected.
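
For anyone who hits the same thing, a minimal sketch of an evaluation loop that cannot touch the weights, since it never calls backward() or step() and disables gradients entirely (the loader, criterion and device arguments are placeholders):

import torch

@torch.no_grad()                  # gradients are never built or needed during evaluation
def evaluate(model, loader, criterion, device):
    model.eval()                  # inference behaviour for dropout (and batch norm, if any)
    total_loss, correct, n = 0.0, 0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        logits = model(images)
        total_loss += criterion(logits, labels).item() * labels.size(0)
        correct += (logits.argmax(dim=1) == labels).sum().item()
        n += labels.size(0)
    model.train()                 # restore training mode if the training loop continues
    return total_loss / n, correct / n

# e.g. test_loss, test_acc = evaluate(model, test_loader, torch.nn.CrossEntropyLoss(), device)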