Can't load previously trained GPT-2 Language generation model
timmartin opened this issue · 0 comments
Describe the bug
I trained a GPT-2 model from scratch using `LanguageModelingModel` and saved it to disk. I then started a new process and tried to load it, and it reported:
```
RuntimeError: Error(s) in loading state_dict for GPT2LMHeadModel:
	size mismatch for transformer.wte.weight: copying a param with shape torch.Size([375, 768]) from checkpoint, the shape in current model is torch.Size([10000, 768]).
	size mismatch for lm_head.weight: copying a param with shape torch.Size([375, 768]) from checkpoint, the shape in current model is torch.Size([10000, 768]).
	You may consider adding `ignore_mismatched_sizes=True` in the model `from_pretrained` method.
```
To Reproduce
Generate a model using the `train_new_lm.py` script shipped in the `examples` directory. Then try to load the model with:
```python
from simpletransformers.language_modeling import LanguageModelingModel

model = LanguageModelingModel(
    "gpt2",
    "./outputs/from_scratch/best_model",
)
```
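The failure itself comes from PyTorch rather than simpletransformers: a strict `load_state_dict` rejects any parameter whose saved shape differs from the freshly constructed model's, which suggests the vocabulary size baked into the checkpoint (375) differs from the one the loading process builds the model with (10000). A minimal sketch of the underlying error using plain PyTorch (the 768-dim embeddings here just mirror the shapes in the traceback; no simpletransformers API is involved):

```python
import torch
from torch import nn

# Embedding with the shape stored in the checkpoint (vocab size 375).
saved = nn.Embedding(375, 768)

# Embedding with the shape the newly constructed model expects (vocab size 10000).
current = nn.Embedding(10000, 768)

try:
    # Strict loading (the default) raises on any shape mismatch,
    # producing the same "size mismatch" RuntimeError as above.
    current.load_state_dict(saved.state_dict())
except RuntimeError as e:
    print("size mismatch" in str(e))  # → True
```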
Expected behavior
The model loads without raising an exception.
Desktop (please complete the following information):
- OS: Linux