ThilinaRajapakse/simpletransformers

Can't load previously trained GPT-2 Language generation model

timmartin opened this issue · 0 comments

Describe the bug
I trained a GPT-2 model from scratch using LanguageModelingModel. This was saved to disk. I then started a new process and tried to load it, and it reported:

RuntimeError: Error(s) in loading state_dict for GPT2LMHeadModel:
    size mismatch for transformer.wte.weight: copying a param with shape torch.Size([375, 768]) from checkpoint, the shape in current model is torch.Size([10000, 768]).
    size mismatch for lm_head.weight: copying a param with shape torch.Size([375, 768]) from checkpoint, the shape in current model is torch.Size([10000, 768]).
    You may consider adding `ignore_mismatched_sizes=True` in the model `from_pretrained` method.
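The shapes in the traceback suggest the checkpoint's token-embedding matrix has 375 rows (presumably the vocabulary size actually used at training time) while the freshly constructed model allocates 10000. Strict state-dict loading compares parameter shapes entry by entry and collects every mismatch, roughly like this minimal sketch (plain Python, no torch; the function name and dict-of-shapes representation are illustrative, not the real PyTorch internals):

```python
def check_state_dict_shapes(model_shapes, checkpoint_shapes):
    """Sketch of strict state_dict loading: flag every parameter whose
    checkpoint shape differs from the shape in the current model."""
    errors = []
    for name, ckpt_shape in checkpoint_shapes.items():
        model_shape = model_shapes.get(name)
        if model_shape is not None and model_shape != ckpt_shape:
            errors.append(
                f"size mismatch for {name}: copying a param with shape "
                f"{ckpt_shape} from checkpoint, the shape in current model "
                f"is {model_shape}."
            )
    return errors

# Shapes taken from the traceback above.
model = {
    "transformer.wte.weight": (10000, 768),
    "lm_head.weight": (10000, 768),
}
checkpoint = {
    "transformer.wte.weight": (375, 768),
    "lm_head.weight": (375, 768),
}
for err in check_state_dict_shapes(model, checkpoint):
    print(err)
```

This is why `ignore_mismatched_sizes=True` is only a blunt workaround: it would skip loading these tensors entirely, leaving the embeddings randomly initialized rather than restoring the trained weights.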

To Reproduce
Train a model using the train_new_lm.py script shipped in the examples directory, then try to load the saved model with:

from simpletransformers.language_modeling import LanguageModelingModel

model = LanguageModelingModel(
    "gpt2",
    "./outputs/from_scratch/best_model",
)

Expected behavior
The saved model loads without raising an exception.

Desktop (please complete the following information):

  • Linux