ThilinaRajapakse/simpletransformers

Loading a fine-tuned Seq2Seq MarianMT model gives wrong predictions


I initialized and trained the following model:

model = Seq2SeqModel(
    encoder_decoder_type="marian",
    encoder_decoder_name="Helsinki-NLP/opus-mt-en-mul",
    args=model_args,
    use_cuda=True,
)
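
(The training step isn't shown in the issue; with Simple Transformers it would go roughly like this, where train_df stands in for the actual training data.)

# Sketch of the training step, assuming train_df is a pandas DataFrame
# with "input_text" and "target_text" columns, as Seq2SeqModel expects.
model.train_model(train_df)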

After training, model.predict(['this is a test']) gives the desired output.
However, when I load the model back to make predictions, the output is off:

from transformers import MarianMTModel, MarianTokenizer

# the tokenizer also needs to be loaded; assuming it was saved to the same
# output directory (Simple Transformers saves it alongside the model)
tokenizer = MarianTokenizer.from_pretrained('outputs/best_model')
my_model = MarianMTModel.from_pretrained('outputs/best_model')

translated = my_model.generate(**tokenizer(['this is a test'], return_tensors="pt", padding=True))
[tokenizer.decode(t, skip_special_tokens=True) for t in translated]

Anything I missed?

Do you get any warnings when you reload the model? (Set up logging if you haven't: logging.basicConfig(level=logging.INFO))
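
For example:

import logging

logging.basicConfig(level=logging.INFO)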

Does it work as expected if you reload the model with Simple Transformers and use model.predict()?

I did get a warning saying that not all weights were initialized when loading the model with MarianMTModel.from_pretrained('outputs/best_model').
Could you say a bit more about how to reload the model (PATH='outputs/best_model/') with Simple Transformers (I assume it will use Seq2SeqModel)? Is Seq2SeqModel.from_pretrained(<PATH>) supported?

To load with ST, you'd do:

model = Seq2SeqModel(
    encoder_decoder_type="marian",
    encoder_decoder_name="<PATH>",
    args=model_args,
    use_cuda=True,
)
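
After reloading, model.predict() should give the same results as right after training, for example:

print(model.predict(['this is a test']))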

In theory, Seq2SeqModel.from_pretrained(<PATH>) is also supported, since ST uses a Hugging Face model under the hood. I don't remember exactly, but Marian encoder-decoder models may be a special case where this doesn't work (due to how the encoder and the decoder are set up).