Shifts-Project/shifts

[TRANSLATION] Encoder and decoder weight dimension mismatches when using the pre-trained models


For the translation task, when using the pre-trained models (for example, model1.pt), we get a RuntimeError:

RuntimeError: Error(s) in loading state_dict for TransformerModel:
        size mismatch for encoder.embed_tokens.weight: copying a param with shape torch.Size([43768, 1024]) from checkpoint, the shape in current model is torch.Size([33536, 1024]).
        size mismatch for decoder.embed_tokens.weight: copying a param with shape torch.Size([48272, 1024]) from checkpoint, the shape in current model is torch.Size([41808, 1024]).

This is because the dictionaries produced by our preprocessing have a different number of entries than the ones the pre-trained models were trained with, so the embedding matrices end up with mismatched shapes.
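
As a quick sanity check (a rough sketch, assuming the standard fairseq checkpoint layout with the weights stored under a 'model' key), the embedding sizes in the checkpoint can be compared against the dictionaries produced by our preprocessing; note that fairseq adds a handful of special symbols on top of the raw dict.*.txt line counts, so the two numbers differ by a small constant. Paths are the ones from the invocation below:

python3 -c "import torch; sd = torch.load('baseline-models/model1.pt', map_location='cpu')['model']; print(sd['encoder.embed_tokens.weight'].shape[0], sd['decoder.embed_tokens.weight'].shape[0])"

wc -l data-bin/wmt20_en_ru/dict.en.txt data-bin/wmt20_en_ru/dict.ru.txt

In our case the checkpoint expects 43768 / 48272 embedding rows, while the dictionaries built by our preprocessing yield models with 33536 / 41808.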

Invocation:

python3 structured-uncertainty/generate.py data-bin/wmt20_en_ru/ --path baseline-models/model1.pt --max-tokens 4096 --remove-bpe --nbest 5 --gen-subset test --source-lang en --target-lang ru

A possible way forward would be to provide the training code that produced the pre-trained baselines, so that we can adapt it.

cc @KaosEngineer

No, this is because the preprocessing wasn't done right. I need to clean up the script here, so that it is clear which repo to use for what.

"the preprocessing wasn't done right"

Yeah, fixing the preprocessing so that the given pre-trained models work sounds good too.

Another option would be to share the baseline training code so that it can work with our data preprocessing.
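
For the first option, my understanding is that the data needs to be binarized against the dictionaries the baseline models were trained with, rather than against freshly built ones. A minimal sketch, assuming the fork keeps fairseq's preprocess.py entry point and that the baseline dictionaries are available as dict.en.txt / dict.ru.txt (the tokenized file prefixes and the baseline-dicts/ path are placeholders):

python3 structured-uncertainty/preprocess.py --source-lang en --target-lang ru \
    --trainpref tokenized/train --validpref tokenized/valid --testpref tokenized/test \
    --srcdict baseline-dicts/dict.en.txt --tgtdict baseline-dicts/dict.ru.txt \
    --destdir data-bin/wmt20_en_ru --workers 8

With matching dictionaries, the rebuilt model should have the same embedding shapes as the checkpoint, and the generate.py call above should load model1.pt without the size mismatch.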

We've fixed the preprocessing scripts and added the training data to the repo.