Shifts-Project/shifts

[TRANSLATION] Encoder and decoder weight dimension mismatches when using the pre-trained models


For the translation task, when using the pre-trained models (for example, model1.pt), we get a RuntimeError:

RuntimeError: Error(s) in loading state_dict for TransformerModel:
        size mismatch for encoder.embed_tokens.weight: copying a param with shape torch.Size([43768, 1024]) from checkpoint, the shape in current model is torch.Size([33536, 1024]).
        size mismatch for decoder.embed_tokens.weight: copying a param with shape torch.Size([48272, 1024]) from checkpoint, the shape in current model is torch.Size([41808, 1024]).

This is because the dictionaries produced by our preprocessing have a different number of entries than the ones the pre-trained models were trained with, so the embedding matrices end up with mismatched shapes.
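
As a quick sanity check (a rough sketch, assuming the standard fairseq checkpoint layout with the weights stored under a 'model' key), the embedding sizes in the checkpoint can be compared against the dictionaries produced by our preprocessing; note that fairseq adds a handful of special symbols on top of the raw dict.*.txt line counts, so the two numbers differ by a small constant. Paths are the ones from the invocation below:

python3 -c "import torch; sd = torch.load('baseline-models/model1.pt', map_location='cpu')['model']; print(sd['encoder.embed_tokens.weight'].shape[0], sd['decoder.embed_tokens.weight'].shape[0])"

wc -l data-bin/wmt20_en_ru/dict.en.txt data-bin/wmt20_en_ru/dict.ru.txt

In our case the checkpoint expects 43768 / 48272 embedding rows, while the dictionaries built by our preprocessing yield models with 33536 / 41808.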

Invocation:

python3 structured-uncertainty/generate.py data-bin/wmt20_en_ru/ --path baseline-models/model1.pt --max-tokens 4096 --remove-bpe --nbest 5 --gen-subset test --source-lang en --target-lang ru

A possible way forward would be to provide the training code that produced the pre-trained baselines, so that we can adapt it.

cc @KaosEngineer

No, this is because the preprocessing wasn't done right. I need to clean up the script here, so that it is clear which repo to use for what.

"the preprocessing wasn't done right"

Yeah, fixing the preprocessing so that the given pre-trained models work sounds good too.

Another option would be to share the baseline training code so that it can work with our data preprocessing.
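
For the first option, my understanding is that the data needs to be binarized against the dictionaries the baseline models were trained with, rather than against freshly built ones. A minimal sketch, assuming the fork keeps fairseq's preprocess.py entry point and that the baseline dictionaries are available as dict.en.txt / dict.ru.txt (the tokenized file prefixes and the baseline-dicts/ path are placeholders):

python3 structured-uncertainty/preprocess.py --source-lang en --target-lang ru \
    --trainpref tokenized/train --validpref tokenized/valid --testpref tokenized/test \
    --srcdict baseline-dicts/dict.en.txt --tgtdict baseline-dicts/dict.ru.txt \
    --destdir data-bin/wmt20_en_ru --workers 8

With matching dictionaries, the rebuilt model should have the same embedding shapes as the checkpoint, and the generate.py call above should load model1.pt without the size mismatch.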

We've fixed the preprocessing scripts and added the training data to the repo.