zhjgao/difformer

size mismatch for encoder.embed_tokens.weight: copying a param with shape torch.Size([10096, 128]) from checkpoint, the shape in current model is torch.Size([151000, 128]).

Closed · 4 comments

Hi, my training run just ended at epoch 608:

2023-07-02 01:57:38 | INFO | fairseq_cli.train | end of epoch 608 (average epoch stats below)
2023-07-02 01:57:38 | INFO | train | {"epoch": 608, "train_loss": "3.663", "train_nll_loss": "1.275", "train_diffusion": "0.424", "train_word_ins": "3.224", "train_length": "0.151", "train_ppl": "12.67", "train_bleu": "0", "train_wps": "23834.3", "train_ups": "3.3", "train_wpb": "7233.2", "train_bsz": "309.9", "train_num_updates": "300000", "train_lr": "9.12871e-05", "train_gnorm": "2.611", "train_clip": "97.3", "train_loss_scale": "16384", "train_train_wall": "25", "train_wall": "55974"}
2023-07-02 01:57:38 | INFO | fairseq_cli.train | done training in 55973.7 seconds
Also, is the checkpoint_last.pt file the same as difformer.pt?
I downloaded the difformer.pt and transformer.pt files you provided, but running them raises this error:
RuntimeError: Error(s) in loading state_dict for Difformer:
size mismatch for encoder.embed_tokens.weight: copying a param with shape torch.Size([10096, 128]) from checkpoint, the shape in current model is torch.Size([151000, 128]).
size mismatch for decoder.embed_tokens.weight: copying a param with shape torch.Size([10096, 128]) from checkpoint, the shape in current model is torch.Size([151000, 128]).
size mismatch for decoder.output_projection.weight: copying a param with shape torch.Size([10096, 128]) from checkpoint, the shape in current model is torch.Size([151000, 128]).
Finished evaluate_step20_beam7x3. BLEU:
2023-07-02 12:54:26 | INFO | fairseq_cli.generate | loading model(s) from models/iwslt14_de_en/difformer.pt:models/iwslt14_de_en/transformer.pt
Forgive my ignorance, but could you offer some advice?
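
For reference, a minimal sketch of how to compare the vocabulary size baked into a checkpoint against the dictionary the current run builds its model from. The paths below are assumptions based on the logs above, and the layout is the standard fairseq checkpoint format:

```python
import torch

# Load the fairseq checkpoint on CPU and inspect the embedding matrix.
# Its first dimension is the vocabulary size the model was trained with.
ckpt = torch.load("models/iwslt14_de_en/difformer.pt", map_location="cpu")
print(ckpt["model"]["encoder.embed_tokens.weight"].shape)  # e.g. [10096, 128]

# At generation time, fairseq sizes the embeddings from the data-bin
# dictionary: one line per type, plus 4 special symbols
# (<s>, <pad>, </s>, <unk>), possibly padded to a multiple of 8.
with open("data-bin/iwslt14_de_en/dict.en.txt") as f:  # assumed path
    n_lines = sum(1 for _ in f)
print("dictionary size (before padding):", n_lines + 4)
```

If the two numbers disagree (10096 vs. 151000 in the error above), generation was pointed at a different data-bin dictionary than the one the checkpoint was trained on.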

zhjgao commented

Hi, thanks for your interest.

The transformer.pt should share the same vocabulary as the difformer.pt. So if you train a model using your own vocabulary, a new Transformer trained with that same vocabulary is required.
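
As a quick sanity check, one can verify that the two checkpoints expect the same vocabulary size. This is a sketch assuming the model paths from the generation log above:

```python
import torch

# Both checkpoints must have been trained on the same dictionary;
# compare the vocabulary dimension of their decoder embeddings.
for name in ("difformer.pt", "transformer.pt"):
    ckpt = torch.load(f"models/iwslt14_de_en/{name}", map_location="cpu")
    vocab = ckpt["model"]["decoder.embed_tokens.weight"].shape[0]
    print(name, "vocab size:", vocab)

# Matching sizes here, but a mismatch at load time, means the data-bin
# dictionary passed to fairseq-generate differs from the one both
# models were trained on.
```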

X-fxx commented

May I ask, did you eventually figure out which file is difformer.pt?

zhjgao commented

> May I ask, did you eventually figure out which file is difformer.pt?

  1. If you download our released checkpoints here, difformer.pt can be found at difformer_release/models/<dataset>/difformer.pt.
  2. If you would like to train your own model, checkpoints are saved at models/<dataset>/<model name>/ckpt, among which checkpoint_last.pt is the checkpoint saved when training finished, and checkpoint_best.pt is the best checkpoint according to the evaluation BLEU score.
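
For example, to evaluate your own run with the paths the generation script loads from (see the log above), you can copy the best checkpoint into place. The source path here is hypothetical; substitute your own <dataset> and <model name>:

```python
import shutil

# Hypothetical paths: promote the best checkpoint from your own run to
# the file name the evaluation command loads (models/<dataset>/difformer.pt).
src = "models/iwslt14_de_en/my_difformer/ckpt/checkpoint_best.pt"
dst = "models/iwslt14_de_en/difformer.pt"
shutil.copy(src, dst)
```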