size mismatch for encoder.embed_tokens.weight: copying a param with shape torch.Size([10096, 128]) from checkpoint, the shape in current model is torch.Size([151000, 128]).
Hello, when my training reached epoch 608, it simply ended:
2023-07-02 01:57:38 | INFO | fairseq_cli.train | end of epoch 608 (average epoch stats below)
2023-07-02 01:57:38 | INFO | train | {"epoch": 608, "train_loss": "3.663", "train_nll_loss": "1.275", "train_diffusion": "0.424", "train_word_ins": "3.224", "train_length": "0.151", "train_ppl": "12.67", "train_bleu": "0", "train_wps": "23834.3", "train_ups": "3.3", "train_wpb": "7233.2", "train_bsz": "309.9", "train_num_updates": "300000", "train_lr": "9.12871e-05", "train_gnorm": "2.611", "train_clip": "97.3", "train_loss_scale": "16384", "train_train_wall": "25", "train_wall": "55974"}
2023-07-02 01:57:38 | INFO | fairseq_cli.train | done training in 55973.7 seconds
Also, is the checkpoint_last.pt file the same as difformer.pt?
I downloaded the difformer.pt and transformer.pt files you provided, but running them raised an error:
RuntimeError: Error(s) in loading state_dict for Difformer:
size mismatch for encoder.embed_tokens.weight: copying a param with shape torch.Size([10096, 128]) from checkpoint, the shape in current model is torch.Size([151000, 128]).
size mismatch for decoder.embed_tokens.weight: copying a param with shape torch.Size([10096, 128]) from checkpoint, the shape in current model is torch.Size([151000, 128]).
size mismatch for decoder.output_projection.weight: copying a param with shape torch.Size([10096, 128]) from checkpoint, the shape in current model is torch.Size([151000, 128]).
2023-07-02 12:54:26 | INFO | fairseq_cli.generate | loading model(s) from models/iwslt14_de_en/difformer.pt:models/iwslt14_de_en/transformer.pt
Finished evaluate_step20_beam7x3. BLEU:
Forgive my ignorance, but could you offer some advice?
Hi, thanks for your interest.
The transformer.pt should share the same vocabulary with the difformer.pt. So if you train a model using your own vocabulary, a new Transformer trained with that same vocabulary is required.
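One quick way to check for a vocabulary mismatch before pairing checkpoints is to compare the embedding tables stored inside them. The sketch below assumes the usual fairseq checkpoint layout (a "model" state dict) and the paths from the log above; both are assumptions you should adjust to your setup:

```python
import torch

# Hedged sketch: compare the vocabulary dimension of the embedding tables
# in the two checkpoints. A mismatch here reproduces the size-mismatch
# error above. Paths are illustrative, taken from the log in this thread.
for path in ["models/iwslt14_de_en/difformer.pt",
             "models/iwslt14_de_en/transformer.pt"]:
    state = torch.load(path, map_location="cpu")
    weight = state["model"]["decoder.embed_tokens.weight"]
    print(f"{path}: vocab size {weight.shape[0]}, embed dim {weight.shape[1]}")
```

Both checkpoints should report the same vocab size; if they differ, one of them was trained with a different dictionary.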
May I ask, did you ever figure out which file is difformer.pt?
- If you download our released checkpoints here, the difformer.pt can be found at difformer_release/models/<dataset>/difformer.pt.
- If you would like to train your own model, checkpoints are at models/<dataset>/<model name>/ckpt, among which checkpoint_last.pt is the checkpoint saved after training finishes, and checkpoint_best.pt is the best checkpoint according to the evaluation BLEU score.
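If you are unsure which saved checkpoint corresponds to a finished run, a minimal sketch like the one below can peek inside it. It assumes fairseq's usual checkpoint contents (training-iterator state under "extra_state"); the path is a placeholder, not a real location:

```python
import torch

# Hedged sketch: inspect a training checkpoint before using it as
# difformer.pt. Substitute your own <dataset>/<model name> in the path.
ckpt = torch.load("models/<dataset>/<model name>/ckpt/checkpoint_last.pt",
                  map_location="cpu")
# fairseq typically records the training iterator state here (assumed layout);
# for the run in this thread this should print epoch 608.
print(ckpt.get("extra_state", {}).get("train_iterator", {}).get("epoch"))
# vocabulary x embedding dimension of the saved model
print(ckpt["model"]["decoder.embed_tokens.weight"].shape)
```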