Morizeyao/GPT2-Chinese

Input tensor at index 2 has invalid shape [2, 2, 12, 1024, 64], but expected [2, 3, 12, 1024, 64]

ZouRuia opened this issue · 0 comments

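I get this error when training on three GPUs. After searching around, I found someone who got RuntimeError: Input tensor at index 3 has invalid shape [2, 2, 16, 128, 64] but expected [2, 4, 16, 128, 64] with four GPUs. So I switched back to training on four GPUs, and strangely enough it ran without errors. But I don't understand why.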
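One plausible explanation (an assumption, not confirmed in this thread): `nn.DataParallel` scatters the batch along dim 0 with `torch.chunk`-style splitting and gathers every output along dim 0. With `"output_past": true`, each GPU also returns a cached `present` tensor of shape `[2, batch, n_head, seq_len, head_dim]`, where the per-GPU batch sits in dim 1. Concatenating along dim 0 requires all other dims to match, so an uneven split (8 samples over 3 GPUs gives 3, 3, 2) fails, while an even split (8 over 4 gives 2, 2, 2, 2) passes. The pure-Python sketch below only mimics those shape rules. It assumes the three-GPU run also used `batch_size=8`; `chunk_sizes`, `present_shapes` and `gather_dim0_error` are hypothetical helpers, not PyTorch or GPT2-Chinese APIs.

```python
import math


def chunk_sizes(batch_size, n_chunks):
    """Mimic the sizes torch.chunk(batch, n_chunks, dim=0) would produce."""
    step = math.ceil(batch_size / n_chunks)  # every chunk except possibly the last has this size
    sizes = []
    remaining = batch_size
    while remaining > 0:
        sizes.append(min(step, remaining))
        remaining -= step
    return sizes


def present_shapes(batch_size, n_gpus, n_head=12, seq_len=1024, head_dim=64):
    # Assumed layout of each replica's `present`: [2, batch, n_head, seq_len, head_dim]
    return [(2, b, n_head, seq_len, head_dim) for b in chunk_sizes(batch_size, n_gpus)]


def gather_dim0_error(shapes):
    """Return an error message if the shapes cannot be concatenated along dim 0, else None."""
    expected = shapes[0]
    for i, shape in enumerate(shapes[1:], start=1):
        if shape[1:] != expected[1:]:  # every dim except dim 0 must match
            return (f"Input tensor at index {i} has invalid shape {list(shape)}, "
                    f"but expected {[shape[0]] + list(expected[1:])}")
    return None


print(chunk_sizes(8, 3))                        # [3, 3, 2]
print(gather_dim0_error(present_shapes(8, 3)))  # reproduces the message in the title
print(gather_dim0_error(present_shapes(8, 4)))  # None: an even split passes the check
```

Note that in the four-GPU case the gather still concatenates along dim 0 rather than the batch dim, so the gathered `presents` would have a meaningless layout; that is harmless only as long as the training loop ignores them.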
args:
Namespace(batch_size=8, device='5,6,1,4', epochs=5, fp16=False, fp16_opt_level='O1', gradient_accumulation=1, log_step=1, lr=0.00015, max_grad_norm=1.0, model_config='config/model_config_small.json', num_pieces=100, output_dir='model/', pretrained_model='', raw=False, raw_data_path='data/data/doupo/train.json', segment=False, stride=768, tokenized_data_path='data/tokenized/', tokenizer_path='cache/vocab_small.txt', warmup_steps=2000)
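If the failure does come from one GPU receiving fewer samples than the others, a simple workaround is to make `--batch_size` a multiple of the number of devices passed in `--device`. With the arguments above (`batch_size=8`, four devices) each GPU gets exactly 2 samples. A minimal pre-flight sketch follows; `usable_batch_size` is a hypothetical helper, not part of GPT2-Chinese's `train.py`.

```python
def usable_batch_size(batch_size, device):
    """Round batch_size down to a multiple of the GPU count in a device string such as '5,6,1,4'.

    Hypothetical helper: assumes `device` is a comma-separated list of GPU ids,
    as in the --device argument shown in the log.
    """
    n_gpus = len([d for d in device.split(",") if d.strip()])
    if n_gpus == 0:
        raise ValueError("no GPU ids found in device string")
    if batch_size < n_gpus:
        raise ValueError(f"batch_size={batch_size} is smaller than the {n_gpus} GPUs requested")
    return batch_size - batch_size % n_gpus  # drop the remainder so every GPU gets the same share


print(usable_batch_size(8, "5,6,1,4"))  # 8: already divisible by 4
print(usable_batch_size(8, "5,6,1"))    # 6: 2 samples per GPU on 3 GPUs
```

Another option, not verified against the transformers version this repo pins, is setting `"output_past": false` in the model config so the model does not return the cached `presents` at all.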
config:
{
"attn_pdrop": 0.1,
"embd_pdrop": 0.1,
"finetuning_task": null,
"initializer_range": 0.02,
"layer_norm_epsilon": 1e-05,
"n_ctx": 1024,
"n_embd": 768,
"n_head": 12,
"n_layer": 10,
"n_positions": 1024,
"num_labels": 1,
"output_attentions": false,
"output_hidden_states": false,
"output_past": true,
"pruned_heads": {},
"resid_pdrop": 0.1,
"summary_activation": null,
"summary_first_dropout": 0.1,
"summary_proj_to_labels": true,
"summary_type": "cls_index",
"summary_use_proj": true,
"torchscript": false,
"use_bfloat16": false,
"vocab_size": 13317
}

using device: cuda
calculating total steps
100%|████████████████████████████████████████████████████████████████████████████| 100/100 [00:01<00:00, 92.82it/s]
total steps = 3914
Let's use 4 GPUs!
starting training
epoch 1
time: 2023-01-13 11:48:51.538218
/u01/zourui/anaconda3/envs/GPT/lib/python3.8/site-packages/torch/nn/parallel/functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.
warnings.warn('Was asked to gather along dimension 0, but all '
/u01/zourui/anaconda3/envs/GPT/lib/python3.8/site-packages/transformers/optimization.py:166: UserWarning: This overload of add_ is deprecated:
add_(Number alpha, Tensor other)
Consider using one of the following signatures instead:
add_(Tensor other, *, Number alpha) (Triggered internally at /pytorch/torch/csrc/utils/python_arg_parser.cpp:1005.)
exp_avg.mul_(beta1).add_(1.0 - beta1, grad)
now time: 11:49. Step 1 of piece 0 of epoch 1, loss 9.667740821838379
now time: 11:49. Step 2 of piece 0 of epoch 1, loss 9.682665824890137
now time: 11:49. Step 3 of piece 0 of epoch 1, loss 9.685418128967285
now time: 11:49. Step 4 of piece 0 of epoch 1, loss 9.6702299118042
now time: 11:49. Step 5 of piece 0 of epoch 1, loss 9.668827056884766
now time: 11:49. Step 6 of piece 0 of epoch 1, loss 9.66973876953125
now time: 11:49. Step 7 of piece 0 of epoch 1, loss 9.65914535522461