Error message "size mismatch for relation_k_emb.weight" when trying to load a trained model using t5-small
kanseaveg opened this issue · 0 comments
kanseaveg commented
I am running RASAT on two consumer-grade graphics cards. The pre-trained model I am fine-tuning is t5-small, and training ran successfully with the following command:

CUDA_VISIBLE_DEVICES="0,1" python3 -m torch.distributed.launch --nnodes=1 --nproc_per_node=2 seq2seq/run_seq2seq.py configs/spider/train_spider_rasat_small.json
***** eval metrics *****
  epoch                   =    3071.95
  eval_exact_match        =     0.5348
  eval_exec              =      0.5387
  eval_loss               =     0.7128
  eval_runtime            = 0:02:24.19
  eval_samples            =       1034
  eval_samples_per_second =      7.171
100% 65/65 [02:22<00:00, 2.20s/it]
<__array_function__ internals>:5: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
However, for evaluation I set the model path to "./experiment/train_spider_rasat_small", as specified in the train configuration file, and I encountered an error when executing the evaluation command:

python3 seq2seq/eval_run_seq2seq.py configs/spider/eval_spider_rasat_4160.json

The error message is:
Dataset name: spider
Mode: dev
Databases has been preprocessed. Use cache.
Dataset has been preprocessed. Use cache.
Dataset: spider
Mode: dev
Match Questions...
100%|█████████████████████████████████████████████████████████████████████████████████████████████| 1034/1034 [00:01<00:00, 606.60it/s]
Question match errors: 0/1034
Match Table, Columns, DB Contents...
1034it [00:01, 614.75it/s]
DB match errors: 0/1034
Generate Relations...
100%|██████████████████████████████████████████████████████████████████████████████████████████████| 1034/1034 [00:10<00:00, 95.10it/s]
Edge match errors: 0/2340638
06/28/2023 20:30:11 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at ./transformers_cache/spider/spider/1.0.0/a9000e8b37ea883ad113d628d95c9067385cc1105e2641a44bfa3090483dbb9b/cache-21e2b8bdcac7ddca.arrow
===================================================
Num of relations uesd in RASAT is : 45
===================================================
Use relation model.
./experiment/train_spider_rasat_small
Traceback (most recent call last):
File "seq2seq/eval_run_seq2seq.py", line 320, in <module>
main()
File "seq2seq/eval_run_seq2seq.py", line 208, in main
model = nn.DataParallel(model_cls_wrapper(T5ForConditionalGeneration).from_pretrained(
File "/opt/conda/lib/python3.8/site-packages/transformers/modeling_utils.py", line 1453, in from_pretrained
model, missing_keys, unexpected_keys, mismatched_keys, error_msgs = cls._load_state_dict_into_model(
File "/opt/conda/lib/python3.8/site-packages/transformers/modeling_utils.py", line 1607, in _load_state_dict_into_model
raise RuntimeError(f"Error(s) in loading state_dict for {model.__class__.__name__}:\n\t{error_msg}")
RuntimeError: Error(s) in loading state_dict for T5ForConditionalGeneration:
size mismatch for relation_k_emb.weight: copying a param with shape torch.Size([49, 64]) from checkpoint, the shape in current model is torch.Size([46, 64]).
size mismatch for relation_v_emb.weight: copying a param with shape torch.Size([49, 64]) from checkpoint, the shape in current model is torch.Size([46, 64]).
size mismatch for encoder.relation_k_emb.weight: copying a param with shape torch.Size([49, 64]) from checkpoint, the shape in current model is torch.Size([46, 64]).
size mismatch for encoder.relation_v_emb.weight: copying a param with shape torch.Size([49, 64]) from checkpoint, the shape in current model is torch.Size([46, 64]).
wandb: Waiting for W&B process to finish, PID 310089... (failed 1). Press ctrl-c to abort syncing.
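The failure happens inside `_load_state_dict_into_model`: the checkpoint stores relation embeddings of shape [49, 64], while the model built for evaluation allocates [46, 64] (the log above reports 45 relation types, presumably plus one extra slot), so I suspect the train and eval configs generate different relation sets. A minimal, self-contained sketch of the shape check that fails (the relation counts and embedding size are taken from the error message; everything else is illustrative):

```python
import torch.nn as nn

# Shapes from the error message: the trained checkpoint holds 49 relation
# embeddings of dimension 64, but the model built by the eval config only
# allocates room for 46.
ckpt_emb = nn.Embedding(49, 64)   # relation_k_emb as saved in the checkpoint
model_emb = nn.Embedding(46, 64)  # relation_k_emb as built for evaluation

# from_pretrained refuses to copy parameters whose shapes differ, which is
# exactly the "size mismatch" RuntimeError shown in the traceback above.
mismatch = ckpt_emb.weight.shape != model_emb.weight.shape
print("mismatch:", mismatch)  # mismatch: True
```

So the first thing to verify would be why train_spider_rasat_small.json and eval_spider_rasat_4160.json produce different numbers of relation types.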
Could you please take a look and help me figure out where the error comes from? Thank you.