LUMIA-Group/rasat

Size mismatch error when training on the Spider dataset

Zoeyyao27 opened this issue · 1 comments

When I run

```
CUDA_VISIBLE_DEVICES="0" python3 seq2seq/run_seq2seq.py configs/spider/train_spider_rasat_small.json
```

I get the following error:
```
Traceback (most recent call last):
  File "/home/yaoy/convertsql2tree/RASAT/seq2seq/run_seq2seq.py", line 292, in <module>
    main()
  File "/home/yaoy/convertsql2tree/RASAT/seq2seq/run_seq2seq.py", line 237, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/home/yaoy/miniconda3/envs/rasat/lib/python3.9/site-packages/transformers/trainer.py", line 1325, in train
    tr_loss_step = self.training_step(model, inputs)
  File "/home/yaoy/miniconda3/envs/rasat/lib/python3.9/site-packages/transformers/trainer.py", line 1884, in training_step
    loss = self.compute_loss(model, inputs)
  File "/home/yaoy/miniconda3/envs/rasat/lib/python3.9/site-packages/transformers/trainer.py", line 1916, in compute_loss
    outputs = model(**inputs)
  File "/home/yaoy/miniconda3/envs/rasat/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/yaoy/convertsql2tree/RASAT/seq2seq/model/t5_relation_model.py", line 1742, in forward
    encoder_outputs = self.encoder(
  File "/home/yaoy/miniconda3/envs/rasat/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/yaoy/convertsql2tree/RASAT/seq2seq/model/t5_relation_model.py", line 1135, in forward
    layer_outputs = checkpoint(
  File "/home/yaoy/miniconda3/envs/rasat/lib/python3.9/site-packages/torch/utils/checkpoint.py", line 177, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/home/yaoy/miniconda3/envs/rasat/lib/python3.9/site-packages/torch/utils/checkpoint.py", line 75, in forward
    outputs = run_function(*args)
  File "/home/yaoy/convertsql2tree/RASAT/seq2seq/model/t5_relation_model.py", line 1131, in custom_forward
    return tuple(module(*inputs, use_cache, output_attentions))
  File "/home/yaoy/miniconda3/envs/rasat/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/yaoy/convertsql2tree/RASAT/seq2seq/model/t5_relation_model.py", line 755, in forward
    self_attention_outputs = self.layer[0](
  File "/home/yaoy/miniconda3/envs/rasat/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/yaoy/convertsql2tree/RASAT/seq2seq/model/t5_relation_model.py", line 658, in forward
    attention_output = self.SelfAttention(
  File "/home/yaoy/miniconda3/envs/rasat/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/yaoy/convertsql2tree/RASAT/seq2seq/model/t5_relation_model.py", line 587, in forward
    scores = relative_attention_logits(query_states, key_states, relation_k_states)  # [batch, heads, num queries, num kvs]
  File "/home/yaoy/convertsql2tree/RASAT/seq2seq/model/t5_relation_model.py", line 512, in relative_attention_logits
    q_tr_t_matmul = torch.matmul(q_t, r_t)
RuntimeError: The size of tensor a (472) must match the size of tensor b (468) at non-singleton dimension 1
```
I tried printing the sizes of `q_t` and `r_t`, and got the following:

```
q_t shape: torch.Size([8, 472, 8, 64])
r_t shape: torch.Size([8, 468, 64, 468])
```

I would expect the second dimension of both tensors to be 512. Does anyone have an idea of what went wrong and how to fix it?
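For what it's worth, the failure itself is just `torch.matmul`'s broadcasting rule: the last two dimensions are multiplied as matrices and all leading (batch) dimensions must match or be 1. A minimal sketch using the printed sizes (the meaning of each `r_t` dimension below is my guess from the shapes, not the actual layout in `t5_relation_model.py`):

```python
import torch

# Shapes taken from the printed sizes above; the dimension labels are
# assumptions, only the sizes matter for reproducing the failure.
q_t = torch.randn(8, 472, 8, 64)    # [batch, seq_len=472, heads, d_head]
r_t = torch.randn(8, 468, 64, 468)  # [batch, seq_len=468, d_head, seq_len]

# torch.matmul multiplies the last two dims and broadcasts the rest,
# so dim 1 (472 vs. 468) must agree or be 1 -- here it does neither.
try:
    torch.matmul(q_t, r_t)
    raised = False
except RuntimeError as err:
    raised = True
    print(err)  # "The size of tensor a (472) must match ... dimension 1"

# With matching sequence lengths the batched matmul goes through:
r_t_fixed = torch.randn(8, 472, 64, 468)
out = torch.matmul(q_t, r_t_fixed)
print(out.shape)  # torch.Size([8, 472, 8, 468])
```

So the question reduces to why the relation tensor was built for a 468-token sequence while the query tensor covers 472 tokens.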

Are you starting this training from an existing model checkpoint, or from scratch?