llSourcell/Doctor-Dignity

RuntimeError: shape '[-1, 32]' is invalid for input of size 1


The modeling_llama.py code is quite buggy here. I am debugging a recurring pattern of errors: input_ids carries one extra dimension at the front, and many other places in the code are unaware of this, so they read the wrong dimension information.
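A minimal sketch of the mismatch (pure Python, no torch needed; `expected_seq_length` is a hypothetical helper that mimics how modeling_llama.py reads the shape): generate() and the Llama forward pass both assume input_ids is 2-D, (batch, seq_len). When a 3-D tensor such as (1, 1, 32) slips through, every downstream shape computation reads the wrong axis.

```python
def expected_seq_length(input_ids_shape):
    """Mimic how modeling_llama.py unpacks (batch, seq_len) from input_ids."""
    batch, seq_len = input_ids_shape[0], input_ids_shape[1]
    return batch, seq_len

# 2-D input: works as intended.
assert expected_seq_length((1, 32)) == (1, 32)

# 3-D input: seq_len is silently read as 1, so position_ids ends up 1x1
# and the later position_ids.view(-1, 32) fails on a size-1 tensor.
assert expected_seq_length((1, 1, 32)) == (1, 1)
```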

Traceback (most recent call last):
  File "/global/cfs/cdirs/m2956/workspace-cfs/openmp-qa/reinforcement_learning.py", line 140, in <module>
    response_tensor = ppo_trainer.generate(query_tensor, pad_token_id=tokenizer.eos_token_id, max_new_tokens=20)
  File "/global/homes/l/liaoch/.local/unknown/pytorch1.13.1/lib/python3.9/site-packages/trl/trainer/ppo_trainer.py", line 454, in generate
    response = self.accelerator.unwrap_model(self.model).generate(
  File "/global/homes/l/liaoch/.local/unknown/pytorch1.13.1/lib/python3.9/site-packages/trl/models/modeling_value_head.py", line 198, in generate
    return self.pretrained_model.generate(*args, **kwargs)
  File "/global/common/software/nersc/pm-2022q4/sw/pytorch/1.13.1/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/global/homes/l/liaoch/.local/unknown/pytorch1.13.1/lib/python3.9/site-packages/transformers/generation/utils.py", line 1538, in generate
    return self.greedy_search(
  File "/global/homes/l/liaoch/.local/unknown/pytorch1.13.1/lib/python3.9/site-packages/transformers/generation/utils.py", line 2362, in greedy_search
    outputs = self(
  File "/global/common/software/nersc/pm-2022q4/sw/pytorch/1.13.1/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/global/homes/l/liaoch/.local/unknown/pytorch1.13.1/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 806, in forward
    outputs = self.model(
  File "/global/common/software/nersc/pm-2022q4/sw/pytorch/1.13.1/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/global/homes/l/liaoch/.local/unknown/pytorch1.13.1/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 643, in forward
    position_ids = position_ids.view(-1, seq_length).long()
RuntimeError: shape '[-1, 32]' is invalid for input of size 1

The size of position_ids should match the last dimension of input_ids (input_ids has shape 1x1x32, so seq_length is 32), but it ends up with shape 1x1:

(Pdb) p position_ids

tensor([[0]], device='cuda:0')

prepare_inputs_for_generation() sets position_ids based on the shape of attention_mask, which in turn is set by _prepare_attention_mask_for_generation() in .. pytorch1.13.1/lib/python3.9/site-packages/transformers/generation/utils.py:

return torch.ones(inputs.shape[:2], dtype=torch.long, device=inputs.device)

I changed it to inputs.shape[1:3] instead, and the code can proceed.
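The shape arithmetic behind that local workaround (a hypothetical patch for this specific 3-D case, not an upstream fix) can be checked with plain tuple slicing:

```python
# input_ids (and hence `inputs` here) has shape (1, 1, 32).
shape = (1, 1, 32)

# Original: torch.ones(inputs.shape[:2]) builds a (1, 1) attention mask,
# from which prepare_inputs_for_generation derives a (1, 1) position_ids.
assert shape[:2] == (1, 1)

# Patched: inputs.shape[1:3] gives (1, 32), matching the real seq_length,
# so position_ids.view(-1, 32) no longer fails.
assert shape[1:3] == (1, 32)
```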

But it then hits another, similar error later:

  File ".local/unknown/pytorch1.13.1/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 287, in forward
    bsz, q_len, _ = hidden_states.size()
ValueError: too many values to unpack (expected 3)
Uncaught exception. Entering post mortem debugging
Running 'cont' or 'step' will restart the program
> ...local/unknown/pytorch1.13.1/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py(287)forward()
-> bsz, q_len, _ = hidden_states.size()
(Pdb) p hidden_states.size()
torch.Size([1, 1, 32, 4096])
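This confirms the extra leading dimension is still being propagated: hidden_states is 4-D where the attention forward expects 3-D (bsz, q_len, hidden). Rather than patching each downstream shape, a root-cause sketch (hypothetical, untested against this repo; `drop_extra_batch_dim` is an illustrative helper, equivalent to query_tensor.squeeze(0) in torch) would be to drop the spurious leading dimension from query_tensor before calling ppo_trainer.generate():

```python
def drop_extra_batch_dim(shape):
    """Collapse a spurious (1, batch, seq_len) shape back to (batch, seq_len).

    With torch this corresponds to: query_tensor = query_tensor.squeeze(0)
    applied before ppo_trainer.generate().
    """
    if len(shape) == 3 and shape[0] == 1:
        return shape[1:]
    return shape

assert drop_extra_batch_dim((1, 1, 32)) == (1, 32)  # the buggy 3-D case
assert drop_extra_batch_dim((1, 32)) == (1, 32)     # already-correct input is untouched
```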