Lightning-AI/litgpt

Finetune lora max_seq_length error

SergioG-M opened this issue · 4 comments

I am getting an error when running litgpt finetune_lora.

At the beginning of training, max_seq_length is set to 466 because that is the length of the longest sequence in my training set:

"The longest sequence length in the train data is 466, the model's maximum sequence length is 466 and context length is 2048"

However, when the training is finished and a final validation is performed in

val_loss = validate(fabric, model, val_dataloader, dataclasses.replace(eval, max_iters=len(val_dataloader)))
I get an error:
"Cannot forward sequence of length 473, max seq length is only 466"

There is at least one sample in the validation set that is longer than the longest one in the training set. Does anyone know how to fix this?

This is the traceback I get

File "/usr/local/lib/python3.10/dist-packages/litgpt/finetune/lora.py", line 215, in main
val_loss = validate(fabric, model, val_dataloader, dataclasses.replace(eval, max_iters=len(val_dataloader)))
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/litgpt/finetune/lora.py", line 353, in validate
logits = model(input_ids)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/lightning/fabric/wrappers.py", line 139, in forward
output = self._forward_module(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/litgpt/lora.py", line 527, in forward
raise ValueError(f"Cannot forward sequence of length {T}, max seq length is only {self.max_seq_length}.")
ValueError: Cannot forward sequence of length 473, max seq length is only 466.
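
For reference, a quick length check over both splits confirms the mismatch. This is just a rough sketch, assuming each prepared sample is a dict with an "input_ids" tensor (which is roughly how litgpt's SFT datasets yield samples; the lengths below are stand-ins):

import torch

# Hypothetical stand-ins for the prepared train/val splits:
# lists of dicts, each with an "input_ids" tensor.
train_data = [{"input_ids": torch.arange(n)} for n in (120, 300, 466)]
val_data = [{"input_ids": torch.arange(n)} for n in (200, 473)]

def longest_seq_length(data):
    # Length of the longest tokenized example in a split.
    return max(len(sample["input_ids"]) for sample in data)

print("train max:", longest_seq_length(train_data))  # 466 -> what the model gets sized to
print("val max:  ", longest_seq_length(val_data))    # 473 -> longer than model.max_seq_length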

Thanks for sharing. Yeah, this shouldn't happen; the max sequence length calculation should happen on both the training and validation data, not just the training data. Will have to look into this and update.

In the meantime, you could rerun the training with --train.max_seq_length 512 or so to make sure this doesn't happen in your case.
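
Roughly, the fix would be to take the longest sequence across both splits before capping the model. A sketch only, not the exact code in finetune/lora.py (longest_seq_length here is a stand-in helper that returns the longest tokenized example in a dataset):

# Consider both splits when sizing the model, instead of only the train split.
longest_train = longest_seq_length(train_dataloader.dataset)
longest_val = longest_seq_length(val_dataloader.dataset)
longest_seq = max(longest_train, longest_val)

# Same capping logic as before, but fed the overall maximum.
model.max_seq_length = min(longest_seq, train.max_seq_length or float("inf"))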

Thanks!

Actually, I think that train.max_seq_length is not enough; the problem comes from

model.max_seq_length = min(longest_seq_length, train.max_seq_length or float("inf"))

So I just changed that in my case.
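
In case it helps, one way to change that line, roughly (exact names can differ between litgpt versions, and using model.config.block_size as the fallback is just one option):

# In finetune/lora.py: stop capping to the longest *training* sequence and use the
# configured limit instead, falling back to the model's context length.
model.max_seq_length = train.max_seq_length or model.config.block_size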

Thanks, fixing it in #1462

Should be fixed now.