aws/amazon-sagemaker-examples

[Bug Report] RuntimeError when running instruction fine-tuning on Mistral 7B, SageMaker JumpStart

louishourcade opened this issue · 2 comments

Link to the notebook
https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/jumpstart-foundation-models/mistral-7b-instruction-domain-adaptation-finetuning.ipynb

Describe the bug
I get an error when I run the training step for instruction fine-tuning in this notebook. The training job starts properly, but after about 10 minutes it fails with the following ErrorMessage: "RuntimeError: Could not find response key [1, 32002] in token IDs tensor([ 1, 20811, 349, ..., 302, 15637, 266])"
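The wording of the message suggests that the training code searches each tokenized example for a response-key token sequence (here [1, 32002]) and raises when it is absent. Below is a minimal sketch of that kind of check, not the actual JumpStart training script: the helper name `find_response_key` and the example token lists are hypothetical, and only the key [1, 32002] comes from the error above. Truncation before the marker is shown as one way such a lookup can fail; whether that is the cause here is not confirmed.

```python
from typing import Sequence


def find_response_key(token_ids: Sequence[int], response_key: Sequence[int]) -> int:
    """Return the index at which response_key starts inside token_ids.

    Hypothetical helper for illustration; not taken from the training script.
    """
    n, m = len(token_ids), len(response_key)
    # Slide a window of length m over token_ids and compare it with the key.
    for start in range(n - m + 1):
        if list(token_ids[start:start + m]) == list(response_key):
            return start
    raise RuntimeError(
        f"Could not find response key {list(response_key)} "
        f"in token IDs {list(token_ids)}"
    )


# Only [1, 32002] comes from the error message; the other IDs are made up.
response_key = [1, 32002]
with_key = [1, 20811, 349, 1, 32002, 302, 15637]

# Assumption: a sequence cut off before the response marker, for example by a
# maximum input length. The lookup then fails in the same way as in the report.
truncated = with_key[:3]
```

Calling `find_response_key(truncated, response_key)` raises a RuntimeError with the same wording as the one in the job logs, so comparing the tokenized training examples against the expected response marker may be a useful first debugging step.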

To reproduce

  • Upload the notebook to a SageMaker notebook instance
  • Run every cell. The error appears when running the instruction fine-tuning training job (section 1.3, Starting Training)

Logs
Attaching some screenshots of the logs

[Two screenshots of the training job logs: "Screenshot 2024-05-03 at 16 45 43" and "Screenshot 2024-05-03 at 16 47 10"]

Any idea how to fix this?

@louishourcade: I'm facing the same issue while running the example notebook from AWS. Did you find a solution?

Hi @prakash5801, no, I haven't found time to investigate further. I checked yesterday and the error is still there.