Lightning-AI/lit-llama

No response after training an epoch

Dylandtt opened this issue · 2 comments

iter 1993: loss 0.7506, time: 483.78ms
iter 1994: loss 0.9028, time: 339.06ms
iter 1995: loss 0.9767, time: 521.26ms
iter 1996: loss 0.8616, time: 419.42ms
iter 1997: loss 0.7878, time: 480.91ms
iter 1998: loss 0.6554, time: 407.63ms
Saving adapter weights to out/adapter_v2/alpaca
Saving adapter weights to out/adapter_v2/alpaca
Saving adapter weights to out/adapter_v2/alpaca
Saving adapter weights to out/adapter_v2/alpaca
/root/anaconda3/envs/llama/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
warnings.warn(
/root/anaconda3/envs/llama/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
warnings.warn(
/root/anaconda3/envs/llama/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
warnings.warn(
/root/anaconda3/envs/llama/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
warnings.warn(

You need to adjust the eval_interval and gradient_accumulation_iters variables so that validation results are shown during training. Alternatively, you can change the code to evaluate the model at the end of training by copying what's under if step_count % eval_interval == 0 to the end of the train function.
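Here is a minimal sketch of that suggestion. It assumes the training loop counts one optimizer step per gradient_accumulation_iters iterations and only validates when step_count is a multiple of eval_interval. The train_step and validate callables are hypothetical stand-ins for illustration, not lit-llama functions, and the loop is a simplified toy rather than the actual finetuning script.

```python
from typing import Callable, List


def train(
    max_iters: int,
    gradient_accumulation_iters: int,
    eval_interval: int,
    train_step: Callable[[int], float],  # hypothetical: runs one forward/backward pass
    validate: Callable[[], float],       # hypothetical: returns a validation loss
) -> List[float]:
    """Toy training loop that also validates once after the last iteration."""
    step_count = 0
    val_losses: List[float] = []

    for iter_num in range(max_iters):
        train_step(iter_num)

        # An optimizer step happens only once per accumulation window.
        if (iter_num + 1) % gradient_accumulation_iters == 0:
            step_count += 1

            # Periodic evaluation: never fires if eval_interval is larger
            # than the total number of optimizer steps.
            if step_count % eval_interval == 0:
                val_losses.append(validate())

    # Final evaluation, copied from the periodic branch above, so that
    # validation results are reported even if the branch never fired.
    val_losses.append(validate())
    return val_losses


if __name__ == "__main__":
    # 10 iterations with accumulation of 4 gives 2 optimizer steps, so an
    # eval_interval of 5 never triggers; only the final evaluation runs.
    losses = train(10, 4, 5, train_step=lambda i: 0.0, validate=lambda: 1.23)
    print(f"validation ran {len(losses)} time(s)")
```

With these toy numbers the periodic check never triggers, which mirrors the case where a run finishes without printing any validation results. Lowering eval_interval or gradient_accumulation_iters makes the periodic branch fire as well.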

After training for one epoch, it neither continues training nor stops; it just stays stuck on this screen.