Lightning-AI/lit-llama

Pre-train on a single machine with multiple GPUs

willard-yuan opened this issue · 3 comments

Following train_redpajama.md, I tried to pretrain on a single machine with multiple GPUs. I ran the following:

python pretrain/redpajama.py --devices 4 --train_data_dir data/lit-redpajama-sample

I got the following error:

 File "/yuanshitestvepfs/lit-llama/pretrain/redpajama.py", line 144, in main
    for iter_num, train_data in enumerate(train_dataloader):
RuntimeError: generator raised StopIteration

Is there anything I missed?

Try reducing the number of training iterations.
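The error in the traceback is standard Python behavior, not something specific to lit-llama. Since PEP 479 (Python 3.7), a StopIteration that escapes a generator's body is re-raised as RuntimeError("generator raised StopIteration"). One likely reading of the suggestion above is that the training loop asks the dataloader for more batches than the small sample dataset can supply. The toy sketch below illustrates that assumption only. `packed_batches` and `count_steps` are hypothetical names and are not lit-llama's dataset or training code.

```python
from itertools import islice


def packed_batches(samples, batch_size):
    """Hypothetical stand-in for a pretraining data generator.

    Yields fixed-size batches by calling next() on an inner iterator. When the
    inner iterator runs dry, its StopIteration escapes this generator's body,
    and Python 3.7+ (PEP 479) turns it into
    RuntimeError("generator raised StopIteration").
    """
    it = iter(samples)
    while True:
        yield [next(it) for _ in range(batch_size)]


def count_steps(num_samples, batch_size, max_iters):
    """Run a dummy loop for at most max_iters steps; return how many ran."""
    loader = packed_batches(range(num_samples), batch_size)
    steps = 0
    # islice stops after max_iters batches without pulling one more from loader
    for _batch in islice(loader, max_iters):
        steps += 1
    return steps


if __name__ == "__main__":
    # 10 samples with batch_size 4 -> only 2 full batches exist.
    print(count_steps(10, 4, max_iters=2))  # enough data: prints 2
    try:
        count_steps(10, 4, max_iters=5)  # asks for more batches than exist
    except RuntimeError as err:
        print(err)  # prints: generator raised StopIteration
```

In this toy setup, lowering `max_iters` until it fits within the available data makes the error disappear, which matches the advice to reduce the number of iterations. Supplying more training data should have the same effect.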