labmlai/annotated_deep_learning_paper_implementations

Crashed on labml_nn/neox/samples/finetune.py

Keith-Hon opened this issue · 2 comments

I tried to run all the .py files inside the samples folder. generate.py and llm_int8.py worked fine, but finetune.py crashed

https://app.labml.ai/run/b97204eaa95611eda6ae9bc880f62bb5

with error:

```
Traceback (most recent call last):
  File "/home/paperspace/Desktop/playground/neox-20b/notebooks/finetune.py", line 128, in <module>
    main()
  File "/home/paperspace/Desktop/playground/neox-20b/notebooks/finetune.py", line 121, in main
    conf.train_epoch()
  File "/home/paperspace/.local/lib/python3.9/site-packages/labml_nn/neox/utils/trainer.py", line 116, in train_epoch
    loss, output, target = self.get_loss(sample, split_name)
  File "/home/paperspace/.local/lib/python3.9/site-packages/labml_nn/neox/utils/trainer.py", line 64, in get_loss
    data, target = sample
TypeError: cannot unpack non-iterable NoneType object
```
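
From the traceback, `get_loss()` received `None` instead of a `(data, target)` batch, though that doesn't tell me why the batch was `None`. A minimal reproduction of the same TypeError (plain Python, nothing from labml_nn):

```python
# A (data, target) pair unpacks fine:
data, target = ([1, 2, 3], [2, 3, 4])

# ...but if the sampler hands get_loss() a None batch, unpacking raises
# exactly the error in the traceback above:
sample = None
data, target = sample  # TypeError: cannot unpack non-iterable NoneType object
```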

I know it's using the Tiny Shakespeare dataset for fine-tuning, but I have no idea why it crashed. Also, I would like to know how to fine-tune with a custom dataset.
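
For context, this is roughly what I'd expect a custom dataset to look like, assuming the trainer unpacks `(data, target)` pairs as in the traceback above. The class and variable names here are my own guesses, not labml_nn API:

```python
import torch
from torch.utils.data import Dataset, DataLoader


class MyTextDataset(Dataset):
    """Hypothetical replacement for the Tiny Shakespeare dataset.

    Yields (input_ids, target_ids) pairs, where the target is the input
    shifted by one token, matching the `data, target = sample` unpacking
    in get_loss().
    """

    def __init__(self, token_ids: torch.Tensor, seq_len: int):
        self.token_ids = token_ids
        self.seq_len = seq_len

    def __len__(self):
        return (len(self.token_ids) - 1) // self.seq_len

    def __getitem__(self, idx):
        start = idx * self.seq_len
        chunk = self.token_ids[start:start + self.seq_len + 1]
        return chunk[:-1], chunk[1:]  # (data, target)


# Example usage with dummy token IDs; in practice these would come from
# tokenizing your own text with the GPT-NeoX tokenizer.
tokens = torch.randint(0, 50_000, (10_000,))
loader = DataLoader(MyTextDataset(tokens, seq_len=128), batch_size=1, shuffle=True)
data, target = next(iter(loader))
```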

Any help?

I was running on a single 80 GB GPU instead of 2x 48 GB GPUs; could that be the problem?

vpj commented

Very sorry about the late reply. Fine-tuning uses pipeline parallelism, but this error doesn't seem to be caused by that. Did it train for a while, or did it crash right at the start? We had to take down app.labml.ai due to server costs, so I can't see the experiment details.
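
If it crashed right at the start, a quick sanity check would be to iterate the data loader directly and confirm every batch is a `(data, target)` pair before it reaches `conf.train_epoch()`. A rough sketch (the helper is made up, and it assumes a standard PyTorch-style loader yielding pairs of tensors):

```python
def check_batches(loader, max_batches: int = 5):
    """Hypothetical debug helper: inspect the first few batches and flag
    any that are None instead of a (data, target) pair."""
    for i, sample in enumerate(loader):
        if i >= max_batches:
            break
        if sample is None:
            print(f"batch {i} is None -- the loader/collate_fn is the culprit")
            return False
        data, target = sample  # should unpack cleanly if batches are well-formed
        print(f"batch {i}: data {tuple(data.shape)}, target {tuple(target.shape)}")
    return True
```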