
Continued pre-training fails with RuntimeError: Failed processing /tmp/data

How can I solve this problem?

I downloaded pythia-160m from Hugging Face.
I downloaded my data following the official documentation.
The program is running in an NVIDIA Docker container.

I followed the official documentation for continued pre-training, but the run failed with this error: RuntimeError: Failed processing /tmp/data.

litgpt pretrain \
  --model_name pythia-160m \
  --tokenizer_dir checkpoints/EleutherAI/pythia-160m \
  --initial_checkpoint_dir checkpoints/EleutherAI/pythia-160m \
  --data TextFiles \
  --data.train_data_path "custom_texts" \
  --out_dir out/custom_model
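
Before launching the run, it can help to confirm what the data directory actually contains. The sketch below assumes the TextFiles loader reads .txt files from the top level of custom_texts; the helper name find_text_files is hypothetical and not part of litgpt.

```python
from pathlib import Path


def find_text_files(data_dir):
    """Return (files, subdirs): the regular .txt files and the
    subdirectories found directly under data_dir."""
    root = Path(data_dir)
    if not root.is_dir():
        raise NotADirectoryError(f"{root} is not a directory")
    entries = sorted(root.iterdir())
    # Keep only regular files with a .txt suffix (assumed input format).
    files = [p for p in entries if p.is_file() and p.suffix == ".txt"]
    # Subdirectories are reported separately so they can be inspected.
    subdirs = [p for p in entries if p.is_dir()]
    return files, subdirs


if __name__ == "__main__":
    data_dir = Path("custom_texts")  # same path as --data.train_data_path
    if data_dir.is_dir():
        files, subdirs = find_text_files(data_dir)
        print(f"{len(files)} .txt file(s), {len(subdirs)} subdirectory(ies)")
    else:
        print(f"{data_dir} does not exist or is not a directory")
```

If the check reports zero .txt files or unexpected subdirectories, the input layout is worth fixing first, although that may not be the cause of the error reported here.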

I got:
We found the following error Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/litdata/processing/", line 628, in _handle_data_chunk_recipe
    for item_data in item_data_or_generator:
  File "/usr/local/lib/python3.10/dist-packages/litdata/processing/", line 151, in _prepare_item_generator
    yield from self._fn(item_metadata) # type: ignore
  File "/usr/local/lib/python3.10/dist-packages/litgpt/data/", line 124, in tokenize
    with open(filename, "r", encoding="utf-8") as file:
IsADirectoryError: [Errno 21] Is a directory: '/tmp/data'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/litdata/processing/", line 423, in run
  File "/usr/local/lib/python3.10/dist-packages/litdata/processing/", line 472, in _loop
  File "/usr/local/lib/python3.10/dist-packages/litdata/processing/", line 638, in _handle_data_chunk_recipe
    raise RuntimeError(f"Failed processing {self.items[index]}") from e
RuntimeError: Failed processing /tmp/data
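
The two tracebacks describe a single failure: open() was called on /tmp/data, which is a directory, and the resulting IsADirectoryError was re-raised as a RuntimeError via "raise ... from e". The sketch below reproduces that pattern in isolation on Linux or macOS. The names read_text_item and process_items are hypothetical, not litdata or litgpt functions, and the final guard (skipping paths that are not regular files) is only one possible workaround, not the fix adopted upstream.

```python
import tempfile
from pathlib import Path


def read_text_item(filename):
    # Mirrors the failing call in the traceback: open() on a directory
    # raises IsADirectoryError on Linux and macOS.
    with open(filename, "r", encoding="utf-8") as file:
        return file.read()


def process_items(items):
    results = []
    for item in items:
        try:
            results.append(read_text_item(item))
        except Exception as e:
            # Chaining with "from e" is what produces the message
            # "The above exception was the direct cause of ...".
            raise RuntimeError(f"Failed processing {item}") from e
    return results


with tempfile.TemporaryDirectory() as tmp:
    text_file = Path(tmp) / "sample.txt"
    text_file.write_text("hello", encoding="utf-8")
    items = [str(text_file), tmp]  # the second item is a directory

    cause = None
    try:
        process_items(items)
    except RuntimeError as err:
        cause = err.__cause__  # the original exception that was wrapped
        print(f"{err} (caused by {type(cause).__name__})")

    # Possible guard: hand only regular files to the processing step.
    file_items = [p for p in items if Path(p).is_file()]
    contents = process_items(file_items)
    print(f"processed {len(contents)} file(s)")
```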

Same issue as in #1402

cc @awaelchli

@carmocca What should I do if I want to keep using litgpt for pre-training? Should I wait until the bug is fixed before using litgpt again? Thank you!

Are you using Google Colab? You could try using while this gets fixed. It should work there without issues

Thank you. I am currently running litgpt on our company's GPU cluster, so I will wait for the issue to be fixed before continuing to use it.