Lightning-AI/litgpt

Continue pre-training got RuntimeError: Failed processing /tmp/data


How can I solve this problem?

I downloaded pythia-160m from Hugging Face.
I downloaded my data following the official documentation.
The program is running in an NVIDIA Docker container.

I followed the official documentation for continued pre-training, but the run failed with: RuntimeError: Failed processing /tmp/data.

litgpt pretrain \
  --model_name pythia-160m \
  --tokenizer_dir checkpoints/EleutherAI/pythia-160m \
  --initial_checkpoint_dir checkpoints/EleutherAI/pythia-160m \
  --data TextFiles \
  --data.train_data_path "custom_texts" \
  --out_dir out/custom_model

I got:
RuntimeError:
We found the following error Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/litdata/processing/data_processor.py", line 628, in _handle_data_chunk_recipe
    for item_data in item_data_or_generator:
  File "/usr/local/lib/python3.10/dist-packages/litdata/processing/functions.py", line 151, in _prepare_item_generator
    yield from self._fn(item_metadata) # type: ignore
  File "/usr/local/lib/python3.10/dist-packages/litgpt/data/text_files.py", line 124, in tokenize
    with open(filename, "r", encoding="utf-8") as file:
IsADirectoryError: [Errno 21] Is a directory: '/tmp/data'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/litdata/processing/data_processor.py", line 423, in run
    self._loop()
  File "/usr/local/lib/python3.10/dist-packages/litdata/processing/data_processor.py", line 472, in _loop
    self._handle_data_chunk_recipe(index)
  File "/usr/local/lib/python3.10/dist-packages/litdata/processing/data_processor.py", line 638, in _handle_data_chunk_recipe
    raise RuntimeError(f"Failed processing {self.items[index]}") from e
RuntimeError: Failed processing /tmp/data
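
Note on reading the traceback: the innermost failure is the open(filename, "r", encoding="utf-8") call in tokenize, which received the directory /tmp/data where a text file was expected. The sketch below only illustrates that failure mode in isolation; it does not reproduce or fix the underlying bug. The helper names split_inputs and can_open_as_text are hypothetical and are not part of litgpt or litdata, and the temporary directory stands in for whatever paths are handed to the tokenizer.

```python
import tempfile
from pathlib import Path


def split_inputs(paths):
    """Separate regular files from everything else (directories, missing paths)."""
    files, others = [], []
    for p in map(Path, paths):
        (files if p.is_file() else others).append(p)
    return files, others


def can_open_as_text(path):
    """Return True if `path` opens the same way the tokenize step in the traceback opens it."""
    try:
        with open(path, "r", encoding="utf-8"):
            return True
    except (IsADirectoryError, PermissionError, FileNotFoundError):
        # Linux raises IsADirectoryError for a directory; Windows raises PermissionError.
        return False


# Hypothetical stand-in for the items passed to the tokenizer:
# one real text file and one directory.
with tempfile.TemporaryDirectory() as tmp:
    text_file = Path(tmp) / "sample.txt"
    text_file.write_text("hello", encoding="utf-8")

    files, others = split_inputs([text_file, tmp])
    file_ok = can_open_as_text(text_file)  # a regular file opens fine
    dir_ok = can_open_as_text(tmp)         # a directory does not
```

A check like this on the paths under custom_texts can rule out a stray subdirectory among the inputs. In the traceback above, however, the offending path is /tmp/data rather than anything under custom_texts, so the directory appears to come from the processing pipeline rather than from the user's data.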

Same issue as in #1402

cc @awaelchli

@carmocca If I want to keep using litgpt for pre-training, what should I do? Should I wait until the bug is fixed before using litgpt? Thank you!

Are you using Google Colab? You could try using https://lightning.ai while this gets fixed. It should work there without issues.

Thank you. I am currently using litgpt on our company's GPU cluster, so I will wait for the issue to be fixed before continuing to use it.