Enough samples error: Make sure that your dataset has enough samples to at least yield one packed sequence.
I am running a test training with a small CSV file of only 10 entries.
I tried to resolve the error by adding these params:
Packing: False
Padding: Left
I also set train_split: null in the YAML config
and added max sequence = 128.
Error:
ERROR | 2024-09-17 12:54:20 | autotrain.trainers.common:wrapper:120 - train has failed due to an exception: Traceback (most recent call last):
File "C:\Users\sharm\anaconda3\lib\site-packages\datasets\builder.py", line 1775, in _prepare_split_single
num_examples, num_bytes = writer.finalize()
File "C:\Users\sharm\anaconda3\lib\site-packages\datasets\arrow_writer.py", line 611, in finalize
raise SchemaInferenceError("Please pass features or at least one example when writing data")
datasets.arrow_writer.SchemaInferenceError: Please pass features or at least one example when writing data
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\sharm\anaconda3\lib\site-packages\trl\trainer\sft_trainer.py", line 642, in _prepare_packed_dataloader
packed_dataset = Dataset.from_generator(
File "C:\Users\sharm\anaconda3\lib\site-packages\datasets\arrow_dataset.py", line 1117, in from_generator
return GeneratorDatasetInputStream(
File "C:\Users\sharm\anaconda3\lib\site-packages\datasets\io\generator.py", line 47, in read
self.builder.download_and_prepare(
File "C:\Users\sharm\anaconda3\lib\site-packages\datasets\builder.py", line 1027, in download_and_prepare
self._download_and_prepare(
File "C:\Users\sharm\anaconda3\lib\site-packages\datasets\builder.py", line 1789, in _download_and_prepare
super()._download_and_prepare(
File "C:\Users\sharm\anaconda3\lib\site-packages\datasets\builder.py", line 1122, in _download_and_prepare
self._prepare_split(split_generator, **prepare_split_kwargs)
File "C:\Users\sharm\anaconda3\lib\site-packages\datasets\builder.py", line 1627, in _prepare_split
for job_id, done, content in self._prepare_split_single(
File "C:\Users\sharm\anaconda3\lib\site-packages\datasets\builder.py", line 1784, in _prepare_split_single
raise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\sharm\anaconda3\lib\site-packages\autotrain\trainers\common.py", line 117, in wrapper
return func(*args, **kwargs)
File "C:\Users\sharm\anaconda3\lib\site-packages\autotrain\trainers\clm\__main__.py", line 28, in train
train_sft(config)
File "C:\Users\sharm\anaconda3\lib\site-packages\autotrain\trainers\clm\train_clm_sft.py", line 46, in train
trainer = SFTTrainer(
File "C:\Users\sharm\anaconda3\lib\site-packages\huggingface_hub\utils\_deprecation.py", line 101, in inner_f
return f(*args, **kwargs)
File "C:\Users\sharm\anaconda3\lib\site-packages\trl\trainer\sft_trainer.py", line 372, in __init__
train_dataset = self._prepare_dataset(
File "C:\Users\sharm\anaconda3\lib\site-packages\trl\trainer\sft_trainer.py", line 534, in _prepare_dataset
return self._prepare_packed_dataloader(
File "C:\Users\sharm\anaconda3\lib\site-packages\trl\trainer\sft_trainer.py", line 646, in _prepare_packed_dataloader
raise ValueError(
ValueError: Error occurred while packing the dataset. Make sure that your dataset has enough samples to at least yield one packed sequence.
ERROR | 2024-09-17 12:54:20 | autotrain.trainers.common:wrapper:121 - Error occurred while packing the dataset. Make sure that your dataset has enough samples to at least yield one packed sequence.
INFO | 2024-09-17 12:54:21 | autotrain.parser:run:217 - Job ID: 12572
data.csv contains email subject lines in the text column and the complete email text in the next column.
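Reducing block_size to 128 or 64 does the trick. With packing, the samples are concatenated and cut into sequences of block_size tokens, so a dataset of only 10 short entries likely does not contain enough tokens to fill even one sequence at a larger block size.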
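To see why a smaller block_size helps, here is a minimal sketch of the packing idea in plain Python. It is not the actual TRL or AutoTrain code: pack_sequences, the toy token lists, and the block size of 1024 are all hypothetical, chosen only to show that no packed sequence comes out when the total token count is below the block size.

```python
from typing import Iterable, List


def pack_sequences(
    tokenized: Iterable[List[int]], block_size: int, eos_id: int = 0
) -> List[List[int]]:
    """Concatenate tokenized samples and cut them into full blocks.

    Hypothetical helper for illustration; real packing code differs in detail.
    """
    buffer: List[int] = []
    for ids in tokenized:
        buffer.extend(ids)
        buffer.append(eos_id)  # separator token between samples
    # Only complete blocks are kept; a trailing partial block is dropped.
    n_full = len(buffer) // block_size
    return [buffer[i * block_size:(i + 1) * block_size] for i in range(n_full)]


# Toy data (assumption): 10 samples of 20 tokens each, 210 tokens with separators.
samples = [[1] * 20 for _ in range(10)]

packed_large = pack_sequences(samples, block_size=1024)  # too few tokens for one block
packed_small = pack_sequences(samples, block_size=128)   # enough for one block

print(len(packed_large), len(packed_small))  # prints "0 1"
```

With zero packed sequences there is nothing to write to the dataset, which would match the "at least one example" error in the traceback above.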