YuxinWenRick/tree-ring-watermark

Dataset column names don't match

ZCG335 opened this issue · 0 comments

Dear author, I ran into a problem when testing the code. Could you please help me solve it? Thank you very much!
Generating train split: 0%| | 0/54568 [00:00<?, ? examples/s]
Failed to read file '/root/.cache/huggingface/datasets/downloads/870be28b4b7b1b74063e420fac9fb246f3159ab8b9a4a390c4f2b2e08e92eea9' with error <class 'datasets.table.CastError'>: Couldn't cast
Prompt: string
-- schema metadata --
pandas: '{"index_columns": [], "column_indexes": [], "columns": [{"name":' + 186
to
{'instruction': Value(dtype='string', id=None), 'output': Value(dtype='string', id=None), 'input': Value(dtype='string', id=None)}
because column names don't match
Generating train split: 0%| | 0/54568 [00:00<?, ? examples/s]
Traceback (most recent call last):
File "/root/miniconda3/lib/python3.8/site-packages/datasets/builder.py", line 1997, in _prepare_split_single
for _, table in generator:
File "/root/miniconda3/lib/python3.8/site-packages/datasets/packaged_modules/parquet/parquet.py", line 93, in generate_tables
yield f"{file_idx}_{batch_idx}", self._cast_table(pa_table)
File "/root/miniconda3/lib/python3.8/site-packages/datasets/packaged_modules/parquet/parquet.py", line 71, in _cast_table
pa_table = table_cast(pa_table, self.info.features.arrow_schema)
File "/root/miniconda3/lib/python3.8/site-packages/datasets/table.py", line 2302, in table_cast
return cast_table_to_schema(table, schema)
File "/root/miniconda3/lib/python3.8/site-packages/datasets/table.py", line 2256, in cast_table_to_schema
raise CastError(
datasets.table.CastError: Couldn't cast
Prompt: string
-- schema metadata --
pandas: '{"index_columns": [], "column_indexes": [], "columns": [{"name":' + 186
to
{'instruction': Value(dtype='string', id=None), 'output': Value(dtype='string', id=None), 'input': Value(dtype='string', id=None)}
because column names don't match

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "run_tree_ring_watermark.py", line 217, in <module>
main(args)
File "run_tree_ring_watermark.py", line 42, in main
dataset, prompt_key = get_dataset(args)
File "/root/tree-ring-watermark-main/optim_utils.py", line 109, in get_dataset
dataset = load_dataset(args.dataset)['test']
File "/root/miniconda3/lib/python3.8/site-packages/datasets/load.py", line 2616, in load_dataset
builder_instance.download_and_prepare(
File "/root/miniconda3/lib/python3.8/site-packages/datasets/builder.py", line 1029, in download_and_prepare
self._download_and_prepare(
File "/root/miniconda3/lib/python3.8/site-packages/datasets/builder.py", line 1124, in _download_and_prepare
self._prepare_split(split_generator, **prepare_split_kwargs)
File "/root/miniconda3/lib/python3.8/site-packages/datasets/builder.py", line 1884, in _prepare_split
for job_id, done, content in self._prepare_split_single(
File "/root/miniconda3/lib/python3.8/site-packages/datasets/builder.py", line 2040, in _prepare_split_single
raise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
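
For reference, the mismatch that the `CastError` reports can be illustrated with a minimal sketch. This is a simplified illustration, not the `datasets` library's own check: the helper `diff_column_names` is a hypothetical name introduced here, and the two lists are copied by hand from the error message above (the file has a single `Prompt` column, while the declared features are `instruction`, `output`, and `input`).

```python
from typing import Dict, Iterable, List


def diff_column_names(actual: Iterable[str], expected: Iterable[str]) -> Dict[str, List[str]]:
    """Hypothetical helper: list the column names that differ between
    a data file and the features it is being cast to."""
    actual_set, expected_set = set(actual), set(expected)
    return {
        # Present in the file but not declared in the features.
        "unexpected": sorted(actual_set - expected_set),
        # Declared in the features but absent from the file.
        "missing": sorted(expected_set - actual_set),
    }


# Column names copied from the error message above (assumed to be complete).
file_columns = ["Prompt"]
declared_features = ["instruction", "output", "input"]

report = diff_column_names(file_columns, declared_features)
print(report)
```

Here the two lists share no names at all, which suggests the declared features describe different data than the Parquet file being read. That is only an inference from the message and has not been confirmed in this thread.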