datasets.utils.info_utils.ExpectedMoreSplits: {'validation'}
SDcodehub commented
$ python llama.py /datadrive/models/Llama-2-13b-chat-hf c4 --wbits 4 --true-sequential --groupsize 128 --save_safetensors /datadrive/models/Llama-2-13b-chat-hf-gptq/llama-2-13b-4bit-gs128.safetensors
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:02<00:00, 1.41it/s]
/home/FRACTAL/sagar.desai/miniconda3/envs/gptq/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:389: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.9` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
warnings.warn(
/home/FRACTAL/sagar.desai/miniconda3/envs/gptq/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:394: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.6` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
warnings.warn(
Downloading and preparing dataset None/en to file:///home/FRACTAL/sagar.desai/.cache/huggingface/datasets/allenai___json/en-ec45c889631c3c39/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4...
Downloading data files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 6413.31it/s]
Extracting data files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1855.89it/s]
Traceback (most recent call last):
File "/home/FRACTAL/sagar.desai/GPTQ-for-LLaMa/llama.py", line 488, in <module>
dataloader, testloader = get_loaders(args.dataset, nsamples=args.nsamples, seed=args.seed, model=args.model, seqlen=model.seqlen)
File "/home/FRACTAL/sagar.desai/GPTQ-for-LLaMa/utils/datautils.py", line 189, in get_loaders
return get_c4(nsamples, seed, seqlen, model)
File "/home/FRACTAL/sagar.desai/GPTQ-for-LLaMa/utils/datautils.py", line 64, in get_c4
traindata = load_dataset('allenai/c4', 'allenai--c4', data_files={'train': 'en/c4-train.00000-of-01024.json.gz'}, split='train', use_auth_token=False)
File "/home/FRACTAL/sagar.desai/miniconda3/envs/gptq/lib/python3.9/site-packages/datasets/load.py", line 1797, in load_dataset
builder_instance.download_and_prepare(
File "/home/FRACTAL/sagar.desai/miniconda3/envs/gptq/lib/python3.9/site-packages/datasets/builder.py", line 890, in download_and_prepare
self._download_and_prepare(
File "/home/FRACTAL/sagar.desai/miniconda3/envs/gptq/lib/python3.9/site-packages/datasets/builder.py", line 1003, in _download_and_prepare
verify_splits(self.info.splits, split_dict)
File "/home/FRACTAL/sagar.desai/miniconda3/envs/gptq/lib/python3.9/site-packages/datasets/utils/info_utils.py", line 91, in verify_splits
raise ExpectedMoreSplits(str(set(expected_splits) - set(recorded_splits)))
datasets.utils.info_utils.ExpectedMoreSplits: {'validation'}
Working on an A100.
I tried datasets versions from 2.10.* to 2.12.* and got the same error with each.
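The last frame of the traceback raises ExpectedMoreSplits with the set difference between the splits the dataset metadata expects and the splits that were actually built. The sketch below re-creates only that check, using a hypothetical check_splits function and a local ExpectedMoreSplits stand-in (neither is the datasets API). It assumes, as we read the log, that the metadata declares both train and validation while data_files supplies only train.

```python
# Hypothetical stand-in for the check in datasets/utils/info_utils.py (verify_splits).
# It mirrors only the set difference shown in the traceback; the real function does more.


class ExpectedMoreSplits(ValueError):
    """Local stand-in for datasets.utils.info_utils.ExpectedMoreSplits."""


def check_splits(expected_splits, recorded_splits):
    """Raise if a split declared in the dataset metadata was never produced."""
    missing = set(expected_splits) - set(recorded_splits)
    if missing:
        raise ExpectedMoreSplits(str(missing))


# Assumed situation: metadata declares both splits, but data_files only supplied 'train'.
try:
    check_splits(["train", "validation"], ["train"])
except ExpectedMoreSplits as err:
    print(f"ExpectedMoreSplits: {err}")  # prints "ExpectedMoreSplits: {'validation'}"
```

This is why the error names only 'validation': 'train' was built from the single data file, so it drops out of the difference.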
iibw commented
This error seems to have happened because c4 was updated with some `datasets` configuration options which aren't supported in older versions of `datasets`.

To fix, upgrade `datasets` with `pip install -U datasets` and remove `, 'allenai--c4'` from all four c4 `load_dataset` lines in `GPTQ-for-LLaMa/utils/datautils.py`.
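Editing the four calls by hand is enough. As an alternative, here is a hypothetical stdlib-only helper (strip_c4_config and patch_file are illustrative names, not part of GPTQ-for-LLaMa) that removes the config argument with a regular expression. It assumes every call spells the arguments exactly as in the traceback line, 'allenai/c4' followed by 'allenai--c4'; upgrading datasets is still a separate step.

```python
import re
from pathlib import Path

# Hypothetical helper: drops the 'allenai--c4' config argument that follows
# 'allenai/c4' in load_dataset(...) calls, leaving every other argument intact.
_C4_CONFIG_ARG = re.compile(
    r"""(load_dataset\(\s*['"]allenai/c4['"])\s*,\s*['"]allenai--c4['"]"""
)


def strip_c4_config(source: str) -> tuple[str, int]:
    """Return the patched source and the number of calls that were changed."""
    return _C4_CONFIG_ARG.subn(r"\1", source)


def patch_file(path: Path) -> int:
    """Rewrite path in place, keeping a .bak copy of the original; return the count."""
    original = path.read_text(encoding="utf-8")
    patched, count = strip_c4_config(original)
    if count:
        path.with_name(path.name + ".bak").write_text(original, encoding="utf-8")
        path.write_text(patched, encoding="utf-8")
    return count


# The call from the traceback, before and after the edit.
before = (
    "traindata = load_dataset('allenai/c4', 'allenai--c4', "
    "data_files={'train': 'en/c4-train.00000-of-01024.json.gz'}, "
    "split='train', use_auth_token=False)"
)
after, n = strip_c4_config(before)
print(after)
```

From the repository root, patch_file(Path("utils/datautils.py")) would return 4 if all four calls use that exact form; a smaller number means some call is spelled differently and needs a manual look.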
Some additional info here: https://huggingface.co/datasets/allenai/c4/discussions/7