OptimalScale/LMFlow

[BUG] Loading a dataset cached in a LocalFileSystem is not supported

xiaohangguo opened this issue · 2 comments

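Recently the run started failing with a baffling "not supported" error. After looking into it, I found it is caused by the installed version of datasets.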

10/29/2023 11:23:38 - WARNING - datasets.builder - Found cached dataset json (file:///public/home/lvshuhang/.cache/huggingface/datasets/json/default-01eed702bb47992a/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
Traceback (most recent call last):
  File "/public/home/lvshuhang/LMFlow-main/examples/finetune.py", line 61, in <module>
    main()
  File "/public/home/lvshuhang/LMFlow-main/examples/finetune.py", line 53, in main
    dataset = Dataset(data_args)
  File "/public/home/lvshuhang/LMFlow-main/src/lmflow/datasets/dataset.py", line 104, in __init__
    raw_dataset = load_dataset(
  File "/public/home/lvshuhang/miniconda3/envs/lmflow/lib/python3.9/site-packages/datasets/load.py", line 1794, in load_dataset
    ds = builder_instance.as_dataset(split=split, verification_mode=verification_mode, in_memory=keep_in_memory)
  File "/public/home/lvshuhang/miniconda3/envs/lmflow/lib/python3.9/site-packages/datasets/builder.py", line 1089, in as_dataset
    raise NotImplementedError(f"Loading a dataset cached in a {type(self._fs).__name__} is not supported.")
NotImplementedError: Loading a dataset cached in a LocalFileSystem is not supported.
10/29/2023 11:23:40 - WARNING - datasets.builder - Found cached dataset json (file:///public/home/lvshuhang/.cache/huggingface/datasets/json/default-01eed702bb47992a/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
Traceback (most recent call last):
  File "/public/home/lvshuhang/LMFlow-main/examples/finetune.py", line 61, in <module>
    main()
  File "/public/home/lvshuhang/LMFlow-main/examples/finetune.py", line 53, in main
    dataset = Dataset(data_args)
  File "/public/home/lvshuhang/LMFlow-main/src/lmflow/datasets/dataset.py", line 104, in __init__
    raw_dataset = load_dataset(
  File "/public/home/lvshuhang/miniconda3/envs/lmflow/lib/python3.9/site-packages/datasets/load.py", line 1794, in load_dataset
    ds = builder_instance.as_dataset(split=split, verification_mode=verification_mode, in_memory=keep_in_memory)
  File "/public/home/lvshuhang/miniconda3/envs/lmflow/lib/python3.9/site-packages/datasets/builder.py", line 1089, in as_dataset
    raise NotImplementedError(f"Loading a dataset cached in a {type(self._fs).__name__} is not supported.")

Similar issue: huggingface/datasets#6352

Solution:

pip install -U datasets

If possible, I hope LMFlow can update its requirements.txt accordingly.
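
To confirm that the upgrade took effect in the environment that runs finetune.py, the installed version can be checked before launching. This is only a minimal sketch: the 2.14.6 threshold is taken from the maintainer's reply in this thread and is assumed, not verified, to be the first fixed release, and the helper names `parse_release` and `datasets_is_new_enough` are hypothetical and not part of LMFlow or datasets.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumption: 2.14.6 is the version mentioned in this thread as the updated
# one; it is used here as an illustrative threshold only.
MIN_DATASETS = (2, 14, 6)


def parse_release(version_string):
    """Turn '2.14.6' or '2.14.5.dev0' into a (major, minor, patch) tuple."""
    parts = []
    for piece in version_string.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as 'rc1'
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)  # pad short versions such as '3.0'
    return tuple(parts)


def datasets_is_new_enough(installed=None):
    """Return True if the given (or installed) datasets version meets the threshold."""
    if installed is None:
        try:
            installed = version("datasets")
        except PackageNotFoundError:
            return False  # datasets is not installed at all
    return parse_release(installed) >= MIN_DATASETS


if __name__ == "__main__":
    if datasets_is_new_enough():
        print("datasets looks new enough")
    else:
        print("datasets is missing or older than 2.14.6; try: pip install -U datasets")
```

Running this with the same interpreter as the `lmflow` conda environment shows whether `pip install -U datasets` actually changed the package that the training script imports.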

Thanks for your interest in LMFlow! We will update it soon after testing. Also, if you are interested in contributing via PR, we welcome all kinds of contributions to help us improve the repository together. Thanks! 😄

datasets has been updated to 2.14.6, haha.