OptimalScale/LMFlow

baichuan-7B-2: added trust_remote_code, then other errors follow

lewislovelock opened this issue · 9 comments

Following https://github.com/OptimalScale/LMFlow/issues/520, I already added trust_remote_code=True to https://github.com/OptimalScale/LMFlow/blob/main/src/lmflow/models/hf_decoder_model.py.
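
Concretely, the edit amounts to passing trust_remote_code=True into the from_pretrained calls. A minimal standalone sketch of the same idea (not the exact LMFlow code; the model path is just my local directory) looks like this:

# Sketch of what my edit does: with trust_remote_code=True, transformers loads
# Baichuan's custom config/tokenizer/model classes that ship inside the model
# directory instead of its built-in classes.
from transformers import AutoConfig, AutoTokenizer, AutoModelForCausalLM

model_path = "/data/dev/zhang/models/Transformers/baichuan-7B-2/"  # my local model

config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)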

but then the following error occurs:

Traceback (most recent call last):
  File "/root/LMFlow/examples/finetune.py", line 61, in <module>
    main()
  File "/root/LMFlow/examples/finetune.py", line 54, in main
    model = AutoModel.get_model(model_args)
  File "/root/LMFlow/src/lmflow/models/auto_model.py", line 16, in get_model
    return HFDecoderModel(model_args, *args, **kwargs)
  File "/root/LMFlow/src/lmflow/models/hf_decoder_model.py", line 150, in __init__
    tokenizer = AutoTokenizer.from_pretrained(model_args.model_name_or_path, **tokenizer_kwargs)
  File "/root/miniconda3/envs/lmflow/lib/python3.9/site-packages/transformers/models/auto/tokenization_auto.py", line 678, in from_pretrained
    tokenizer_class = get_class_from_dynamic_module(
  File "/root/miniconda3/envs/lmflow/lib/python3.9/site-packages/transformers/dynamic_module_utils.py", line 399, in get_class_from_dynamic_module
    return get_class_in_module(class_name, final_module.replace(".py", ""))
  File "/root/miniconda3/envs/lmflow/lib/python3.9/site-packages/transformers/dynamic_module_utils.py", line 157, in get_class_in_module
    shutil.copy(f"{module_dir}/{module_file_name}", tmp_dir)
  File "/root/miniconda3/envs/lmflow/lib/python3.9/shutil.py", line 428, in copy
    copymode(src, dst, follow_symlinks=follow_symlinks)
  File "/root/miniconda3/envs/lmflow/lib/python3.9/shutil.py", line 316, in copymode
    st = stat_func(src)
  File "/root/miniconda3/envs/lmflow/lib/python3.9/shutil.py", line 229, in _stat
    return fn.stat() if isinstance(fn, os.DirEntry) else os.stat(fn)
FileNotFoundError: [Errno 2] No such file or directory: '/root/.cache/huggingface/modules/transformers_modules/tokenization_baichuan.py'
Traceback (most recent call last):
  File "/root/LMFlow/examples/finetune.py", line 61, in <module>
    main()
  File "/root/LMFlow/examples/finetune.py", line 54, in main
    model = AutoModel.get_model(model_args)
  File "/root/LMFlow/src/lmflow/models/auto_model.py", line 16, in get_model
    return HFDecoderModel(model_args, *args, **kwargs)
  File "/root/LMFlow/src/lmflow/models/hf_decoder_model.py", line 150, in __init__
    tokenizer = AutoTokenizer.from_pretrained(model_args.model_name_or_path, **tokenizer_kwargs)
  File "/root/miniconda3/envs/lmflow/lib/python3.9/site-packages/transformers/models/auto/tokenization_auto.py", line 678, in from_pretrained
    tokenizer_class = get_class_from_dynamic_module(
  File "/root/miniconda3/envs/lmflow/lib/python3.9/site-packages/transformers/dynamic_module_utils.py", line 399, in get_class_from_dynamic_module
    return get_class_in_module(class_name, final_module.replace(".py", ""))
  File "/root/miniconda3/envs/lmflow/lib/python3.9/site-packages/transformers/dynamic_module_utils.py", line 179, in get_class_in_module
    return getattr(module, class_name)
AttributeError: module 'transformers_modules.tokenization_baichuan' has no attribute 'BaiChuanTokenizer'
Traceback (most recent call last):
  File "/root/LMFlow/examples/finetune.py", line 61, in <module>
    main()
  File "/root/LMFlow/examples/finetune.py", line 54, in main
    model = AutoModel.get_model(model_args)
  File "/root/LMFlow/src/lmflow/models/auto_model.py", line 16, in get_model
    return HFDecoderModel(model_args, *args, **kwargs)
  File "/root/LMFlow/src/lmflow/models/hf_decoder_model.py", line 150, in __init__
    tokenizer = AutoTokenizer.from_pretrained(model_args.model_name_or_path, **tokenizer_kwargs)
  File "/root/miniconda3/envs/lmflow/lib/python3.9/site-packages/transformers/models/auto/tokenization_auto.py", line 678, in from_pretrained
    tokenizer_class = get_class_from_dynamic_module(
  File "/root/miniconda3/envs/lmflow/lib/python3.9/site-packages/transformers/dynamic_module_utils.py", line 399, in get_class_from_dynamic_module
    return get_class_in_module(class_name, final_module.replace(".py", ""))
  File "/root/miniconda3/envs/lmflow/lib/python3.9/site-packages/transformers/dynamic_module_utils.py", line 179, in get_class_in_module
    return getattr(module, class_name)
AttributeError: module 'transformers_modules.tokenization_baichuan' has no attribute 'BaiChuanTokenizer'

My baichuan-7B-2 model is on my local machine. Would you mind helping me solve this? I would appreciate it.

I want to LoRA fine-tune baichuan-7b-2; currently, the error is:

[2023-10-08 03:04:21,218] [INFO] [comm.py:652:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
10/08/2023 03:06:28 - WARNING - lmflow.pipeline.finetuner - Process rank: 0, device: cuda:0, n_gpu: 1,distributed training: True, 16-bits training: True
10/08/2023 03:06:28 - WARNING - lmflow.pipeline.finetuner - Process rank: 1, device: cuda:1, n_gpu: 1,distributed training: True, 16-bits training: True
10/08/2023 03:06:28 - WARNING - lmflow.pipeline.finetuner - Process rank: 3, device: cuda:3, n_gpu: 1,distributed training: True, 16-bits training: True
10/08/2023 03:06:28 - WARNING - lmflow.pipeline.finetuner - Process rank: 2, device: cuda:2, n_gpu: 1,distributed training: True, 16-bits training: True
10/08/2023 03:06:29 - WARNING - datasets.builder - Found cached dataset json (/root/.cache/huggingface/datasets/json/default-418da00e5ce90d62/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
10/08/2023 03:06:29 - WARNING - datasets.builder - Found cached dataset json (/root/.cache/huggingface/datasets/json/default-418da00e5ce90d62/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
10/08/2023 03:06:29 - WARNING - datasets.builder - Found cached dataset json (/root/.cache/huggingface/datasets/json/default-418da00e5ce90d62/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
10/08/2023 03:06:29 - WARNING - datasets.builder - Found cached dataset json (/root/.cache/huggingface/datasets/json/default-418da00e5ce90d62/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: '/root/.cache/huggingface/modules/transformers_modules/tokenization_baichuan.py'
Traceback (most recent call last):
  File "/root/LMFlow/examples/finetune.py", line 61, in <module>
    main()
  File "/root/LMFlow/examples/finetune.py", line 54, in main
    model = AutoModel.get_model(model_args)
  File "/root/LMFlow/src/lmflow/models/auto_model.py", line 16, in get_model
    return HFDecoderModel(model_args, *args, **kwargs)
  File "/root/LMFlow/src/lmflow/models/hf_decoder_model.py", line 196, in __init__
    config = AutoConfig.from_pretrained(model_args.model_name_or_path, **config_kwargs)
  File "/root/miniconda3/envs/lmflow/lib/python3.9/site-packages/transformers/models/auto/configuration_auto.py", line 923, in from_pretrained
    config_class = get_class_from_dynamic_module(
  File "/root/miniconda3/envs/lmflow/lib/python3.9/site-packages/transformers/dynamic_module_utils.py", line 399, in get_class_from_dynamic_module
    return get_class_in_module(class_name, final_module.replace(".py", ""))
  File "/root/miniconda3/envs/lmflow/lib/python3.9/site-packages/transformers/dynamic_module_utils.py", line 177, in get_class_in_module
    module = importlib.import_module(module_path)
  File "/root/miniconda3/envs/lmflow/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 984, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'transformers_modules.configuration_baichuan'
[2023-10-08 03:06:31,449] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 33817
[2023-10-08 03:06:32,464] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 33818
[2023-10-08 03:06:33,044] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 33819
[2023-10-08 03:06:33,044] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 33820
[2023-10-08 03:06:33,217] [ERROR] [launch.py:324:sigkill_handler] ['/root/miniconda3/envs/lmflow/bin/python', '-u', 'examples/finetune.py', '--local_rank=3', '--model_name_or_path', '/data/dev/zhang/models/Transformers/baichuan-7B-2/', '--dataset_path', '/data/dev/liu/Data/chat/', '--output_dir', 'output_models/baichuan-7B-Lora-chat', '--overwrite_output_dir', '--num_train_epochs', '0.01', '--learning_rate', '1e-4', '--block_size', '512', '--per_device_train_batch_size', '1', '--use_lora', '1', '--lora_r', '8', '--save_aggregated_lora', '0', '--deepspeed', 'configs/ds_config_zero2.json', '--fp16', '--run_name', 'finetune_with_lora', '--validation_split_percentage', '0', '--logging_steps', '20', '--do_train', '--ddp_timeout', '72000', '--save_steps', '5000', '--dataloader_num_workers', '1'] exits with return code = 1

but as you can see:

(lmflow) root@lmflow1:~/LMFlow# ll /root/.cache/huggingface/modules/transformers_modules/tokenization_baichuan.py
-rw-r--r-- 1 root root 9574 Oct  8 03:06 /root/.cache/huggingface/modules/transformers_modules/tokenization_baichuan.py

the file does exist.

Thanks for your interest in LMFlow! It could be caused by Hugging Face version problems. Hugging Face has gone through a major upgrade related to the model file formats, and the new formats are not supported by old versions, i.e. there is no forward compatibility. If that's the case, you can use the main branch of LMFlow to see if the problem still occurs.

If that doesn't solve your issue, please feel free to contact us again. Thanks 🙏

Thanks for the reply. I am already on the main branch of LMFlow (commit c530a6f28de94f3b83a2a4b4ff4dbc96529c0503), so that does not seem to solve the issue.

Would you mind checking the transformers version with pip show transformers, and also trying to read from the local model to see if the problem still occurs? Thanks!

./scripts/run_finetune.sh \
  --model_name_or_path /local-path-to-model/ \
  --dataset_path data/alpaca/train \
  --output_model_path output_models/finetuned_baichuan-7b
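
For example, a quick check like the following (just a sketch; the path is taken from your logs above) would show both the installed version and whether transformers alone can resolve the custom Baichuan classes from your local directory:

# Standalone check, independent of LMFlow: print the transformers version and
# try to resolve the custom Baichuan classes directly from the local directory.
import transformers
from transformers import AutoConfig, AutoTokenizer

print("transformers version:", transformers.__version__)

model_path = "/data/dev/zhang/models/Transformers/baichuan-7B-2/"
config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
print(type(config).__name__, type(tokenizer).__name__)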

Following your patient recommendation, I ran pip show transformers, and it shows:

Version: 4.28.0.dev0
Summary: State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow
Home-page: https://github.com/huggingface/transformers
Author: The Hugging Face team (past and future) with the help of all our contributors (https://github.com/huggingface/transformers/graphs/contributors)
Author-email: transformers@huggingface.co
License: Apache 2.0 License
Location: /root/miniconda3/envs/lmflow/lib/python3.9/site-packages
Requires: filelock, huggingface-hub, numpy, packaging, pyyaml, regex, requests, tokenizers, tqdm
Required-by: lm-eval, lmflow, peft, trl

I ran this command line:

./scripts/run_finetune_with_lora.sh   --model_name_or_path /data/dev/zhang/models/Transformers/baichuan-7B-2/   --dataset_path /data/dev/liu/Data/train/ --output_lora_path output_models/finetuned_baichuan

it still shows:

[2023-10-08 05:33:31,633] [WARNING] [runner.py:186:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2023-10-08 05:33:31,661] [INFO] [runner.py:550:main] cmd = /root/miniconda3/envs/lmflow/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgM119 --master_addr=127.0.0.1 --master_port=11000 --enable_each_rank_log=None examples/finetune.py --model_name_or_path /data/dev/zhang/models/Transformers/baichuan-7B-2/ --dataset_path /data/dev/liu/Data/train/ --output_dir output_models/finetuned_baichuan --overwrite_output_dir --num_train_epochs 1 --learning_rate 1e-4 --block_size 512 --per_device_train_batch_size 1 --use_lora 1 --lora_r 8 --save_aggregated_lora 0 --deepspeed configs/ds_config_zero2.json --fp16 --run_name finetune_with_lora --validation_split_percentage 0 --logging_steps 20 --do_train --ddp_timeout 72000 --save_steps 5000 --dataloader_num_workers 1
[2023-10-08 05:33:33,123] [INFO] [launch.py:135:main] 0 NV_LIBNCCL_DEV_PACKAGE=libnccl-dev=2.13.4-1+cuda11.7
[2023-10-08 05:33:33,123] [INFO] [launch.py:135:main] 0 NV_LIBNCCL_DEV_PACKAGE_VERSION=2.13.4-1
[2023-10-08 05:33:33,123] [INFO] [launch.py:135:main] 0 NCCL_VERSION=2.13.4-1
[2023-10-08 05:33:33,123] [INFO] [launch.py:135:main] 0 NV_LIBNCCL_DEV_PACKAGE_NAME=libnccl-dev
[2023-10-08 05:33:33,123] [INFO] [launch.py:135:main] 0 NV_LIBNCCL_PACKAGE=libnccl2=2.13.4-1+cuda11.7
[2023-10-08 05:33:33,123] [INFO] [launch.py:135:main] 0 NV_LIBNCCL_PACKAGE_NAME=libnccl2
[2023-10-08 05:33:33,123] [INFO] [launch.py:135:main] 0 NV_LIBNCCL_PACKAGE_VERSION=2.13.4-1
[2023-10-08 05:33:33,123] [INFO] [launch.py:142:main] WORLD INFO DICT: {'localhost': [0, 1, 2, 3]}
[2023-10-08 05:33:33,123] [INFO] [launch.py:148:main] nnodes=1, num_local_procs=4, node_rank=0
[2023-10-08 05:33:33,123] [INFO] [launch.py:161:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2, 3]})
[2023-10-08 05:33:33,123] [INFO] [launch.py:162:main] dist_world_size=4
[2023-10-08 05:33:33,123] [INFO] [launch.py:164:main] Setting CUDA_VISIBLE_DEVICES=0,1,2,3
[2023-10-08 05:33:37,169] [INFO] [comm.py:652:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
10/08/2023 05:35:45 - WARNING - lmflow.pipeline.finetuner - Process rank: 1, device: cuda:1, n_gpu: 1,distributed training: True, 16-bits training: True
10/08/2023 05:35:45 - WARNING - lmflow.pipeline.finetuner - Process rank: 3, device: cuda:3, n_gpu: 1,distributed training: True, 16-bits training: True
10/08/2023 05:35:45 - WARNING - lmflow.pipeline.finetuner - Process rank: 2, device: cuda:2, n_gpu: 1,distributed training: True, 16-bits training: True
10/08/2023 05:35:45 - WARNING - lmflow.pipeline.finetuner - Process rank: 0, device: cuda:0, n_gpu: 1,distributed training: True, 16-bits training: True
10/08/2023 05:35:46 - WARNING - datasets.builder - Found cached dataset json (/root/.cache/huggingface/datasets/json/default-418da00e5ce90d62/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
10/08/2023 05:35:46 - WARNING - datasets.builder - Found cached dataset json (/root/.cache/huggingface/datasets/json/default-418da00e5ce90d62/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
10/08/2023 05:35:46 - WARNING - datasets.builder - Found cached dataset json (/root/.cache/huggingface/datasets/json/default-418da00e5ce90d62/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
10/08/2023 05:35:46 - WARNING - datasets.builder - Found cached dataset json (/root/.cache/huggingface/datasets/json/default-418da00e5ce90d62/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: '/root/.cache/huggingface/modules/transformers_modules/tokenization_baichuan.py'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: '/root/.cache/huggingface/modules/transformers_modules/tokenization_baichuan.py'
Traceback (most recent call last):
  File "/root/LMFlow/examples/finetune.py", line 61, in <module>
    main()
  File "/root/LMFlow/examples/finetune.py", line 54, in main
    model = AutoModel.get_model(model_args)
  File "/root/LMFlow/src/lmflow/models/auto_model.py", line 16, in get_model
    return HFDecoderModel(model_args, *args, **kwargs)
  File "/root/LMFlow/src/lmflow/models/hf_decoder_model.py", line 196, in __init__
    config = AutoConfig.from_pretrained(model_args.model_name_or_path, **config_kwargs)
  File "/root/miniconda3/envs/lmflow/lib/python3.9/site-packages/transformers/models/auto/configuration_auto.py", line 923, in from_pretrained
    config_class = get_class_from_dynamic_module(
  File "/root/miniconda3/envs/lmflow/lib/python3.9/site-packages/transformers/dynamic_module_utils.py", line 399, in get_class_from_dynamic_module
    return get_class_in_module(class_name, final_module.replace(".py", ""))
  File "/root/miniconda3/envs/lmflow/lib/python3.9/site-packages/transformers/dynamic_module_utils.py", line 177, in get_class_in_module
    module = importlib.import_module(module_path)
  File "/root/miniconda3/envs/lmflow/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 984, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'transformers_modules.configuration_baichuan'

and my Baichuan model directory looks like this:

total 18280792
drwxr-xr-x  2 root root         513 Oct  8 02:39  ./
drwxr-xr-x 25 2002 2000         755 Sep 12 01:11  ../
-rw-r--r--  1 root root       13122 Jun 19 03:53  README.md
-rw-r--r--  1 root root      774879 Jun 19 03:53 'baichuan-7B '$'\346\250\241\345\236\213\350\256\270\345\217\257\345\215\217\350\256\256''.pdf'
-rw-r--r--  1 root root         752 Jul 13 02:34  config.json
-rw-r--r--  1 root root        2345 Jun 19 03:53  configuration_baichuan.py
-rw-r--r--  1 root root         132 Jun 19 03:53  generation_config.json
-rw-r--r--  1 root root        1477 Jun 19 03:53  gitattributes.txt
-rw-r--r--  1 root root        1052 Jun 19 03:53  handler.py
-rw-r--r--  1 root root       33128 Jul 13 02:31  modeling_baichuan.py
-rw-r--r--  1 root root 14001182896 Jun 19 03:53  pytorch_model.bin
-rw-r--r--  1 root root         411 Jun 19 03:53  special_tokens_map.json
-rw-r--r--  1 root root        9574 Jun 19 03:53  tokenization_baichuan.py
-rw-r--r--  1 root root     1136699 Jun 19 03:53  tokenizer.model
-rw-r--r--  1 root root         802 Jun 19 03:53  tokenizer_config.json

The issue still seems to exist. Thanks for your patience! 🙏

I found that the requirements specify transformers>=4.31.0, so my previous Docker image may be too old and I should probably update transformers. I'll let you know if the problem still occurs, thanks! 🙏
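
A quick way to double-check this (just a sketch; the >=4.31.0 floor comes from requirements.txt as mentioned above):

# Compare the installed transformers version against LMFlow's stated requirement.
from importlib.metadata import version
from packaging.version import Version

installed = Version(version("transformers"))
required = Version("4.31.0")  # from LMFlow's requirements.txt
print(f"transformers {installed}: {'OK' if installed >= required else 'too old, please upgrade'}")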

You're welcome 😄 Please feel free to contact us if you encounter any further issues.

After using the latest code (commit c530a6f28de94f3b83a2a4b4ff4dbc96529c0503) and reinstalling my environment with pip install -r requirements.txt, I am now able to fine-tune baichuan7b-2 😄, although fine-tuning baichuan7b-2 with LoRA is not supported.

Anyway, thanks a lot! 🙏

UPDATE: if you want to fine-tune Baichuan-2 with LoRA, just adding --lora_target_modules W_pack in scripts/run_finetune_with_lora.sh works! 🤗
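
For anyone curious why that flag is needed: Baichuan's attention uses a single fused W_pack projection instead of separate q/k/v layers, so LoRA has to target that module. A rough PEFT-side equivalent (just a sketch, assuming LMFlow forwards the flag into LoraConfig) would be:

# Approximate PEFT configuration corresponding to --lora_r 8 --lora_target_modules W_pack.
from peft import LoraConfig

lora_config = LoraConfig(
    r=8,                        # matches --lora_r 8 in run_finetune_with_lora.sh
    target_modules=["W_pack"],  # Baichuan's fused QKV projection module
    task_type="CAUSAL_LM",
)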