baichuan7B-2 trust_remote_code and then followed by other Error
lewislovelock opened this issue · 9 comments
Following https://github.com/OptimalScale/LMFlow/issues/520, I already added trust_remote_code=True
to https://github.com/OptimalScale/LMFlow/blob/main/src/lmflow/models/hf_decoder_model.py,
but then the following errors occur:
Traceback (most recent call last):
File "/root/LMFlow/examples/finetune.py", line 61, in <module>
main()
File "/root/LMFlow/examples/finetune.py", line 54, in main
model = AutoModel.get_model(model_args)
File "/root/LMFlow/src/lmflow/models/auto_model.py", line 16, in get_model
return HFDecoderModel(model_args, *args, **kwargs)
File "/root/LMFlow/src/lmflow/models/hf_decoder_model.py", line 150, in __init__
tokenizer = AutoTokenizer.from_pretrained(model_args.model_name_or_path, **tokenizer_kwargs)
File "/root/miniconda3/envs/lmflow/lib/python3.9/site-packages/transformers/models/auto/tokenization_auto.py", line 678, in from_pretrained
tokenizer_class = get_class_from_dynamic_module(
File "/root/miniconda3/envs/lmflow/lib/python3.9/site-packages/transformers/dynamic_module_utils.py", line 399, in get_class_from_dynamic_module
return get_class_in_module(class_name, final_module.replace(".py", ""))
File "/root/miniconda3/envs/lmflow/lib/python3.9/site-packages/transformers/dynamic_module_utils.py", line 157, in get_class_in_module
shutil.copy(f"{module_dir}/{module_file_name}", tmp_dir)
File "/root/miniconda3/envs/lmflow/lib/python3.9/shutil.py", line 428, in copy
copymode(src, dst, follow_symlinks=follow_symlinks)
File "/root/miniconda3/envs/lmflow/lib/python3.9/shutil.py", line 316, in copymode
st = stat_func(src)
File "/root/miniconda3/envs/lmflow/lib/python3.9/shutil.py", line 229, in _stat
return fn.stat() if isinstance(fn, os.DirEntry) else os.stat(fn)
FileNotFoundError: [Errno 2] No such file or directory: '/root/.cache/huggingface/modules/transformers_modules/tokenization_baichuan.py'
Traceback (most recent call last):
File "/root/LMFlow/examples/finetune.py", line 61, in <module>
main()
File "/root/LMFlow/examples/finetune.py", line 54, in main
model = AutoModel.get_model(model_args)
File "/root/LMFlow/src/lmflow/models/auto_model.py", line 16, in get_model
return HFDecoderModel(model_args, *args, **kwargs)
File "/root/LMFlow/src/lmflow/models/hf_decoder_model.py", line 150, in __init__
tokenizer = AutoTokenizer.from_pretrained(model_args.model_name_or_path, **tokenizer_kwargs)
File "/root/miniconda3/envs/lmflow/lib/python3.9/site-packages/transformers/models/auto/tokenization_auto.py", line 678, in from_pretrained
tokenizer_class = get_class_from_dynamic_module(
File "/root/miniconda3/envs/lmflow/lib/python3.9/site-packages/transformers/dynamic_module_utils.py", line 399, in get_class_from_dynamic_module
return get_class_in_module(class_name, final_module.replace(".py", ""))
File "/root/miniconda3/envs/lmflow/lib/python3.9/site-packages/transformers/dynamic_module_utils.py", line 179, in get_class_in_module
return getattr(module, class_name)
AttributeError: module 'transformers_modules.tokenization_baichuan' has no attribute 'BaiChuanTokenizer'
Traceback (most recent call last):
File "/root/LMFlow/examples/finetune.py", line 61, in <module>
main()
File "/root/LMFlow/examples/finetune.py", line 54, in main
model = AutoModel.get_model(model_args)
File "/root/LMFlow/src/lmflow/models/auto_model.py", line 16, in get_model
return HFDecoderModel(model_args, *args, **kwargs)
File "/root/LMFlow/src/lmflow/models/hf_decoder_model.py", line 150, in __init__
tokenizer = AutoTokenizer.from_pretrained(model_args.model_name_or_path, **tokenizer_kwargs)
File "/root/miniconda3/envs/lmflow/lib/python3.9/site-packages/transformers/models/auto/tokenization_auto.py", line 678, in from_pretrained
tokenizer_class = get_class_from_dynamic_module(
File "/root/miniconda3/envs/lmflow/lib/python3.9/site-packages/transformers/dynamic_module_utils.py", line 399, in get_class_from_dynamic_module
return get_class_in_module(class_name, final_module.replace(".py", ""))
File "/root/miniconda3/envs/lmflow/lib/python3.9/site-packages/transformers/dynamic_module_utils.py", line 179, in get_class_in_module
return getattr(module, class_name)
AttributeError: module 'transformers_modules.tokenization_baichuan' has no attribute 'BaiChuanTokenizer'
My baichuan-7B-2 model is stored on my local machine. Would you mind helping me solve this? I would appreciate it.
I want to fine-tune baichuan-7B-2 with LoRA. For now, the error is:
[2023-10-08 03:04:21,218] [INFO] [comm.py:652:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
10/08/2023 03:06:28 - WARNING - lmflow.pipeline.finetuner - Process rank: 0, device: cuda:0, n_gpu: 1,distributed training: True, 16-bits training: True
10/08/2023 03:06:28 - WARNING - lmflow.pipeline.finetuner - Process rank: 1, device: cuda:1, n_gpu: 1,distributed training: True, 16-bits training: True
10/08/2023 03:06:28 - WARNING - lmflow.pipeline.finetuner - Process rank: 3, device: cuda:3, n_gpu: 1,distributed training: True, 16-bits training: True
10/08/2023 03:06:28 - WARNING - lmflow.pipeline.finetuner - Process rank: 2, device: cuda:2, n_gpu: 1,distributed training: True, 16-bits training: True
10/08/2023 03:06:29 - WARNING - datasets.builder - Found cached dataset json (/root/.cache/huggingface/datasets/json/default-418da00e5ce90d62/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
10/08/2023 03:06:29 - WARNING - datasets.builder - Found cached dataset json (/root/.cache/huggingface/datasets/json/default-418da00e5ce90d62/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
10/08/2023 03:06:29 - WARNING - datasets.builder - Found cached dataset json (/root/.cache/huggingface/datasets/json/default-418da00e5ce90d62/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
10/08/2023 03:06:29 - WARNING - datasets.builder - Found cached dataset json (/root/.cache/huggingface/datasets/json/default-418da00e5ce90d62/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
Traceback (most recent call last):
File "<string>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: '/root/.cache/huggingface/modules/transformers_modules/tokenization_baichuan.py'
Traceback (most recent call last):
File "/root/LMFlow/examples/finetune.py", line 61, in <module>
main()
File "/root/LMFlow/examples/finetune.py", line 54, in main
model = AutoModel.get_model(model_args)
File "/root/LMFlow/src/lmflow/models/auto_model.py", line 16, in get_model
return HFDecoderModel(model_args, *args, **kwargs)
File "/root/LMFlow/src/lmflow/models/hf_decoder_model.py", line 196, in __init__
config = AutoConfig.from_pretrained(model_args.model_name_or_path, **config_kwargs)
File "/root/miniconda3/envs/lmflow/lib/python3.9/site-packages/transformers/models/auto/configuration_auto.py", line 923, in from_pretrained
config_class = get_class_from_dynamic_module(
File "/root/miniconda3/envs/lmflow/lib/python3.9/site-packages/transformers/dynamic_module_utils.py", line 399, in get_class_from_dynamic_module
return get_class_in_module(class_name, final_module.replace(".py", ""))
File "/root/miniconda3/envs/lmflow/lib/python3.9/site-packages/transformers/dynamic_module_utils.py", line 177, in get_class_in_module
module = importlib.import_module(module_path)
File "/root/miniconda3/envs/lmflow/lib/python3.9/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
File "<frozen importlib._bootstrap>", line 984, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'transformers_modules.configuration_baichuan'
[2023-10-08 03:06:31,449] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 33817
[2023-10-08 03:06:32,464] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 33818
[2023-10-08 03:06:33,044] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 33819
[2023-10-08 03:06:33,044] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 33820
[2023-10-08 03:06:33,217] [ERROR] [launch.py:324:sigkill_handler] ['/root/miniconda3/envs/lmflow/bin/python', '-u', 'examples/finetune.py', '--local_rank=3', '--model_name_or_path', '/data/dev/zhang/models/Transformers/baichuan-7B-2/', '--dataset_path', '/data/dev/liu/Data/chat/', '--output_dir', 'output_models/baichuan-7B-Lora-chat', '--overwrite_output_dir', '--num_train_epochs', '0.01', '--learning_rate', '1e-4', '--block_size', '512', '--per_device_train_batch_size', '1', '--use_lora', '1', '--lora_r', '8', '--save_aggregated_lora', '0', '--deepspeed', 'configs/ds_config_zero2.json', '--fp16', '--run_name', 'finetune_with_lora', '--validation_split_percentage', '0', '--logging_steps', '20', '--do_train', '--ddp_timeout', '72000', '--save_steps', '5000', '--dataloader_num_workers', '1'] exits with return code = 1
but as you can see:
(lmflow) root@lmflow1:~/LMFlow# ll /root/.cache/huggingface/modules/transformers_modules/tokenization_baichuan.py
-rw-r--r-- 1 root root 9574 Oct 8 03:06 /root/.cache/huggingface/modules/transformers_modules/tokenization_baichuan.py
So the file does exist.
Thanks for your interest in LMFlow! This could be caused by a Hugging Face version problem. Hugging Face went through a major upgrade of its model file formats, and old versions do not support the new formats, i.e. there is no forward compatibility. If that's the case, you can try the main branch of LMFlow to see if the problem still occurs.
If that doesn't solve your issue, please feel free to contact us again. Thanks!
Thanks for the reply. I am already on the main branch of LMFlow, at commit c530a6f28de94f3b83a2a4b4ff4dbc96529c0503, so the issue may not be solved.
Would you mind checking the transformers version with pip show transformers, and also trying to read from the local model to see if the problem still occurs? Thanks!
./scripts/run_finetune.sh \
--model_name_or_path /local-path-to-model/ \
--dataset_path data/alpaca/train \
--output_model_path output_models/finetuned_baichuan-7b
As you kindly recommended, I ran pip show transformers, which shows:
Version: 4.28.0.dev0
Summary: State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow
Home-page: https://github.com/huggingface/transformers
Author: The Hugging Face team (past and future) with the help of all our contributors (https://github.com/huggingface/transformers/graphs/contributors)
Author-email: transformers@huggingface.co
License: Apache 2.0 License
Location: /root/miniconda3/envs/lmflow/lib/python3.9/site-packages
Requires: filelock, huggingface-hub, numpy, packaging, pyyaml, regex, requests, tokenizers, tqdm
Required-by: lm-eval, lmflow, peft, trl
I ran this command:
./scripts/run_finetune_with_lora.sh --model_name_or_path /data/dev/zhang/models/Transformers/baichuan-7B-2/ --dataset_path /data/dev/liu/Data/train/ --output_lora_path output_models/finetuned_baichuan
It still shows:
[2023-10-08 05:33:31,633] [WARNING] [runner.py:186:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2023-10-08 05:33:31,661] [INFO] [runner.py:550:main] cmd = /root/miniconda3/envs/lmflow/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgM119 --master_addr=127.0.0.1 --master_port=11000 --enable_each_rank_log=None examples/finetune.py --model_name_or_path /data/dev/zhang/models/Transformers/baichuan-7B-2/ --dataset_path /data/dev/liu/Data/train/ --output_dir output_models/finetuned_baichuan --overwrite_output_dir --num_train_epochs 1 --learning_rate 1e-4 --block_size 512 --per_device_train_batch_size 1 --use_lora 1 --lora_r 8 --save_aggregated_lora 0 --deepspeed configs/ds_config_zero2.json --fp16 --run_name finetune_with_lora --validation_split_percentage 0 --logging_steps 20 --do_train --ddp_timeout 72000 --save_steps 5000 --dataloader_num_workers 1
[2023-10-08 05:33:33,123] [INFO] [launch.py:135:main] 0 NV_LIBNCCL_DEV_PACKAGE=libnccl-dev=2.13.4-1+cuda11.7
[2023-10-08 05:33:33,123] [INFO] [launch.py:135:main] 0 NV_LIBNCCL_DEV_PACKAGE_VERSION=2.13.4-1
[2023-10-08 05:33:33,123] [INFO] [launch.py:135:main] 0 NCCL_VERSION=2.13.4-1
[2023-10-08 05:33:33,123] [INFO] [launch.py:135:main] 0 NV_LIBNCCL_DEV_PACKAGE_NAME=libnccl-dev
[2023-10-08 05:33:33,123] [INFO] [launch.py:135:main] 0 NV_LIBNCCL_PACKAGE=libnccl2=2.13.4-1+cuda11.7
[2023-10-08 05:33:33,123] [INFO] [launch.py:135:main] 0 NV_LIBNCCL_PACKAGE_NAME=libnccl2
[2023-10-08 05:33:33,123] [INFO] [launch.py:135:main] 0 NV_LIBNCCL_PACKAGE_VERSION=2.13.4-1
[2023-10-08 05:33:33,123] [INFO] [launch.py:142:main] WORLD INFO DICT: {'localhost': [0, 1, 2, 3]}
[2023-10-08 05:33:33,123] [INFO] [launch.py:148:main] nnodes=1, num_local_procs=4, node_rank=0
[2023-10-08 05:33:33,123] [INFO] [launch.py:161:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2, 3]})
[2023-10-08 05:33:33,123] [INFO] [launch.py:162:main] dist_world_size=4
[2023-10-08 05:33:33,123] [INFO] [launch.py:164:main] Setting CUDA_VISIBLE_DEVICES=0,1,2,3
[2023-10-08 05:33:37,169] [INFO] [comm.py:652:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
10/08/2023 05:35:45 - WARNING - lmflow.pipeline.finetuner - Process rank: 1, device: cuda:1, n_gpu: 1,distributed training: True, 16-bits training: True
10/08/2023 05:35:45 - WARNING - lmflow.pipeline.finetuner - Process rank: 3, device: cuda:3, n_gpu: 1,distributed training: True, 16-bits training: True
10/08/2023 05:35:45 - WARNING - lmflow.pipeline.finetuner - Process rank: 2, device: cuda:2, n_gpu: 1,distributed training: True, 16-bits training: True
10/08/2023 05:35:45 - WARNING - lmflow.pipeline.finetuner - Process rank: 0, device: cuda:0, n_gpu: 1,distributed training: True, 16-bits training: True
10/08/2023 05:35:46 - WARNING - datasets.builder - Found cached dataset json (/root/.cache/huggingface/datasets/json/default-418da00e5ce90d62/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
10/08/2023 05:35:46 - WARNING - datasets.builder - Found cached dataset json (/root/.cache/huggingface/datasets/json/default-418da00e5ce90d62/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
10/08/2023 05:35:46 - WARNING - datasets.builder - Found cached dataset json (/root/.cache/huggingface/datasets/json/default-418da00e5ce90d62/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
10/08/2023 05:35:46 - WARNING - datasets.builder - Found cached dataset json (/root/.cache/huggingface/datasets/json/default-418da00e5ce90d62/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
Traceback (most recent call last):
File "<string>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: '/root/.cache/huggingface/modules/transformers_modules/tokenization_baichuan.py'
Traceback (most recent call last):
File "<string>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: '/root/.cache/huggingface/modules/transformers_modules/tokenization_baichuan.py'
Traceback (most recent call last):
File "/root/LMFlow/examples/finetune.py", line 61, in <module>
main()
File "/root/LMFlow/examples/finetune.py", line 54, in main
model = AutoModel.get_model(model_args)
File "/root/LMFlow/src/lmflow/models/auto_model.py", line 16, in get_model
return HFDecoderModel(model_args, *args, **kwargs)
File "/root/LMFlow/src/lmflow/models/hf_decoder_model.py", line 196, in __init__
config = AutoConfig.from_pretrained(model_args.model_name_or_path, **config_kwargs)
File "/root/miniconda3/envs/lmflow/lib/python3.9/site-packages/transformers/models/auto/configuration_auto.py", line 923, in from_pretrained
config_class = get_class_from_dynamic_module(
File "/root/miniconda3/envs/lmflow/lib/python3.9/site-packages/transformers/dynamic_module_utils.py", line 399, in get_class_from_dynamic_module
return get_class_in_module(class_name, final_module.replace(".py", ""))
File "/root/miniconda3/envs/lmflow/lib/python3.9/site-packages/transformers/dynamic_module_utils.py", line 177, in get_class_in_module
module = importlib.import_module(module_path)
File "/root/miniconda3/envs/lmflow/lib/python3.9/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
File "<frozen importlib._bootstrap>", line 984, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'transformers_modules.configuration_baichuan'
My baichuan model directory looks like this:
total 18280792
drwxr-xr-x 2 root root 513 Oct 8 02:39 ./
drwxr-xr-x 25 2002 2000 755 Sep 12 01:11 ../
-rw-r--r-- 1 root root 13122 Jun 19 03:53 README.md
-rw-r--r-- 1 root root 774879 Jun 19 03:53 'baichuan-7B '$'\346\250\241\345\236\213\350\256\270\345\217\257\345\215\217\350\256\256''.pdf'
-rw-r--r-- 1 root root 752 Jul 13 02:34 config.json
-rw-r--r-- 1 root root 2345 Jun 19 03:53 configuration_baichuan.py
-rw-r--r-- 1 root root 132 Jun 19 03:53 generation_config.json
-rw-r--r-- 1 root root 1477 Jun 19 03:53 gitattributes.txt
-rw-r--r-- 1 root root 1052 Jun 19 03:53 handler.py
-rw-r--r-- 1 root root 33128 Jul 13 02:31 modeling_baichuan.py
-rw-r--r-- 1 root root 14001182896 Jun 19 03:53 pytorch_model.bin
-rw-r--r-- 1 root root 411 Jun 19 03:53 special_tokens_map.json
-rw-r--r-- 1 root root 9574 Jun 19 03:53 tokenization_baichuan.py
-rw-r--r-- 1 root root 1136699 Jun 19 03:53 tokenizer.model
-rw-r--r-- 1 root root 802 Jun 19 03:53 tokenizer_config.json
The issue still seems to exist. Thanks for your patience!
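The tracebacks above fail while loading custom classes such as `BaiChuanTokenizer` from files like `tokenization_baichuan.py`. As a rough sanity check of a local model directory, the sketch below reads the `auto_map` entries in `config.json` and `tokenizer_config.json` and reports any referenced `.py` file that is missing. It assumes the common `"module.ClassName"` layout for `auto_map` values; the helper name `missing_remote_code_files` is hypothetical and not part of LMFlow or transformers.

```python
import json
from pathlib import Path


def missing_remote_code_files(model_dir):
    """Return custom-code .py files referenced by auto_map but absent from model_dir.

    Assumption: auto_map values look like "module.ClassName" or a list such as
    ["module.ClassName", None], as seen in typical Hub checkpoints.
    """
    model_dir = Path(model_dir)
    missing = set()
    for json_name in ("config.json", "tokenizer_config.json"):
        json_path = model_dir / json_name
        if not json_path.is_file():
            continue  # nothing to inspect for this file
        data = json.loads(json_path.read_text(encoding="utf-8"))
        for ref in data.get("auto_map", {}).values():
            refs = ref if isinstance(ref, (list, tuple)) else [ref]
            for item in refs:
                if not item:
                    continue  # skip null placeholders
                module_name = item.rsplit(".", 1)[0]
                file_name = module_name + ".py"
                if not (model_dir / file_name).is_file():
                    missing.add(file_name)
    return sorted(missing)
```

Calling `missing_remote_code_files("/path/to/model")` on a complete directory should return an empty list. In this thread the listing shows both `configuration_baichuan.py` and `tokenization_baichuan.py` present, so a check like this would only rule out missing files, not explain the error.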
I found that the requirements specify transformers>=4.31.0, so my previous Docker image may be too old and I should probably update transformers. I'll let you know if the problem still occurs, thanks!
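The mismatch between the installed 4.28.0.dev0 and the required 4.31.0 can be checked programmatically. This is a minimal sketch using only the standard library; it compares just the leading numeric release components and ignores suffixes such as `.dev0`, which is a simplification rather than full version semantics. The function names are hypothetical.

```python
import re
from importlib import metadata


def release_tuple(version):
    """Extract leading numeric components, e.g. '4.28.0.dev0' -> (4, 28, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"cannot parse version: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed, minimum):
    """Compare release tuples, padding the shorter one with zeros."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_meets_minimum(package, minimum):
    """Look up the installed version of a package; False if it is not installed."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return False
    return meets_minimum(installed, minimum)
```

For example, `meets_minimum("4.28.0.dev0", "4.31.0")` is false, which matches the situation described here, and `installed_meets_minimum("transformers", "4.31.0")` would check the active environment.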
You're welcome! Please feel free to contact us if you encounter any further issues.
After switching to the latest code (commit c530a6f28de94f3b83a2a4b4ff4dbc96529c0503) and reinstalling my environment with pip install -r requirements.txt, I am now able to fine-tune baichuan7B-2, although fine-tuning baichuan7B-2 with LoRA is not supported.
Anyway, thanks a lot!
UPDATE: if you want to fine-tune baichuan-2 with LoRA, just add --lora_target_modules W_pack in scripts/run_finetune_with_lora.sh and it should work.
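The `W_pack` flag suggests that this model's attention block keeps its projection in a layer named `W_pack`, so a LoRA target list written for differently named layers would presumably match nothing. A quick way to discover candidate names is to look at the module name directly above each `.weight` parameter. The sketch below does this on plain strings; the example parameter names are illustrative assumptions, not read from the actual checkpoint, and the helper name is hypothetical.

```python
def candidate_target_modules(parameter_names):
    """Collect the module name directly above each '.weight' parameter.

    The result may include non-linear layers (for example norms or embeddings),
    so treat it as a list of candidates to inspect, not a final answer.
    """
    leaves = set()
    for name in parameter_names:
        parts = name.split(".")
        if len(parts) >= 2 and parts[-1] == "weight":
            leaves.add(parts[-2])
    return sorted(leaves)


# Illustrative parameter names (assumed for this example only).
example_names = [
    "model.layers.0.self_attn.W_pack.weight",
    "model.layers.0.self_attn.o_proj.weight",
    "model.layers.0.mlp.gate_proj.weight",
    "model.layers.0.mlp.gate_proj.bias",
]

print(candidate_target_modules(example_names))
```

With a loaded PyTorch model, the same function could be fed the keys of `model.state_dict()`; the names it returns are the ones to try with `--lora_target_modules`.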