OptimalScale/LMFlow

AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'

alexhmyang opened this issue · 11 comments

RuntimeError: Error building extension 'cpu_adam'
Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7f683231b670>
Traceback (most recent call last):
File "/home/u20/miniconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/ops/adam/cpu_adam.py", line 110, in __del__
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'
[2023-04-03 12:50:15,113] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 21626
[2023-04-03 12:50:15,113] [ERROR] [launch.py:324:sigkill_handler] ['/home/u20/miniconda3/envs/lmflow/bin/python', '-u', 'examples/finetune.py', '--local_rank=0', '--model_name_or_path', 'gpt2', '--dataset_path', '/home/u20/LMFlow/data/alpaca/train', '--output_dir', '/home/u20/LMFlow/output_models/finetune', '--overwrite_output_dir', '--num_train_epochs', '0.01', '--learning_rate', '2e-5', '--block_size', '512', '--per_device_train_batch_size', '1', '--deepspeed', 'configs/ds_config_zero3.json', '--bf16', '--run_name', 'finetune', '--validation_split_percentage', '0', '--logging_steps', '20', '--do_train', '--ddp_timeout', '72000', '--save_steps', '5000', '--dataloader_num_workers', '1'] exits with return code = 1

I get this error when running ./scripts/run_finetune.sh.
I have a GPU and CUDA installed, so why does it raise a CPU error?

./scripts/run_finetune_with_lora.sh raises the same error.

Could you please provide more of the log? I think there should be another error before this one.
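
To illustrate why the AttributeError is probably only a follow-on symptom, here is a minimal sketch. FakeCPUAdam is a hypothetical stand-in, not DeepSpeed's actual class: its __init__ fails before the attribute is assigned, yet Python still calls __del__ on the half-built object, which then reports the missing attribute as an "Exception ignored in" message.

```python
import gc
import sys


class FakeCPUAdam:
    """Hypothetical stand-in for an optimizer that loads a native extension in __init__."""

    def __init__(self):
        # Simulate the extension build failing before the attribute is set.
        raise RuntimeError("Error building extension 'cpu_adam'")

    def __del__(self):
        # Still runs for the half-initialized object, so the attribute is missing.
        self.ds_opt_adam


def demo():
    """Return the primary error message and the names of exceptions ignored in __del__."""
    ignored = []
    old_hook = sys.unraisablehook
    # Collect "Exception ignored in ..." events instead of printing them.
    sys.unraisablehook = lambda info: ignored.append(type(info.exc_value).__name__)
    primary = None
    try:
        try:
            FakeCPUAdam()
        except RuntimeError as exc:
            primary = str(exc)
        gc.collect()  # make sure the failed object has been finalized
    finally:
        sys.unraisablehook = old_hook
    return primary, ignored
```

In this sketch the RuntimeError is the real failure and the AttributeError only appears during cleanup, which is why the earlier part of the log is the part worth reading.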

Hi, I get the same error. The log is as follows:

(lmflow) xuyan@black-rack-0:~/LLM/LMFlow$ CUDA_VISIBLE_DEVICES=0 ./scripts/run_finetune.sh "--num_gpus=1 --master_port 10001"
[2023-04-03 14:59:52,961] [WARNING] [runner.py:186:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
Detected CUDA_VISIBLE_DEVICES=0 but ignoring it because one or several of --include/--exclude/--num_gpus/--num_nodes cl args were used. If you want to use CUDA_VISIBLE_DEVICES don't pass any of these arguments to deepspeed.
[2023-04-03 14:59:55,358] [INFO] [runner.py:550:main] cmd = /home/xuyan/anaconda3/envs/lmflow/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMF19 --master_addr=127.0.0.1 --master_port=10001 --enable_each_rank_log=None examples/finetune.py --model_name_or_path gpt2 --dataset_path /home/xuyan/LLM/LMFlow/data/alpaca/train --output_dir /home/xuyan/LLM/LMFlow/output_models/finetune --overwrite_output_dir --num_train_epochs 0.01 --learning_rate 2e-5 --block_size 512 --per_device_train_batch_size 1 --deepspeed configs/ds_config_zero3.json --bf16 --run_name finetune --validation_split_percentage 0 --logging_steps 20 --do_train --ddp_timeout 72000 --save_steps 5000 --dataloader_num_workers 1
[2023-04-03 14:59:57,679] [INFO] [launch.py:142:main] WORLD INFO DICT: {'localhost': [0]}
[2023-04-03 14:59:57,680] [INFO] [launch.py:148:main] nnodes=1, num_local_procs=1, node_rank=0
[2023-04-03 14:59:57,680] [INFO] [launch.py:161:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0]})
[2023-04-03 14:59:57,680] [INFO] [launch.py:162:main] dist_world_size=1
[2023-04-03 14:59:57,680] [INFO] [launch.py:164:main] Setting CUDA_VISIBLE_DEVICES=0
[2023-04-03 15:00:05,633] [INFO] [comm.py:652:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
04/03/2023 15:00:06 - WARNING - lmflow.pipeline.finetuner - Process rank: 0, device: cuda:0, n_gpu: 1distributed training: True, 16-bits training: False
04/03/2023 15:00:07 - WARNING - datasets.builder - Found cached dataset json (/home/xuyan/.cache/huggingface/datasets/json/default-dda63bbab21e635e/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
[2023-04-03 15:00:14,782] [INFO] [partition_parameters.py:415:__exit__] finished initializing model with 0.16B parameters
/home/xuyan/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:2547: UserWarning: torch.distributed.all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead.
warnings.warn(
04/03/2023 15:00:15 - WARNING - datasets.fingerprint - Parameter 'function'=<function HFDecoderModel.tokenize.<locals>.tokenize_function at 0x7f217c927f70> of the transform datasets.arrow_dataset.Dataset.map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
04/03/2023 15:00:15 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /home/xuyan/.cache/huggingface/datasets/json/default-dda63bbab21e635e/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-1c80317fa3b1799d.arrow
04/03/2023 15:00:15 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /home/xuyan/.cache/huggingface/datasets/json/default-dda63bbab21e635e/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-bbe2d282518ba636.arrow
Installed CUDA version 11.0 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /home/xuyan/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/xuyan/.cache/torch_extensions/py39_cu117/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/2] /usr/local/cuda-11.0/bin/nvcc -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -I/home/xuyan/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -I/usr/local/cuda-11.0/include -isystem /home/xuyan/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/include -isystem /home/xuyan/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /home/xuyan/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/include/TH -isystem /home/xuyan/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/include/THC -isystem /usr/local/cuda-11.0/include -isystem /home/xuyan/anaconda3/envs/lmflow/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /home/xuyan/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/ops/csrc/common/custom_cuda_kernel.cu -o custom_cuda_kernel.cuda.o
FAILED: custom_cuda_kernel.cuda.o
/usr/local/cuda-11.0/bin/nvcc -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -I/home/xuyan/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -I/usr/local/cuda-11.0/include -isystem /home/xuyan/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/include -isystem /home/xuyan/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /home/xuyan/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/include/TH -isystem /home/xuyan/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/include/THC -isystem /usr/local/cuda-11.0/include -isystem /home/xuyan/anaconda3/envs/lmflow/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /home/xuyan/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/ops/csrc/common/custom_cuda_kernel.cu -o custom_cuda_kernel.cuda.o
nvcc fatal : Unsupported gpu architecture 'compute_86'
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/home/xuyan/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1893, in _run_ninja_build
subprocess.run(
File "/home/xuyan/anaconda3/envs/lmflow/lib/python3.9/subprocess.py", line 528, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/xuyan/LLM/LMFlow/examples/finetune.py", line 70, in <module>
main()
File "/home/xuyan/LLM/LMFlow/examples/finetune.py", line 66, in main
tuned_model = finetuner.tune(model=model, lm_dataset=lm_dataset)
File "/home/xuyan/LLM/LMFlow/src/lmflow/pipeline/finetuner.py", line 232, in tune
train_result = trainer.train(resume_from_checkpoint=checkpoint)
File "/home/xuyan/anaconda3/envs/lmflow/lib/python3.9/site-packages/transformers/trainer.py", line 1639, in train
return inner_training_loop(
File "/home/xuyan/anaconda3/envs/lmflow/lib/python3.9/site-packages/transformers/trainer.py", line 1708, in _inner_training_loop
deepspeed_engine, optimizer, lr_scheduler = deepspeed_init(
File "/home/xuyan/anaconda3/envs/lmflow/lib/python3.9/site-packages/transformers/deepspeed.py", line 378, in deepspeed_init
deepspeed_engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
File "/home/xuyan/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/__init__.py", line 125, in initialize
engine = DeepSpeedEngine(args=args,
File "/home/xuyan/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 340, in __init__
self._configure_optimizer(optimizer, model_parameters)
File "/home/xuyan/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1283, in _configure_optimizer
basic_optimizer = self._configure_basic_optimizer(model_parameters)
File "/home/xuyan/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1354, in _configure_basic_optimizer
optimizer = DeepSpeedCPUAdam(model_parameters,
File "/home/xuyan/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/ops/adam/cpu_adam.py", line 96, in __init__
self.ds_opt_adam = CPUAdamBuilder().load()
File "/home/xuyan/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/ops/op_builder/builder.py", line 485, in load
return self.jit_load(verbose)
File "/home/xuyan/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/ops/op_builder/builder.py", line 520, in jit_load
op_module = load(
File "/home/xuyan/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1284, in load
return _jit_compile(
File "/home/xuyan/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1509, in _jit_compile
_write_ninja_file_and_build_library(
File "/home/xuyan/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1624, in _write_ninja_file_and_build_library
_run_ninja_build(
File "/home/xuyan/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1909, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error building extension 'cpu_adam'
Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7f2093eb6b80>
Traceback (most recent call last):
File "/home/xuyan/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/ops/adam/cpu_adam.py", line 102, in __del__
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'
[2023-04-03 15:00:22,718] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 5948
[2023-04-03 15:00:22,719] [ERROR] [launch.py:324:sigkill_handler] ['/home/xuyan/anaconda3/envs/lmflow/bin/python', '-u', 'examples/finetune.py', '--local_rank=0', '--model_name_or_path', 'gpt2', '--dataset_path', '/home/xuyan/LLM/LMFlow/data/alpaca/train', '--output_dir', '/home/xuyan/LLM/LMFlow/output_models/finetune', '--overwrite_output_dir', '--num_train_epochs', '0.01', '--learning_rate', '2e-5', '--block_size', '512', '--per_device_train_batch_size', '1', '--deepspeed', 'configs/ds_config_zero3.json', '--bf16', '--run_name', 'finetune', '--validation_split_percentage', '0', '--logging_steps', '20', '--do_train', '--ddp_timeout', '72000', '--save_steps', '5000', '--dataloader_num_workers', '1'] exits with return code = 1

It's better to use the same CUDA version that PyTorch was built with, for example:

conda install cuda -c nvidia/label/cuda-11.7.0
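
A quick way to compare the two versions is sketched below. The helper names are made up for illustration, and the sketch only parses the text that nvcc --version prints. On a real machine the PyTorch side can be read from torch.version.cuda. Note that the log above shows a mismatch being accepted with only a warning, so an exact match is a recommendation here rather than a hard requirement.

```python
import re


def parse_nvcc_release(nvcc_output: str) -> str:
    """Extract 'X.Y' from the 'release X.Y' part of `nvcc --version` output."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return f"{match.group(1)}.{match.group(2)}"


def toolkit_matches_torch(nvcc_output: str, torch_cuda: str) -> bool:
    """True if the local toolkit release equals the CUDA version PyTorch was built with."""
    return parse_nvcc_release(nvcc_output) == torch_cuda


# Sample text in the format nvcc prints; on a real machine use
#   subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout
# and pass torch.version.cuda as the second argument.
SAMPLE_NVCC_OUTPUT = """nvcc: NVIDIA (R) Cuda compiler driver
Cuda compilation tools, release 11.6, V11.6.124
"""
```

With the sample above, parse_nvcc_release(SAMPLE_NVCC_OUTPUT) gives "11.6", so toolkit_matches_torch(SAMPLE_NVCC_OUTPUT, "11.7") is False.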

(lmflow) u20@u20:~/LMFlow/service$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_Mar__8_18:18:20_PST_2022
Cuda compilation tools, release 11.6, V11.6.124
Build cuda_11.6.r11.6/compiler.31057947_0

Does CUDA 11.6 not work?

I find CUDA version issues are always hard to debug...

It works fine on my machine with CUDA 11.7 installed through conda.

Yes, you are right.
I also found that it is a CUDA-related issue. It seems that CUDA 11.0 is too old to build the DeepSpeed extension for this GPU, but I think CUDA 11.6 should be fine.

Thank you very much for your help!

...
nvcc fatal : Unsupported gpu architecture 'compute_86'
...

According to the log, this is indeed a CUDA version problem. The installed nvcc (CUDA 11.0) does not recognize your GPU's architecture, compute_86, so the cpu_adam extension cannot be compiled. You may try a newer version of CUDA. Thanks 😄
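
As a rough check for this particular error, the sketch below encodes the first CUDA toolkit release that accepts a few recent architectures. The table is written from memory of NVIDIA's release notes, so treat it as an assumption and verify it against your toolkit's documentation. It also ignores the fact that newer toolkits drop old architectures.

```python
# First CUDA toolkit (major, minor) whose nvcc accepts each compute capability.
# Assumed values; check NVIDIA's release notes before relying on them.
MIN_TOOLKIT_FOR_ARCH = {
    80: (11, 0),  # A100
    86: (11, 1),  # RTX 30xx, A40, ...
    89: (11, 8),  # RTX 40xx, L40, ...
    90: (11, 8),  # H100
}


def nvcc_supports(toolkit_version: str, arch: str) -> bool:
    """Check whether a toolkit like '11.0' can target an arch like 'compute_86' or 'sm_86'."""
    capability = int(arch.rsplit("_", 1)[1])
    major, minor = (int(part) for part in toolkit_version.split(".")[:2])
    return (major, minor) >= MIN_TOOLKIT_FOR_ARCH[capability]
```

Under these assumptions nvcc_supports("11.0", "compute_86") is False, which matches the nvcc failure in the log, while both "11.6" and "11.7" pass this specific check.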

Yes, I had the same error. I ran conda install cuda -c nvidia/label/cuda-11.7.0
and it seems OK now.

I am using module load gcc/9.2.0 cuda/11.7

but I still get this error:
ImportError: /home/xxxxxx/.cache/torch_extensions/py39_cu117/cpu_adam/cpu_adam.so: cannot open shared object file: No such file or directory
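
One possible reading, which is an assumption and not confirmed in this thread, is that an earlier failed build left the extension directory behind without the compiled cpu_adam.so. The sketch below only inspects a build directory and reports its state. The function name is hypothetical, and the directory layout is taken from the paths shown in the logs above.

```python
from pathlib import Path


def extension_build_state(extensions_root: str, name: str = "cpu_adam") -> str:
    """Classify <extensions_root>/<name> as 'not built', 'incomplete', or 'built'."""
    build_dir = Path(extensions_root) / name
    if not build_dir.is_dir():
        return "not built"
    if (build_dir / f"{name}.so").is_file():
        return "built"
    # The directory exists (for example with build.ninja) but has no shared object.
    return "incomplete"
```

For the path in the error message, the root would be the py39_cu117 directory under ~/.cache/torch_extensions. If the state is "incomplete", a commonly suggested remedy is to delete that cpu_adam directory so the next run rebuilds it, and then read any compiler error from the rebuild.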

This solution works for me! Thank you very much for the help! <3

This issue has been marked as stale because it has not had recent activity. If you think this still needs to be addressed please feel free to reopen this issue. Thanks!