Error when running ds_pretrain_nvidia.sh
lulia0228 opened this issue · 2 comments
File "pretrain_glm.py", line 500, in initialize_distributed
File "pretrain_glm.py", line 470, in set_deepspeed_activation_checkpointing
File "/usr/local/conda/envs/llm_fine_tune/lib/python3.8/site-packages/deepspeed/runtime/activation_checkpointing/checkpointing.py", line 804, in _configure_using_config_file
if dist.get_rank() == 0:
File "/usr/local/conda/envs/llm_fine_tune/lib/python3.8/site-packages/deepspeed/comm/comm.py", line 575, in get_rank
assert cdb is not None and cdb.is_initialized(), 'DeepSpeed backend not set, please initialize it using init_process_group()'
This is because the code doesn't support the latest version of DeepSpeed. You can install DeepSpeed <= 0.6.2, or replace the torch.distributed.init_process_group call in initialize_distributed with deepspeed.init_distributed.
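A minimal sketch of the second option follows. It assumes the job is started through the `deepspeed` launcher, which exports `RANK`, `WORLD_SIZE`, `MASTER_ADDR` and `MASTER_PORT`, and that `args.distributed_backend` holds the backend name (treat that attribute name as an assumption about `pretrain_glm.py`). Only the process-group setup is shown. `deepspeed.init_distributed` creates the torch process group and also registers DeepSpeed's own communication backend, which is what the `dist.get_rank()` assertion in the traceback checks for.

```python
def initialize_distributed(args):
    """Set up torch.distributed through DeepSpeed (sketch, not the repo's code).

    Replaces a direct torch.distributed.init_process_group(...) call so that
    DeepSpeed's communication backend is registered too; otherwise helpers
    such as deepspeed.comm.get_rank() fail with 'DeepSpeed backend not set'.
    """
    # Imported inside the function only so this sketch loads without DeepSpeed
    # installed; in the real script a top-level `import deepspeed` is typical.
    import deepspeed

    # init_method is left unset, so the process group is initialized from the
    # environment (env://): RANK, WORLD_SIZE, MASTER_ADDR, MASTER_PORT, which
    # the deepspeed launcher is assumed to export.
    deepspeed.init_distributed(dist_backend=args.distributed_backend)

    # The rest of the original setup (torch.cuda.set_device, model-parallel
    # groups, set_deepspeed_activation_checkpointing) would follow unchanged.
```

Only `dist_backend` is passed because that keyword is present across both older and newer DeepSpeed releases; explicit rank and world-size arguments are left to the environment rather than relying on newer keyword parameters.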
This is because the code doesn't support the latest version of DeepSpeed. You can install DeepSpeed <= 0.5.9, or replace the torch.distributed.init_process_group call in initialize_distributed with deepspeed.init_distributed.
Thanks for your reply!