MPI environment variables are not set
fabiogeraci opened this issue · 2 comments
fabiogeraci commented
System Info
HPC, Ubuntu 22.04, 2 nodes x 8 H100, LSF as scheduler
[tool.poetry.dependencies]
python = "^3.10"
importlib-metadata = { version = "~=1.0", python = "<3.8" }
tensorboard = "^2.16.2"
sge-data-package = {version = "", source = "sgedata"}
torch = "2.2.1"
torchvision = "0.17.1"
torchaudio = "2.2.1"
transformers = "4.42.0"
datasets = "2.18."
accelerate = "0.28.0"
deepspeed = "0.13.4"
safetensors = "0.4.2"
mpi4py = "^4.0.0"
module load cuda-12.1.1
module load ISG/experimental/fg12/openmpi/5.0.4-cuda12.1-lsf
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
deepspeed \
--hostfile=${HOSTFILE_PATH} \
--launcher=OPENMPI \
--launcher_args="-bind-to none -map-by slot --mca pml ob1 --oversubscribe --display-allocation --display-map" \
--master_addr=${MASTER_ADDR} \
--master_port=${_M_PORT} \
--no_ssh_check \
src/dna_mlm/runner.py
import os
import typing as tp


def setup_env_ranks() -> tp.Tuple[int, int, int]:
    # Map MPI environment variables to those expected by DeepSpeed/PyTorch
    if 'OMPI_COMM_WORLD_LOCAL_RANK' in os.environ:
        os.environ['LOCAL_RANK'] = os.environ['OMPI_COMM_WORLD_LOCAL_RANK']
        os.environ['RANK'] = os.environ['OMPI_COMM_WORLD_RANK']
        os.environ['WORLD_SIZE'] = os.environ['OMPI_COMM_WORLD_SIZE']
    else:
        raise EnvironmentError(
            "MPI environment variables are not set. "
            "Ensure you are running the script with an MPI-compatible launcher."
        )
    # Return (local_rank, rank, world_size) to match the annotated return type
    return (
        int(os.environ['LOCAL_RANK']),
        int(os.environ['RANK']),
        int(os.environ['WORLD_SIZE']),
    )


setup_env_ranks()
The function should set the environment variables, but instead it raises the error.
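One way to narrow this down is to print which launcher-related variables each process actually sees before the mapping runs. The sketch below is only a debugging aid: `launcher_env_report`, `PREFIXES`, and `NAMES` are hypothetical names, not part of DeepSpeed or Open MPI, and the only assumption is that Open MPI exports its variables with the `OMPI_` prefix, as the function above already expects.

```python
import os
import typing as tp

# Illustrative choices: the Open MPI prefix plus the PyTorch-style names.
PREFIXES = ("OMPI_",)
NAMES = ("RANK", "LOCAL_RANK", "WORLD_SIZE", "MASTER_ADDR", "MASTER_PORT")


def launcher_env_report(environ: tp.Mapping[str, str]) -> tp.Dict[str, str]:
    """Return the subset of environ that is relevant to rank discovery."""
    return {
        key: value
        for key, value in sorted(environ.items())
        if key.startswith(PREFIXES) or key in NAMES
    }


if __name__ == "__main__":
    # Print one KEY=VALUE line per relevant variable seen by this process.
    for key, value in launcher_env_report(os.environ).items():
        print(f"{key}={value}")
```

If no `OMPI_` variables show up in the output, the script was most likely not started through the Open MPI launcher, which would explain the raised error.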
fabiogeraci commented
I found the error: OPENMPI should have been between "".
deepspeed \
--hostfile ${HOSTFILE_PATH} \
--launcher "OPENMPI" \
--launcher_args "-bind-to none -map-by slot --allow-run-as-root --mca pml ob1 --oversubscribe --display-allocation --display-map" \
--master_addr ${MASTER_ADDR} \
--master_port ${_M_PORT} \
--no_ssh_check \
src/runner.py
fabiogeraci commented
The real question is why I need to set up the following:
if 'OMPI_COMM_WORLD_LOCAL_RANK' in os.environ:
    os.environ['LOCAL_RANK'] = os.environ['OMPI_COMM_WORLD_LOCAL_RANK']
    os.environ['RANK'] = os.environ['OMPI_COMM_WORLD_RANK']
    os.environ['WORLD_SIZE'] = os.environ['OMPI_COMM_WORLD_SIZE']
else:
    raise EnvironmentError(
        "MPI environment variables are not set. "
        "Ensure you are running the script with an MPI-compatible launcher."
    )
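For context, the mapping above bridges two naming schemes: Open MPI exports `OMPI_COMM_WORLD_*`, while PyTorch-style code reads `LOCAL_RANK`, `RANK`, and `WORLD_SIZE`. A more tolerant variant could accept either scheme instead of raising as soon as the Open MPI names are missing. This is a minimal sketch under that assumption; `resolve_ranks` is a hypothetical helper, not a DeepSpeed or PyTorch API, and it reads from a mapping passed in rather than modifying `os.environ`.

```python
import typing as tp


def resolve_ranks(environ: tp.Mapping[str, str]) -> tp.Tuple[int, int, int]:
    """Return (local_rank, rank, world_size) from whichever variables exist."""
    # Candidate name triples, checked in order: PyTorch-style names first,
    # then the names exported by Open MPI.
    candidates = [
        ("LOCAL_RANK", "RANK", "WORLD_SIZE"),
        (
            "OMPI_COMM_WORLD_LOCAL_RANK",
            "OMPI_COMM_WORLD_RANK",
            "OMPI_COMM_WORLD_SIZE",
        ),
    ]
    for names in candidates:
        if all(name in environ for name in names):
            local_rank, rank, world_size = (int(environ[name]) for name in names)
            return local_rank, rank, world_size
    raise EnvironmentError("No known rank variables found in the environment.")
```

Calling `resolve_ranks(os.environ)` at the top of the runner would then work under either kind of launcher, and the result can still be written back to `os.environ` if downstream code needs the PyTorch-style names.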