A Dockerfile to build my portable environment for AI/ML development, with commonly used packages baked in!
The Docker image provides:
- GPU support
- JupyterLab/Notebook for interactive programming via a web UI
- TensorBoard visualization
- full SSH capability (log in to the container over SSH)
- latest PyTorch environment (1.13) with Miniconda
- OpenMMLab MMCV (1.7)/MMDetection/MMClassification support
- MLNX OFED support (version 5.4, for distributed training with RDMA)
- DeepSpeed with Hugging Face Transformers/Datasets/Accelerate
First, clone the repository and gather the installer files needed by the build:

```bash
git clone https://github.com/youkaichao/my-portable-aiml-environment
cd my-portable-aiml-environment
wget https://mirrors.bfsu.edu.cn/anaconda/miniconda/Miniconda3-py38_23.1.0-1-Linux-x86_64.sh -O Miniconda3.sh
cp ~/.ssh/id_rsa.pub id_rsa.pub
wget https://github.com/ohmyzsh/ohmyzsh/archive/refs/heads/master.zip -O ohmyzsh-master.zip
wget https://download.openmmlab.com/mmcv/dist/cu117/torch1.13.0/mmcv_full-1.7.0-cp38-cp38-manylinux1_x86_64.whl
# download the MLNX OFED installer from https://network.nvidia.com/products/infiniband-drivers/linux/mlnx_ofed/
# download the CUDA toolkit from https://developer.nvidia.com/cuda-11-7-0-download-archive (installer type: runfile, local)
cd ..
scp -r my-portable-aiml-environment your@machine:/path
```
Note: You can also download files from my Tsinghua Cloud Storage at https://cloud.tsinghua.edu.cn/d/d6527e7e16714f189b67/ .
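Before copying the directory to the build machine, you can check that every file the build expects is in place. This is a hypothetical helper of my own, not part of the repository; run it from the directory that contains the checkout:

```python
# Hypothetical pre-flight check: make sure the downloaded installers are present.
import os

expected = [
    "Miniconda3.sh",
    "id_rsa.pub",
    "ohmyzsh-master.zip",
    "mmcv_full-1.7.0-cp38-cp38-manylinux1_x86_64.whl",
    # the MLNX OFED and CUDA installers are also needed; their file
    # names depend on the versions you downloaded
]
missing = [f for f in expected
           if not os.path.exists(os.path.join("my-portable-aiml-environment", f))]
print("missing files:", missing or "none")
```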
Install Docker on the target machine (command from https://mirrors.tuna.tsinghua.edu.cn/help/docker-ce/):

```bash
export DOWNLOAD_URL="https://mirrors.tuna.tsinghua.edu.cn/docker-ce"
wget -O- https://get.docker.com/ | sh
```
Make sure you are the root user. Check `service docker status` to see whether Docker is running; if not, run `service docker start`.

In a separate terminal, serve the installer files with `python3 -m http.server 8080`.
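The Dockerfile fetches these installers over HTTP during the build. If you want to confirm they are actually being served before kicking off the build, here is a small hypothetical check (not part of the repository; adjust the endpoint and file names to yours):

```python
# Hypothetical pre-build check: verify the installer files are reachable over HTTP.
import urllib.request

host_endpoint = "127.0.0.1:8080"  # replace with $HOST_IP:$HOST_PORT
for name in ["Miniconda3.sh", "ohmyzsh-master.zip"]:
    with urllib.request.urlopen(f"http://{host_endpoint}/{name}") as resp:
        print(name, resp.status, resp.headers.get("Content-Length"))
```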
Then set your username and password, and build (and optionally push) the image:

```bash
export user=youkaichao
export passwd=whatispass
export HOST_IP=$(hostname -I | awk '{print $1}')
export HOST_PORT=8080
docker build --build-arg user=$user --build-arg passwd=$passwd --build-arg HOST_ENDPOINT=$HOST_IP:$HOST_PORT --progress=plain --tag youkaichao/pytorch113_cu117_ubuntu2004:openmmlab-ofed-deepspeed .
docker login
docker push youkaichao/pytorch113_cu117_ubuntu2004:openmmlab-ofed-deepspeed
```
Install the container runtime for GPUs (the NVIDIA Container Toolkit), following its installation guide.
Run the image in a container:

```bash
docker compose up
```
In a separate shell, try:

```bash
ssh $user@127.0.0.1 -p 3232
```

You should be logged in to a Zsh shell! Try `nvidia-smi` to see the GPUs, and run

```bash
conda activate env && python -c "from mmdet.apis import init_random_seed, set_random_seed, train_detector; import torch; a=torch.randn(500, 500).cuda(); b=a.max(); print(b)"
```

to see that the GPU is enabled!
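If the SSH connection is refused, you can first check whether the mapped port is reachable at all. This is a hypothetical snippet of my own (assuming, as in the ssh command above, that host port 3232 is mapped to the container's SSH port):

```python
# Hypothetical connectivity check: an SSH server sends its banner right after connect.
import socket

with socket.create_connection(("127.0.0.1", 3232), timeout=5) as s:
    print("SSH banner:", s.recv(64).decode(errors="replace").strip())
```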
Sometimes the password is not set correctly. You can also use `docker exec -it container_id /bin/bash` to attach to the container.
Note: To test RDMA support, run `show_gids` to list RDMA devices. If the output is not empty, RDMA support is on!
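As a cross-check (a hypothetical snippet of my own, not from the repository), RDMA-capable devices also show up in sysfs:

```python
# Hypothetical fallback check: RDMA-capable NICs appear under /sys/class/infiniband.
import os

path = "/sys/class/infiniband"
devices = os.listdir(path) if os.path.isdir(path) else []
print("RDMA devices:", devices if devices else "none found")
```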
If you want to test the bandwidth of RDMA:

- Find a machine to act as the master, and use `hostname -I | awk '{print $1}'` to find its IP.
- Execute the following commands on each machine, changing `NODE_RANK` to an index starting from 0 (the master machine should have a node rank of 0) and `MASTER_ADDR` to the master's IP:
```bash
export NCCL_IB_GID_INDEX=`show_gids | grep -i v2 | tail -1 | awk '{print $3}'`
export N_NODES=2
export nproc_per_node=8
export MASTER_ADDR=xxx
export NODE_RANK=0
export NCCL_DEBUG=INFO
export UCX_LOG_LEVEL=debug
export MASTER_PORT=12345
echo node rank $NODE_RANK, master at $MASTER_ADDR:$MASTER_PORT
/home/youkaichao/miniconda/envs/env/bin/torchrun --nnodes=$N_NODES --nproc_per_node=$nproc_per_node --node_rank=$NODE_RANK --master_addr=$MASTER_ADDR --master_port=$MASTER_PORT test_rdma.py
```
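The contents of test_rdma.py are not shown here; a minimal sketch of such an all-reduce bandwidth benchmark (my own reconstruction under assumptions about the payload size and trial count, not the repository's actual script) could look like this:

```python
# Hypothetical sketch of test_rdma.py: benchmark all-reduce bandwidth with NCCL.
import os
import time

import torch
import torch.distributed as dist

# torchrun sets RANK/WORLD_SIZE/LOCAL_RANK; NCCL picks IB/RoCE when configured
dist.init_process_group(backend="nccl")
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

# assumed payload: 1 GiB of float32 "parameters"
n_elements = 256 * 1024 * 1024
tensor = torch.randn(n_elements, device="cuda")
size_gb = n_elements * 4 / 1024**3

n_trials = 10
torch.cuda.synchronize()
start = time.time()
for _ in range(n_trials):
    dist.all_reduce(tensor)
torch.cuda.synchronize()
elapsed = (time.time() - start) / n_trials

if dist.get_rank() == 0:
    print(f"time spent for each trial: {elapsed}")
    print(f"param all-reduce speed: {size_gb / elapsed} GB/s")
    # x2: each byte is counted as both sent and received; x8: bytes -> bits
    print(f"communication bandwidth (estimated): {size_gb / elapsed * 2 * 8} Gb/s")
```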
You should see a detailed log from NCCL, and an estimated bandwidth in Gbps. An example of the output is shown below:
```
NCCL INFO Using network IB
time spent for each trial: 1.0380311965942384
param all-reduce speed: 3.853448733644931 GB/s
communication bandwidth (estimated): 61.65517973831889 Gb/s
```
Note: the log "Using network IB" means that InfiniBand (via an InfiniBand or RoCE card) is used for inter-machine communication. (Judging from the numbers, the estimated bandwidth counts each byte twice, as sent and received: 3.85 GB/s × 2 × 8 bits/byte ≈ 61.66 Gb/s.)
If your InfiniBand support is not configured, you may see something like the following:
```
NCCL INFO Using network Socket
time spent for each trial: 1.650430393218994
param all-reduce speed: 2.4236102391439927 GB/s
communication bandwidth (estimated): 38.77776382630388 Gb/s
```
Note: the log "Using network Socket" means that TCP sockets are used for inter-machine communication, which is much slower than InfiniBand.
If you switch the backend to `gloo` (e.g., `dist.init_process_group(backend="gloo")`), you will see an even slower speed:

```
time spent for each trial: 5.7438788414001465
param all-reduce speed: 0.6963935191615126 GB/s
```
To use DeepSpeed, you may need to set the following environment variables so that DeepSpeed can find the CUDA toolkit:

```bash
export CUDA_HOME=/home/youkaichao/cuda/
export PATH=/home/youkaichao/cuda/bin:$PATH
deepspeed --help
```
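To verify that DeepSpeed can actually JIT-compile its CUDA ops against this toolkit, a quick hypothetical check (my own, not part of the repository) is:

```python
# Hypothetical check: force DeepSpeed to JIT-compile a CUDA op against CUDA_HOME.
import os

import deepspeed
from deepspeed.ops.op_builder import FusedAdamBuilder

print("deepspeed:", deepspeed.__version__)
print("CUDA_HOME:", os.environ.get("CUDA_HOME"))
# load() compiles the op with nvcc from CUDA_HOME; a failure here
# usually points to a missing or mismatched CUDA toolkit
print(FusedAdamBuilder().load())
```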
If you are working inside a company and the machine has no access to the Internet, you can replace the contents of `sources.list`, `pip.conf`, and `.condarc` with your company's APT source, pip index, and conda channel, respectively. As for the Miniconda/Zsh files, just download them with your laptop and upload them to the machine used to build the Docker image (or simply build the image on your laptop).
If you don't want to build the image yourself, you can also use the image I have built: `youkaichao/pytorch113_cu117_ubuntu2004:openmmlab-ofed-deepspeed`. Since it contains my public key, if you don't want me to be able to log in to your server, remember to delete it from `/home/youkaichao/.ssh/authorized_keys`.
For a typical training image, oh-my-zsh and TensorBoard/JupyterLab are not necessary. Feel free to change the Dockerfile to fit your needs!
Happy Coding, Happy Life! No more environment setup for every machine!