Install Colossal-AI from source:

```bash
git clone https://github.com/hpcaitech/ColossalAI.git
cd ColossalAI
# install dependencies
pip install -r requirements/requirements.txt
# install colossalai
pip install .
```
Install and enable CUDA kernel fusion (required when using a fused optimizer):

```bash
pip install -v --no-cache-dir --global-option="--cuda_ext" .
```
More installation details can be found in the Colossal-AI documentation.
To pre-train MAE with Colossal-AI, first write a config file containing the run parameters:
```python
from colossalai.amp import AMP_TYPE

# optimization hyperparameters
TOTAL_BATCH_SIZE = 4096
LR = 1.5e-4
WEIGHT_DECAY = 0.05

TENSOR_PARALLEL_SIZE = 1
TENSOR_PARALLEL_MODE = None

NUM_EPOCHS = 800
WARMUP_EPOCHS = 40

parallel = dict(
    pipeline=1,
    tensor=dict(mode=TENSOR_PARALLEL_MODE, size=TENSOR_PARALLEL_SIZE),
)

# use PyTorch native AMP for mixed-precision training
fp16 = dict(mode=AMP_TYPE.TORCH)

# accumulate gradients over 2 iterations before each optimizer update
gradient_accumulation = 2
BATCH_SIZE = TOTAL_BATCH_SIZE // gradient_accumulation

clip_grad_norm = 1.0

LOG_PATH = f"./vit_{TENSOR_PARALLEL_MODE}_imagenet1k_tp{TENSOR_PARALLEL_SIZE}_bs{BATCH_SIZE}_lr{LR}_{fp16['mode']}_clip_grad{clip_grad_norm}/"

# MAE model variant and pre-training options
MODEL = "mae_vit_large_patch16"
NORM_PIX_LOSS = True
MASK_RATIO = 0.75
```
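As a quick sanity check, a minimal sketch restating the batch-size arithmetic implied by the config (the numbers are just the config values above):

```python
# With gradient accumulation, each optimizer update accumulates gradients
# over `gradient_accumulation` forward/backward passes, so the batch
# processed per pass is TOTAL_BATCH_SIZE // gradient_accumulation, while
# the effective batch size per update stays TOTAL_BATCH_SIZE.
TOTAL_BATCH_SIZE = 4096
gradient_accumulation = 2
BATCH_SIZE = TOTAL_BATCH_SIZE // gradient_accumulation  # 2048 per pass
assert BATCH_SIZE * gradient_accumulation == TOTAL_BATCH_SIZE
```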
- Here the effective batch size is 64 (`batch_size` per gpu) * 8 (`nodes`) * 8 (gpus per node) = 4096. If memory or # gpus is limited, use `--accum_iter` to maintain the effective batch size, which is `batch_size` (per gpu) * `nodes` * 8 (gpus per node) * `accum_iter`. `blr` is the base learning rate. The actual `lr` is computed by the linear scaling rule: `lr` = `blr` * effective batch size / 256 (see the sketch after this list).
- Here we use `--norm_pix_loss` as the target for better representation learning. To train a baseline model (e.g., for visualization), use pixel-based construction and turn off `--norm_pix_loss`.
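A minimal sketch of the linear scaling rule above, using the example numbers from the first bullet (the base learning rate matches `LR` in the config):

```python
# Linear scaling rule: lr = blr * effective_batch_size / 256
blr = 1.5e-4                       # base learning rate
batch_size, nodes, gpus_per_node, accum_iter = 64, 8, 8, 1
eff_batch_size = batch_size * nodes * gpus_per_node * accum_iter  # 4096
lr = blr * eff_batch_size / 256    # 1.5e-4 * 4096 / 256 = 2.4e-3
print(lr)
```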
In this experiment, we only launch Colossal-AI with PyTorch.
First, load the config from `config.py`:

```python
import colossalai

colossalai.launch_from_torch(config='./colossal-ai/config.py')
```
Then, in your Python file, you can access the previously defined parameters via `gpc.config.${param}` (where `gpc` is `colossalai.core.global_context`).
Here is an example that tells the training code which model to build and whether to use `norm_pix_loss`:

```python
from colossalai.core import global_context as gpc

import models_mae  # model definitions from the MAE codebase

model = models_mae.__dict__[gpc.config.MODEL](norm_pix_loss=gpc.config.NORM_PIX_LOSS)
```
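Putting the pieces together, here is a hedged sketch of what the body of `colossal_train.py` might look like. The `colossalai.initialize` engine API is the legacy Colossal-AI interface, and `build_dataloader` is a hypothetical helper standing in for your ImageNet data pipeline; the actual script may differ:

```python
import torch
import colossalai
from colossalai.core import global_context as gpc

import models_mae  # model definitions from the MAE codebase


def main():
    # read the config and set up the distributed environment
    colossalai.launch_from_torch(config='./colossal-ai/config.py')

    model = models_mae.__dict__[gpc.config.MODEL](norm_pix_loss=gpc.config.NORM_PIX_LOSS)
    optimizer = torch.optim.AdamW(model.parameters(),
                                  lr=gpc.config.LR,
                                  weight_decay=gpc.config.WEIGHT_DECAY)

    # build_dataloader() is a hypothetical helper returning an ImageNet loader;
    # colossalai.initialize wraps model/optimizer into an engine that applies
    # the fp16, gradient accumulation, and clipping settings from the config
    engine, train_dataloader, _, _ = colossalai.initialize(
        model, optimizer, train_dataloader=build_dataloader())

    engine.train()
    for epoch in range(gpc.config.NUM_EPOCHS):
        for imgs, _ in train_dataloader:
            imgs = imgs.cuda()
            engine.zero_grad()
            # the MAE forward returns (loss, predictions, mask)
            loss, _, _ = engine(imgs, mask_ratio=gpc.config.MASK_RATIO)
            engine.backward(loss)
            engine.step()


if __name__ == '__main__':
    main()
```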
After finishing your training code, launch the training with:

```bash
torchrun --standalone --nproc_per_node=8 colossal_train.py
```
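The command above runs on a single node. For the 8-node setup mentioned earlier, a hedged sketch using torchrun's standard rendezvous flags (`MASTER_ADDR` and the port are placeholders you must set for your cluster) might look like:

```bash
# run on every node; $MASTER_ADDR points at one designated node
torchrun --nnodes=8 --nproc_per_node=8 \
    --rdzv_backend=c10d --rdzv_endpoint=$MASTER_ADDR:29500 \
    colossal_train.py
```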
Cite us:

```bibtex
@article{bian2021colossal,
  title={Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training},
  author={Bian, Zhengda and Liu, Hongxin and Wang, Boxiang and Huang, Haichen and Li, Yongbin and Wang, Chuanrui and Cui, Fan and You, Yang},
  journal={arXiv preprint arXiv:2110.14883},
  year={2021}
}
```