Install Colossal-AI from source:

```bash
git clone https://github.com/hpcaitech/ColossalAI.git
cd ColossalAI
# install dependencies
pip install -r requirements/requirements.txt
# install colossalai
pip install .
```
Install and enable CUDA kernel fusion (required when using a fused optimizer):

```bash
pip install -v --no-cache-dir --global-option="--cuda_ext" .
```
More installation details can be found in the Colossal-AI documentation.
To pre-train MAE with Colossal-AI, first write a config file containing the run parameters:
```python
from colossalai.amp import AMP_TYPE

# optimization hyperparameters
TOTAL_BATCH_SIZE = 4096
LR = 1.5e-4
WEIGHT_DECAY = 0.05

TENSOR_PARALLEL_SIZE = 1
TENSOR_PARALLEL_MODE = None

NUM_EPOCHS = 800
WARMUP_EPOCHS = 40

parallel = dict(
    pipeline=1,
    tensor=dict(mode=TENSOR_PARALLEL_MODE, size=TENSOR_PARALLEL_SIZE),
)

# use PyTorch native AMP for mixed-precision training
fp16 = dict(mode=AMP_TYPE.TORCH)

# accumulate gradients over 2 iterations before each optimizer update
gradient_accumulation = 2
BATCH_SIZE = TOTAL_BATCH_SIZE // gradient_accumulation

clip_grad_norm = 1.0

LOG_PATH = f"./vit_{TENSOR_PARALLEL_MODE}_imagenet1k_tp{TENSOR_PARALLEL_SIZE}_bs{BATCH_SIZE}_lr{LR}_{fp16['mode']}_clip_grad{clip_grad_norm}/"

# MAE model variant and pre-training options
MODEL = "mae_vit_large_patch16"
NORM_PIX_LOSS = True
MASK_RATIO = 0.75
```
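As a quick sanity check, a minimal sketch restating the batch-size arithmetic implied by the config (the numbers are just the config values above):

```python
# With gradient accumulation, each optimizer update accumulates gradients
# over `gradient_accumulation` forward/backward passes, so the batch
# processed per pass is TOTAL_BATCH_SIZE // gradient_accumulation, while
# the effective batch size per update stays TOTAL_BATCH_SIZE.
TOTAL_BATCH_SIZE = 4096
gradient_accumulation = 2
BATCH_SIZE = TOTAL_BATCH_SIZE // gradient_accumulation  # 2048 per pass
assert BATCH_SIZE * gradient_accumulation == TOTAL_BATCH_SIZE
```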
- Here the effective batch size is 64 (`batch_size` per gpu) * 8 (`nodes`) * 8 (gpus per node) = 4096. If memory or # gpus is limited, use `--accum_iter` to maintain the effective batch size, which is `batch_size` (per gpu) * `nodes` * 8 (gpus per node) * `accum_iter`. `blr` is the base learning rate. The actual `lr` is computed by the linear scaling rule: `lr` = `blr` * effective batch size / 256 (see the sketch after this list).
- Here we use `--norm_pix_loss` as the target for better representation learning. To train a baseline model (e.g., for visualization), use pixel-based construction and turn off `--norm_pix_loss`.
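A minimal sketch of the linear scaling rule above, using the example numbers from the first bullet (the base learning rate matches `LR` in the config):

```python
# Linear scaling rule: lr = blr * effective_batch_size / 256
blr = 1.5e-4                       # base learning rate
batch_size, nodes, gpus_per_node, accum_iter = 64, 8, 8, 1
eff_batch_size = batch_size * nodes * gpus_per_node * accum_iter  # 4096
lr = blr * eff_batch_size / 256    # 1.5e-4 * 4096 / 256 = 2.4e-3
print(lr)
```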
In this experiment, we only launch Colossal-AI with PyTorch.
First, load the config from `config.py`:

```python
import colossalai

colossalai.launch_from_torch(config='./colossal-ai/config.py')
```
Then, in your Python file, you can access the previously defined parameters via `gpc.config.${param}` (where `gpc` is `colossalai.core.global_context`).
Here is an example that tells the training code which model to build and whether to use `norm_pix_loss`:

```python
from colossalai.core import global_context as gpc

import models_mae  # model definitions from the MAE codebase

model = models_mae.__dict__[gpc.config.MODEL](norm_pix_loss=gpc.config.NORM_PIX_LOSS)
```
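Putting the pieces together, here is a hedged sketch of what the body of `colossal_train.py` might look like. The `colossalai.initialize` engine API is the legacy Colossal-AI interface, and `build_dataloader` is a hypothetical helper standing in for your ImageNet data pipeline; the actual script may differ:

```python
import torch
import colossalai
from colossalai.core import global_context as gpc

import models_mae  # model definitions from the MAE codebase


def main():
    # read the config and set up the distributed environment
    colossalai.launch_from_torch(config='./colossal-ai/config.py')

    model = models_mae.__dict__[gpc.config.MODEL](norm_pix_loss=gpc.config.NORM_PIX_LOSS)
    optimizer = torch.optim.AdamW(model.parameters(),
                                  lr=gpc.config.LR,
                                  weight_decay=gpc.config.WEIGHT_DECAY)

    # build_dataloader() is a hypothetical helper returning an ImageNet loader;
    # colossalai.initialize wraps model/optimizer into an engine that applies
    # the fp16, gradient accumulation, and clipping settings from the config
    engine, train_dataloader, _, _ = colossalai.initialize(
        model, optimizer, train_dataloader=build_dataloader())

    engine.train()
    for epoch in range(gpc.config.NUM_EPOCHS):
        for imgs, _ in train_dataloader:
            imgs = imgs.cuda()
            engine.zero_grad()
            # the MAE forward returns (loss, predictions, mask)
            loss, _, _ = engine(imgs, mask_ratio=gpc.config.MASK_RATIO)
            engine.backward(loss)
            engine.step()


if __name__ == '__main__':
    main()
```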
After finishing your training code, launch the training with:

```bash
torchrun --standalone --nproc_per_node=8 colossal_train.py
```
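The command above runs on a single node. For the 8-node setup mentioned earlier, a hedged sketch using torchrun's standard rendezvous flags (`MASTER_ADDR` and the port are placeholders you must set for your cluster) might look like:

```bash
# run on every node; $MASTER_ADDR points at one designated node
torchrun --nnodes=8 --nproc_per_node=8 \
    --rdzv_backend=c10d --rdzv_endpoint=$MASTER_ADDR:29500 \
    colossal_train.py
```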
Cite us:

```bibtex
@article{bian2021colossal,
  title={Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training},
  author={Bian, Zhengda and Liu, Hongxin and Wang, Boxiang and Huang, Haichen and Li, Yongbin and Wang, Chuanrui and Cui, Fan and You, Yang},
  journal={arXiv preprint arXiv:2110.14883},
  year={2021}
}
```