git clone https://github.com/ChenYuHo/pytorch-model-profiling.git
cd pytorch-model-profiling
git submodule update --init  # optionally add --depth 1 for a shallow clone
Create the conda environment by running the script from the project root:
./bin/create-conda-env.sh
conda activate ./env
If you have multiple GPUs, restrict profiling to a single one; otherwise errors may occur:
export CUDA_VISIBLE_DEVICES=0
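To verify that only the selected GPU is visible, a quick check (assumes torch is importable in the activated environment):

import torch

# With CUDA_VISIBLE_DEVICES=0 exported, exactly one device should be visible.
print(torch.cuda.device_count())      # expected: 1
print(torch.cuda.get_device_name(0))  # name of the selected GPU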
cd src
# assuming conda environment already activated
python pytorch_module_hooks_profiler.py MODEL
Check
python pytorch_module_hooks_profiler.py -h
for more options.
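As background, the script's name suggests it times layers via PyTorch module hooks. Below is a minimal, hypothetical sketch of per-layer forward timing with hooks; it is not the script's actual implementation, and it assumes a CUDA device is available:

import time
import torch
import torch.nn as nn

def attach_forward_timing_hooks(model: nn.Module):
    """Attach hooks that record per-layer forward times (seconds)."""
    times = {}

    def pre_hook(module, inputs):
        torch.cuda.synchronize()              # drain previously queued GPU work
        module._start = time.perf_counter()

    def post_hook(module, inputs, output):
        torch.cuda.synchronize()              # wait for this layer's kernels
        key = f"{module.__class__.__name__}@{id(module)}"
        times.setdefault(key, []).append(time.perf_counter() - module._start)

    for m in model.modules():
        if not list(m.children()):            # instrument leaf modules only
            m.register_forward_pre_hook(pre_hook)
            m.register_forward_hook(post_hook)
    return times

Running a few forward passes and taking the median of each list in times gives per-layer numbers; backward-pass timing works analogously with register_full_backward_hook.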
cd src/DeepLearningExamples/PyTorch/LanguageModeling/BERT
# Follow README.md to prepare checkpoint and training data in $base_dir such that
# $base_dir/pytorch/bert_uncased.pt
# $base_dir/squad/v1.1/train-v1.1.json
# $base_dir/google_pretrained_weights/uncased_L-24_H-1024_A-16/vocab.txt
# are available. Update run_profiling.sh accordingly.
./scripts/run_profiling.sh
Reference results can be found at:
get_model_size_and_fp_bp_median.py
extracts each layer's size and the medians of its forward-pass, backward-pass, and weight-update times. The optional BUCKET_SIZE_MB argument groups small layers together so that each "bucket" stays below BUCKET_SIZE_MB (best-effort).
python get_model_size_and_fp_bp_median.py [TRACE_JSON_PATH] [BUCKET_SIZE_MB]
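The grouping is best-effort because a layer that is already larger than the cap must form a bucket on its own. A minimal sketch of a greedy grouping under that rule (a plausible reading, not necessarily the script's exact logic; layer_sizes_mb is a hypothetical input):

def bucket_layers(layer_sizes_mb, bucket_size_mb):
    """Greedily group consecutive layers so each bucket stays under
    bucket_size_mb; an oversized layer becomes its own bucket (best-effort)."""
    buckets, current, current_mb = [], [], 0.0
    for i, size in enumerate(layer_sizes_mb):
        if current and current_mb + size > bucket_size_mb:
            buckets.append(current)           # close the bucket before it overflows
            current, current_mb = [], 0.0
        current.append(i)
        current_mb += size
    if current:
        buckets.append(current)
    return buckets

# e.g. bucket_layers([5, 5, 30, 2], 10) -> [[0, 1], [2], [3]]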