DropIT aims to reduce the memory consumed by caching intermediate tensors when training DNNs. In computer vision tasks, the GPU memory occupied by intermediate tensors is often hundreds of times the model size (e.g., 20 GB vs. 100 MB for ResNet-50). DropIT addresses this by adaptively caching only part of each intermediate tensor in the forward pass and recovering the sparsified tensors for gradient computation in the backward pass.
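To illustrate the mechanism, here is a minimal PyTorch sketch of the idea, not the repository's actual implementation: the class name, the `keep_ratio` parameter, and the top-k selection policy below are our own illustrative choices. A custom autograd function caches only the largest-magnitude elements of a linear layer's input in the forward pass, then reconstructs a zero-filled tensor from that sparse cache to compute the weight gradient in the backward pass.

```python
import torch

class DropITLinear(torch.autograd.Function):
    """Illustrative linear layer that drops part of the cached input.

    Hypothetical sketch: the name and the top-k policy are assumptions,
    not the repository's actual code.
    """

    @staticmethod
    def forward(ctx, x, weight, keep_ratio=0.1):
        flat = x.flatten()
        k = max(1, int(flat.numel() * keep_ratio))
        # Cache only the k largest-magnitude elements (values + indices)
        # instead of the full activation tensor.
        _, indices = flat.abs().topk(k)
        values = flat[indices]  # keep original signs
        ctx.save_for_backward(values, indices, weight)
        ctx.x_shape = x.shape
        return x @ weight.t()

    @staticmethod
    def backward(ctx, grad_output):
        values, indices, weight = ctx.saved_tensors
        # Recover a sparsified (zero-filled) version of the input
        # and use it for the weight gradient.
        flat = grad_output.new_zeros(ctx.x_shape.numel())
        flat[indices] = values
        x_recovered = flat.view(ctx.x_shape)
        grad_x = grad_output @ weight
        grad_w = grad_output.t() @ x_recovered
        return grad_x, grad_w, None

# Usage: behaves like a linear layer, but caches ~10% of the input.
x = torch.randn(8, 64, requires_grad=True)
w = torch.randn(32, 64, requires_grad=True)
out = DropITLinear.apply(x, w, 0.1)
out.sum().backward()
```

Note that storing indices costs memory too, so the savings depend on how aggressively elements are dropped and how the cache is encoded; see the paper for how DropIT makes this trade-off pay off in practice.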
Interested? Take a look at our paper:
DropIT: Dropping Intermediate Tensors for Memory-Efficient DNN Training
Joya Chen*, Kai Xu*, Yifei Cheng, Angela Yao (* Equal Contribution)
This repository contains the implementation of DropIT, co-developed by Joya Chen and Kai Xu.
Installing DropIT is simple: the implementation relies only on PyTorch and PyTorch-Lightning.
```shell
conda install pytorch torchvision cudatoolkit=11.3 -c pytorch  # other CUDA versions also work
pip install pytorch-lightning lightning-bolts
git clone https://github.com/ChenJoya/dropit
cd dropit
pip install -e .
```
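To sanity-check the setup (assuming the package installs as `dropit`, which matches the repository layout):

```shell
python -c "import torch, dropit; print(torch.__version__, torch.cuda.is_available())"
```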
We provide configs in dropit/configs. For example, to train the vision transformer ViT-B/16 on ImageNet:

```shell
CUDA_VISIBLE_DEVICES=0,1 python train.py --cfg configs/imagenet/vit_b_fastminkx0.9.yaml NUM_GPUS 2
```
Evaluation is performed every N epochs during training (N can be set in the config), and you can view the results with TensorBoard.
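For instance, assuming the default PyTorch-Lightning log directory (the actual path depends on your logger config):

```shell
tensorboard --logdir lightning_logs
```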
Results on CIFAR-100:

| Model | Top-1 Acc (%) | Top-5 Acc (%) | Cache (MB) |
|---|---|---|---|
| ResNet-18 (32x32) | 77.96 | 94.05 | 648 |
| ResNet-18 (32x32) w. DropIT | 78.17 | 94.19 | 598 |
| ViT-B/16 (224x224) | 90.32 | 98.88 | 20290 |
| ViT-B/16 (224x224) w. DropIT | 90.90 | 99.02 | 16052 |
Results on ImageNet:

| Model | Top-1 Acc (%) | Top-5 Acc (%) | Cache (MB) |
|---|---|---|---|
| ResNet-18 (224x224) | 69.76 | 89.08 | 2826 |
| ResNet-18 (224x224) w. DropIT | 69.85 | 89.39 | 2600 |
| ViT-B/16 (224x224) | 83.40 | 96.96 | 20290 |
| ViT-B/16 (224x224) w. DropIT | 83.61 | 97.01 | 16056 |
If DropIT helps your research, please consider citing our paper:
```bibtex
@article{dropit,
  title={DropIT: Dropping Intermediate Tensors for Memory-Efficient DNN Training},
  author={Joya Chen and Kai Xu and Yifei Cheng and Angela Yao},
  journal={arXiv:2202.13808},
  year={2022},
}
```