
High Accuracy Low Precision Training (HALP)

HALP is a PyTorch-based simulator for the HALP (High Accuracy Low Precision) training algorithm. HALP is a low-precision stochastic gradient descent variant that uses entirely low-precision computation in its inner loop while infrequently recentering this computation with higher-precision computation done in an outer loop. HALP rests on two key components: (1) a known variance reduction method, the stochastic variance-reduced gradient (SVRG); and (2) a novel bit centering technique that uses infrequent high-precision computation to reduce quantization noise. The simulator is built on the IEEE float16 tensors and arithmetic provided by PyTorch. This implementation can be used to replicate our experimental results on multiple models, including logistic regression, LeNet, LSTM, and ResNet.
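
For intuition, the sketch below shows the bit-centered SVRG structure that HALP is built around, written in plain PyTorch on a toy least-squares problem. This is a simplified illustration of ours, not the simulator's implementation: the variable names are ours, and the dynamic rescaling of the low-precision representation that full HALP performs is omitted.

import torch

# Toy bit-centered SVRG (HALP-style) sketch on a least-squares problem.
# Simplified illustration: real HALP also rescales the low-precision
# representation based on the full-gradient norm, which is omitted here.
torch.manual_seed(0)
n, d = 512, 10
X = torch.randn(n, d)
y = X @ torch.randn(d) + 0.01 * torch.randn(n)

def grad(w, idx):
    # gradient of 0.5 * ||X[idx] @ w - y[idx]||^2 averaged over the minibatch
    Xi, yi = X[idx], y[idx]
    return Xi.t() @ (Xi @ w - yi) / len(idx)

w = torch.zeros(d)                            # high-precision offset (outer loop)
alpha, T, n_outer = 0.05, 200, 10
for _ in range(n_outer):
    full_grad = grad(w, torch.arange(n))      # infrequent high-precision full gradient
    z = torch.zeros(d, dtype=torch.float16)   # low-precision delta; effective weights = w + z
    for _ in range(T):
        i = torch.randint(n, (32,))
        # SVRG variance-reduced gradient, centered around the offset w
        g = grad(w + z.float(), i) - grad(w, i) + full_grad
        z = (z.float() - alpha * g).half()    # inner-loop state stays in float16
    w = w + z.float()                         # recenter: fold the delta back into the offset
    print(f"loss: {0.5 * ((X @ w - y) ** 2).mean().item():.6f}")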

Content

  • Setup instructions
  • Command guidelines
  • Acknowledgements

Setup instructions

  • Create a conda Python 3.6 environment: conda create -n <name of the environment> python=3.6
  • Install PyTorch. Our implementation is tested with PyTorch 0.4.1 using CUDA 9.0 and torchvision 0.2.1.
pip install https://download.pytorch.org/whl/cu90/torch-0.4.1-cp36-cp36m-linux_x86_64.whl
pip install torchvision
  • Install NLTK 3.3 to support data processing for the LSTM experiment: conda install -c anaconda nltk
  • Clone the HALP repo
git clone https://github.com/HazyResearch/halp.git
  • Set up the HALP module for Python
pip install -e halp
export PYTHONPATH="$PYTHONPATH:<path to current directory>/halp"
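
After the install, a quick sanity check (our own snippet, not part of the repo) confirms that the GPU float16 path the simulator relies on is available:

import torch

# Environment sanity check: the simulator relies on GPU float16 tensors,
# so verify that CUDA and half-precision arithmetic work before running.
print("torch version:", torch.__version__)    # expect 0.4.1
assert torch.cuda.is_available(), "the float16 experiments require a CUDA GPU"
x = torch.randn(4, 4, device="cuda").half()   # IEEE float16 tensor on the GPU
y = x @ x                                     # float16 matmul
print("float16 matmul OK, dtype =", y.dtype)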

Command guidelines

  • Key arguments

    • Specify the model and dataset
    The dataset and the model are specified via the --dataset and --model arguments.
    Our simulator currently supports:
    
    * Logistic regression with MNIST dataset (--model=logreg --dataset=mnist)
    * LeNet with CIFAR10 dataset (--model=lenet --dataset=cifar10)
    * LSTM with CONLL2000 dataset (--model=lstm --dataset=conll2000)
    * ResNet with CIFAR10 dataset (--model=resnet --dataset=cifar10)
    
    • Specify training algorithm
    The HALP simulator currently supports:
    * IEEE float32 SGD:
      --solver=sgd --rounding=void
    * IEEE float16 SGD:
      --solver=lp-sgd --rounding=near
    * IEEE float32 SVRG:
      --solver=svrg --rounding=void -T=<# of steps between each full gradient compute>
    * IEEE float16 SVRG:
      --solver=lp-svrg --rounding=near -T=<# of steps between each full gradient compute>
    * IEEE float16 HALP:
      --solver=bc-svrg --rounding=near -T=<# of steps between each full gradient compute>.
      HALP can optionally use --on-site-compute. This mode avoids caching the bit-centering offset
      activation / gradient tensors for the whole dataset; instead, it computes the offsets only
      when they are needed. This saves host memory in our simulator for large models.
    
    • Miscellaneous arguments
    * --cuda: this must be specified, as IEEE float16 arithmetic is only supported on GPU in PyTorch
    * --alpha, --momentum: the learning rate and momentum values
    * --reg: strength of L2 regularization
    * --n-classes: # of classes for the classification problem
    * --seed: the random seed
    * --batch-size: the minibatch size
    * --n-epochs: the total number of epochs for training
    
  • Example runs

    We present the commands for several configurations as examples; a short sketch after the examples shows how the -T values used here relate to one epoch of minibatches.

    • Logistic regression MNIST experiment:
    (IEEE float16 HALP) cd ./exp_script && python run_models.py --n-epochs=100 --batch-size=100 --reg=1e-5 --alpha=0.05 --momentum=0.9 --seed=1  --n-classes=10  --solver=bc-svrg  --rounding=near  -T=600  --dataset=mnist  --model=logreg  --cuda
    
    (IEEE float32 SGD) cd ./exp_script && python run_models.py --n-epochs=100 --batch-size=100 --reg=1e-5 --alpha=0.01 --momentum=0.9 --seed=1  --n-classes=10  --solver=sgd  --rounding=void --dataset=mnist  --model=logreg  --cuda
    
    • LeNet CIFAR10 experiment:
    (IEEE float16 HALP) cd ./exp_script && python run_models.py --n-epochs=100 --batch-size=128 --reg=0.0005 --alpha=0.001 --momentum=0.9 --seed=1  --n-classes=10  --solver=bc-svrg  --rounding=near  -T=391  --dataset=cifar10  --model=lenet  --cuda
    
    (IEEE float32 SVRG) cd ./exp_script && python run_models.py --n-epochs=100 --batch-size=128 --reg=0.0005 --alpha=0.001 --momentum=0.9 --seed=1  --n-classes=10  --solver=svrg  --rounding=void  -T=391  --dataset=cifar10  --model=lenet  --cuda
    
    • LSTM CONLL2000 experiment:
    (Pre-process CONLL2000 tagging data) 
    mkdir datasets
    python ./utils/postag_data_utils.py
    
    (IEEE float16 HALP) cd ./exp_script && python run_models.py --n-epochs=100 --batch-size=16 --reg=0.0 --alpha=0.5 --momentum=0.0 --seed=3  --n-classes=12  --solver=bc-svrg  --rounding=near  -T=279  --dataset=conll2000  --model=lstm  --cuda  --on-site-compute
    
    (IEEE float16 SGD) cd ./exp_script && python run_models.py --n-epochs=100 --batch-size=16 --reg=0.0 --alpha=5.0 --momentum=0.0 --seed=1  --n-classes=12  --solver=lp-sgd  --rounding=near --dataset=conll2000  --model=lstm  --cuda  --on-site-compute
    
    • ResNet CIFAR10 fine-tuning experiment:
    (IEEE float16 SGD model checkpoint collection) cd ./exp_script && python run_models.py --n-epochs=350 --batch-size=128 --reg=0.0005 --alpha=0.1 --momentum=0.9 --seed=1  --n-classes=10  --solver=lp-sgd  --rounding=near  -T=391  --dataset=cifar10  --model=resnet  --cuda  --resnet-save-ckpt  --resnet-save-ckpt-path=<folder path to save checkpoint>
    
    (IEEE float16 HALP warm start tuning run) cd ./exp_script && python run_models.py --n-epochs=100 --batch-size=128 --reg=0.0005 --alpha=0.1 --momentum=0.0 --seed=1  --n-classes=10  --solver=bc-svrg  --rounding=near  -T=391  --dataset=cifar10  --model=resnet  --cuda  --on-site-compute --resnet-load-ckpt  --resnet-save-ckpt-path=<path to the saved model checkpoint> --resnet-load-ckpt-epoch-id=300
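
As a side note, the -T values in the examples above appear to equal the number of minibatches in one epoch for each dataset and batch size; this is our reading of the examples, not a documented rule. A quick check of the arithmetic:

import math

# Observation from the example commands: -T seems to be set to the number
# of minibatches per epoch for the given dataset and batch size.
datasets = {
    "mnist / logreg":  (60000, 100),   # 60k training examples, batch size 100
    "cifar10 / lenet": (50000, 128),   # 50k training examples, batch size 128
}
for name, (n_train, batch_size) in datasets.items():
    print(name, "-> T =", math.ceil(n_train / batch_size))
# mnist / logreg -> T = 600
# cifar10 / lenet -> T = 391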
    

Acknowledgements

We thank Nimit Sohoni, Paroma Varma, Albert Gu, Tri Dao, and Charles Kuang for helpful discussions.