Ok-Topk is a scheme for distributed training with sparse gradients. Ok-Topk integrates a novel sparse allreduce algorithm (with less than 6k communication volume, which is asymptotically optimal) with the decentralized parallel Stochastic Gradient Descent (SGD) optimizer; its convergence is proven theoretically and validated empirically. All baselines, such as TopkDSA, gTopk, and Gaussiank, are already integrated in the repo.
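To illustrate the general idea of Top-k gradient sparsification that these schemes build on, here is a minimal sketch. It is not the Ok-Topk sparse allreduce from this repo; the function names and the density parameter are made up for illustration.

```python
# Minimal illustration of Top-k gradient sparsification (sketch only,
# not the Ok-Topk sparse allreduce implemented in this repository).
import torch

def topk_sparsify(grad: torch.Tensor, density: float = 0.01):
    """Keep only the k largest-magnitude entries of a flattened gradient."""
    flat = grad.flatten()
    k = max(1, int(flat.numel() * density))
    _, indices = torch.topk(flat.abs(), k)
    return indices, flat[indices]

def desparsify(indices: torch.Tensor, values: torch.Tensor, shape) -> torch.Tensor:
    """Scatter the exchanged (index, value) pairs back into a dense tensor."""
    dense = torch.zeros(shape, dtype=values.dtype).flatten()
    dense[indices] = values
    return dense.reshape(shape)

# Example: each worker would communicate only ~1% of the gradient entries.
grad = torch.randn(4, 1024)
idx, vals = topk_sparsify(grad, density=0.01)
restored = desparsify(idx, vals, grad.shape)
print(idx.numel(), "of", grad.numel(), "entries kept")
```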
To install the required Python modules:
conda create --name py38_oktopk python=3.8
conda activate py38_oktopk
pip3 install pip==20.2.4
pip install -r requirements.txt
MPICC="cc -shared" pip install --no-binary=mpi4py mpi4py
Then install NVIDIA Apex:
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
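As an optional sanity check that the environment is usable (this snippet is not part of the repo; it only assumes the py38_oktopk environment is active on a GPU node):

```python
# Optional sanity check for the installed stack (not part of the repo).
import torch
from mpi4py import MPI
from apex import amp  # noqa: F401 -- verifies that the Apex extensions import

print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("MPI world size:", MPI.COMM_WORLD.Get_size())
```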
To download the CIFAR-10 dataset for VGG:
cd ./VGG/vgg_data
wget https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
tar -zxvf cifar-10-python.tar.gz
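To verify the extraction, you can load one of the pickled batches. This check is optional and assumes the archive unpacked into cifar-10-batches-py/ in the current directory (the standard layout of the CIFAR-10 "python version"):

```python
# Optional check that the CIFAR-10 python batches unpacked correctly.
import pickle

with open("cifar-10-batches-py/data_batch_1", "rb") as f:
    batch = pickle.load(f, encoding="bytes")

print(batch[b"data"].shape)   # expected: (10000, 3072)
print(len(batch[b"labels"]))  # expected: 10000
```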
To download the AN4 audio dataset for LSTM:
cd ./LSTM/audio_data
wget https://www.dropbox.com/s/l5w4up20u5pfjxf/an4.zip
unzip an4.zip
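A quick way to confirm the audio files landed is to count them; the exact directory layout of an4.zip is an assumption here, so adjust the path if it unpacked elsewhere:

```python
# Optional: count the extracted audio files (directory layout is an assumption).
from pathlib import Path

wavs = sorted(Path(".").rglob("*.wav"))
print(f"found {len(wavs)} wav files under {Path('.').resolve()}")
```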
To prepare the dataset for BERT:
cd ./BERT/bert/bert_data/
Prepare the dataset by following the README file in that directory.
We run experiments on GPU clusters with the SLURM job scheduler.
To evaluate the performance of Ok-Topk, Gaussiank, gtopk, topkA, topkDSA, and dense, run the jobs as follows.
For VGG:
cd ./VGG
./sbatch_vgg_jobs.sh
For LSTM:
cd ./LSTM
./sbatch_lstm_jobs.sh
For BERT:
cd ./BERT/bert/
./sbatch_bert_jobs.sh
The work of Ok-Topk is published in PPoPP'22. DOI
See LICENSE.