fasterseq

Implementations of faster and more efficient sequence modeling deep learning algorithms based on fairseq, e.g. Lite-Transformer, Reformer, Linformer, efficient variants of the attention mechanism, and so on.


About FasterSeq

With the rapid development of the Internet of Things and other mobile devices, deploying faster and more efficient deep learning techniques requires more effort. I created this repository to explore and develop state-of-the-art techniques for accelerated computation and efficient deep learning for faster sequence modeling. The base library is fairseq, an open-source sequence modeling toolkit developed and maintained by the Facebook Artificial Intelligence Research lab.
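
To give a flavor of the efficient attention variants mentioned above, here is a minimal, single-head sketch of the Linformer idea, which projects keys and values down to a fixed length so attention scales linearly in sequence length. It is purely illustrative (the class name and dimensions are invented for the example) and is not the code used in this repository:

import torch
import torch.nn as nn


class LowRankSelfAttention(nn.Module):
    """Single-head, Linformer-style self-attention (illustrative only).

    Keys and values are projected from sequence length n down to a fixed
    length k, so attention costs O(n*k) instead of O(n^2). Inputs are
    assumed to be padded to max_seq_len.
    """

    def __init__(self, embed_dim, max_seq_len, proj_len=64):
        super().__init__()
        self.q_proj = nn.Linear(embed_dim, embed_dim)
        self.k_proj = nn.Linear(embed_dim, embed_dim)
        self.v_proj = nn.Linear(embed_dim, embed_dim)
        # Learned projection that compresses the sequence-length dimension.
        self.len_proj = nn.Linear(max_seq_len, proj_len)
        self.scale = embed_dim ** -0.5

    def forward(self, x):
        # x: (batch, seq_len, embed_dim), with seq_len == max_seq_len
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        # Compress keys/values along the length dimension: (batch, proj_len, embed_dim)
        k = self.len_proj(k.transpose(1, 2)).transpose(1, 2)
        v = self.len_proj(v.transpose(1, 2)).transpose(1, 2)
        attn = torch.softmax((q @ k.transpose(1, 2)) * self.scale, dim=-1)
        return attn @ v


x = torch.randn(2, 128, 160)                       # (batch, seq_len, embed_dim)
out = LowRankSelfAttention(160, max_seq_len=128)(x)
print(out.shape)                                   # torch.Size([2, 128, 160])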

Requirements and Installation

  • PyTorch version >= 1.4.0
  • Python version >= 3.6
  • For training new models, you'll also need an NVIDIA GPU and NCCL
  • To install fairseq and develop locally:
git clone https://github.com/pytorch/fairseq
cd fairseq
pip install --editable ./

# on macOS:
# CFLAGS="-stdlib=libc++" pip install --editable ./
  • For faster training install NVIDIA's apex library:
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" \
  --global-option="--deprecated_fused_adam" --global-option="--xentropy" \
  --global-option="--fast_multihead_attn" ./
  • For large datasets install PyArrow: pip install pyarrow
  • If you use Docker make sure to increase the shared memory size either with --ipc=host or --shm-size as command line options to nvidia-docker run.
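
Once everything above is installed, a quick sanity check from Python confirms the environment (a minimal sketch; it only checks the requirements listed above):

import torch
import fairseq

# Verify the minimum versions listed above and that a GPU is visible for training.
print("fairseq:", fairseq.__version__)
print("torch:", torch.__version__)            # expect >= 1.4.0
print("CUDA available:", torch.cuda.is_available())
print("GPU count:", torch.cuda.device_count())
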
List of implemented techniques (ongoing)

The authors from MIT HAN Lab have already published their fairseq-based code; one may check it out here. Additionally, I re-implemented Lite-Transformer, since the authors appear to have used an older version of fairseq, which may cause conflicts, especially when you need to apply the incremental_state function. Before you test the model, please make sure you install the CUDA versions of lightConv and dynamicConv by:

# build the CUDA kernel for lightConv
cd fairseq/modules/lightconv_layer
python cuda_function_gen.py
python setup.py install
cd ../../..

# build the CUDA kernel for dynamicConv
cd fairseq/modules/dynamicconv_layer
python cuda_function_gen.py
python setup.py install
cd ../../..
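
If the build succeeded, the compiled kernels should import cleanly. A quick check (the extension names below are the ones the setup scripts build in the fairseq versions I have used; consult the setup.py files if the import fails):

# Quick import check for the CUDA kernels built above.
import lightconv_cuda    # built by fairseq/modules/lightconv_layer/setup.py
import dynamicconv_cuda  # built by fairseq/modules/dynamicconv_layer/setup.py

print("lightConv and dynamicConv CUDA extensions are available")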

Here we use the IWSLT'14 German-English dataset as an example to demonstrate how to train a new Lite-Transformer.

First download and preprocess the data:

# Download and prepare the data
cd examples/translation/
bash prepare-iwslt14.sh
cd ../..

# Preprocess/binarize the data
TEXT=examples/translation/iwslt14.tokenized.de-en
fairseq-preprocess --source-lang de --target-lang en \
    --trainpref $TEXT/train --validpref $TEXT/valid --testpref $TEXT/test \
    --destdir data-bin/iwslt14.tokenized.de-en \
    --workers 20
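
Once preprocessing finishes, you can sanity-check the binarized data from Python, for example by loading the dictionaries that fairseq-preprocess wrote to the destination directory (a small sketch using fairseq.data.Dictionary):

from fairseq.data import Dictionary

# Dictionaries written by fairseq-preprocess into the --destdir above.
src_dict = Dictionary.load("data-bin/iwslt14.tokenized.de-en/dict.de.txt")
tgt_dict = Dictionary.load("data-bin/iwslt14.tokenized.de-en/dict.en.txt")
print("German vocabulary size:", len(src_dict))
print("English vocabulary size:", len(tgt_dict))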

Next we'll train a new Lite-Transformer translation model on this data: we choose the transformer_multibranch_iwslt_de_en architecture and save the training log to the train.log file.

# create the checkpoint/log directory first so the log redirection below succeeds
mkdir -p checkpoints/transformer_multibranch
CUDA_VISIBLE_DEVICES=0 fairseq-train \
    data-bin/iwslt14.tokenized.de-en \
    --arch transformer_multibranch_iwslt_de_en \
    --share-decoder-input-output-embed \
    --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
    --lr 5e-4 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
    --dropout 0.2 --weight-decay 0.0001 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --max-tokens 4096 --max-update 50000 \
    --encoder-branch-type attn:1:80:4 lightweight:default:80:4 \
    --decoder-branch-type attn:1:80:4 lightweight:default:80:4 \
    --weight-dropout 0.1 \
    --encoder-embed-dim 160 --decoder-embed-dim 160 \
    --encoder-ffn-embed-dim 160 --decoder-ffn-embed-dim 160 \
    --save-dir checkpoints/transformer_multibranch \
    --tensorboard-logdir checkpoints/transformer_multibranch/log > checkpoints/transformer_multibranch/train.log

Finally we can evaluate our trained model:

fairseq-generate data-bin/iwslt14.tokenized.de-en \
    --path checkpoints/transformer_multibranch/checkpoint_best.pt \
    --batch-size 128 --beam 4 --remove-bpe --lenpen 0.6
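
The checkpoint can also be inspected or used directly from Python. Here is a sketch using fairseq's checkpoint_utils (recent fairseq versions return (models, cfg, task); older ones return args in place of cfg):

from fairseq import checkpoint_utils

# Load the trained model together with the task and dictionaries it was trained with.
models, cfg, task = checkpoint_utils.load_model_ensemble_and_task(
    ["checkpoints/transformer_multibranch/checkpoint_best.pt"],
    arg_overrides={"data": "data-bin/iwslt14.tokenized.de-en"},
)
model = models[0].eval()

n_params = sum(p.numel() for p in model.parameters())
print(f"Loaded {model.__class__.__name__} with {n_params / 1e6:.1f}M parameters")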

Going forward, I will focus on state-of-the-art techniques for faster sequence modeling, re-implementing and validating them, and try to keep track of all the techniques that are effective for efficient and accelerated computation in deep learning, based on fairseq. (More is coming...)

Getting Started

The full documentation contains instructions for getting started, training new models, and extending fairseq with new model types and tasks. For complete details, please check the fairseq repository: https://github.com/pytorch/fairseq.

Pre-trained models and examples

fairseq provides pre-trained models and pre-processed, binarized test sets for several tasks listed below, as well as example training and evaluation commands.
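
For example, with recent fairseq releases one of the pre-trained WMT'19 translation models can be pulled through torch.hub (the hub entry name below is the one listed in the fairseq README; available entries may change between releases):

import torch

# Load a pre-trained WMT'19 En-De transformer published by fairseq.
en2de = torch.hub.load(
    "pytorch/fairseq",
    "transformer.wmt19.en-de.single_model",
    tokenizer="moses",
    bpe="fastbpe",
)
en2de.eval()
print(en2de.translate("Hello world!", beam=5))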

We also have more detailed READMEs to reproduce results from specific papers: