simulst

PyTorch toolkit for streaming speech recognition, speech translation and simultaneous translation based on fairseq.

Simultaneous Speech Translation

Codebase for simultaneous speech translation experiments, built on fairseq.

Implemented

Encoder

Streaming Models

Setup

  1. Install fairseq
git clone https://github.com/pytorch/fairseq.git
cd fairseq
git checkout 4a7835b
python setup.py build_ext --inplace
pip install .
  2. (Optional) Install apex for faster mixed precision (fp16) training.
  3. Install dependencies
pip install -r requirements.txt
  4. Update submodules
git submodule update --init --recursive

Pre-trained model

ASR model with an Emformer encoder and a Transformer decoder, pre-trained with a joint CTC and cross-entropy loss.
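A joint CTC and cross-entropy loss interpolates a CTC loss computed on the encoder outputs with the usual teacher-forced cross-entropy loss of the decoder. The following is a minimal, framework-free sketch of that idea, not this repository's training criterion: the function names, the toy log-probability inputs and the interpolation weight `ctc_weight=0.3` are hypothetical, and the decoder target is simplified (no end-of-sentence token).

```python
import math
from typing import Sequence

NEG_INF = float("-inf")


def logsumexp(*xs: float) -> float:
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_loss(log_probs: Sequence[Sequence[float]], target: Sequence[int], blank: int = 0) -> float:
    """CTC negative log-likelihood of `target` via the forward algorithm.

    log_probs[t][k] is the encoder's log-probability of symbol k at frame t.
    """
    # Interleave blanks: [blank, y1, blank, y2, ..., blank]
    ext = [blank]
    for y in target:
        ext += [y, blank]
    S = len(ext)
    # alpha[s]: log-probability of all partial alignments ending in ext[s]
    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]
    for frame in log_probs[1:]:
        new = [NEG_INF] * S
        for s in range(S):
            terms = [alpha[s]]  # stay on the same symbol
            if s >= 1:
                terms.append(alpha[s - 1])  # advance by one position
            # Skipping a blank is allowed only between two different labels
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                terms.append(alpha[s - 2])
            new[s] = logsumexp(*terms) + frame[ext[s]]
        alpha = new
    # A valid alignment ends on the last label or on the trailing blank
    total = alpha[0] if S == 1 else logsumexp(alpha[S - 1], alpha[S - 2])
    return -total


def cross_entropy(dec_log_probs: Sequence[Sequence[float]], target: Sequence[int]) -> float:
    """Teacher-forced decoder loss: decoder step i should predict target[i]."""
    return -sum(step[y] for step, y in zip(dec_log_probs, target))


def joint_loss(enc_log_probs, dec_log_probs, target, ctc_weight: float = 0.3, blank: int = 0) -> float:
    """Interpolate the two losses; ctc_weight=0.3 is a hypothetical value."""
    return (ctc_weight * ctc_loss(enc_log_probs, target, blank)
            + (1.0 - ctc_weight) * cross_entropy(dec_log_probs, target))
```

In PyTorch one would normally call `torch.nn.functional.ctc_loss` instead of a hand-written forward algorithm; the loop form here only makes the blank and repeated-label rules explicit.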

MuST-C (WER)   en-de (V2)   en-es
dev            9.65         14.44
tst-COMMON     12.85        14.02
model          download     download
vocab          download     download
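The WER figures above are word error rates in percent. As a reference for what the metric counts, here is a minimal sketch of corpus-level WER: the word-level edit distance (substitutions, deletions and insertions) summed over all utterances and divided by the total number of reference words. The function names are hypothetical, and the text normalization (casing, punctuation) behind the reported numbers is not stated in this README, so the sketch simply splits on whitespace.

```python
from typing import Sequence


def word_errors(reference: str, hypothesis: str) -> int:
    """Minimum number of word substitutions, deletions and insertions."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev holds one row of the DP table: distances between ref[:i] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            sub = prev[j - 1] + (ref[i - 1] != hyp[j - 1])  # match or substitution
            cur[j] = min(sub, prev[j] + 1, cur[j - 1] + 1)  # vs. deletion / insertion
        prev = cur
    return prev[-1]


def corpus_wer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Total word errors over total reference words, as a percentage."""
    errors = sum(word_errors(r, h) for r, h in zip(refs, hyps))
    words = sum(len(r.split()) for r in refs)
    return 100.0 * errors / words
```

Summing errors before dividing (rather than averaging per-utterance WER) keeps long utterances from being under-weighted.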

Sequence-level Knowledge Distillation

MuST-C (BLEU)   en-de (V2)
valid           31.76
distillation    download
vocab           download
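In sequence-level knowledge distillation, the reference targets of the training set are replaced by the outputs of a trained teacher model, and the student is trained on those outputs. The sketch below assumes (this README does not say so) that the teacher is a text MT model translating the English transcripts, as is common for speech translation. `STExample`, `sequence_level_kd` and `teacher_translate` are hypothetical names, and the teacher callable stands in for beam-search decoding with that model.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class STExample:
    """One speech translation training example (hypothetical schema)."""
    audio_id: str
    transcript: str   # source-language (English) transcript
    translation: str  # target-language text used as the training target


def sequence_level_kd(
    examples: List[STExample],
    teacher_translate: Callable[[str], str],
) -> List[STExample]:
    """Return copies of `examples` whose targets are the teacher's outputs.

    The student is then trained on (audio, teacher output) pairs; the
    original examples are left unchanged.
    """
    return [replace(ex, translation=teacher_translate(ex.transcript)) for ex in examples]


# Usage with a toy dictionary-based "teacher" standing in for a real MT model
data = [STExample("utt1", "hello world", "hallo welt"),
        STExample("utt2", "thank you", "danke")]
toy_teacher = {"hello world": "Hallo Welt", "thank you": "Danke"}
distilled = sequence_level_kd(data, toy_teacher.__getitem__)
```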

Citation

Please consider citing our paper:

@inproceedings{chang22f_interspeech,
  author={Chih-Chiang Chang and Hung-yi Lee},
  title={{Exploring Continuous Integrate-and-Fire for Adaptive Simultaneous Speech Translation}},
  year=2022,
  booktitle={Proc. Interspeech 2022},
  pages={5175--5179},
  doi={10.21437/Interspeech.2022-10627}
}