Codebase for simultaneous speech translation experiments, built on top of fairseq.
- Install fairseq:

  ```bash
  git clone https://github.com/pytorch/fairseq.git
  cd fairseq
  git checkout 4a7835b
  python setup.py build_ext --inplace
  pip install .
  ```
- (Optional) Install apex for faster mixed-precision (fp16) training; a sketch of a typical apex install follows this list.
- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Update submodules:

  ```bash
  git submodule update --init --recursive
  ```
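If you opt into apex, the commands below follow NVIDIA's apex README at the time of writing; the install procedure changes between releases, so verify against the upstream instructions before running:

```bash
# Typical apex install per NVIDIA's README; check upstream for the current procedure
git clone https://github.com/NVIDIA/apex
cd apex
# --cpp_ext / --cuda_ext compile the fused kernels that speed up fp16 training
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
```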
ASR model with an Emformer encoder and a Transformer decoder, pre-trained with a joint CTC and cross-entropy loss (a rough sketch of this joint objective follows the table below).
| MuST-C (WER) | en-de (V2) | en-es |
|---|---|---|
| dev | 9.65 | 14.44 |
| tst-COMMON | 12.85 | 14.02 |
| model | download | download |
| vocab | download | download |
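As an illustration of what a joint CTC and cross-entropy objective looks like, the sketch below interpolates a frame-level CTC loss on the encoder with a label-level cross-entropy loss on the decoder. The interpolation weight, blank index, and padding index are assumptions for the sketch, not this repository's exact configuration:

```python
import torch
import torch.nn.functional as F

def joint_ctc_ce_loss(encoder_logits, decoder_logits, targets,
                      src_lengths, target_lengths,
                      blank_idx=0, pad_idx=1, ctc_weight=0.3):
    """Interpolated loss L = w * L_CTC + (1 - w) * L_CE.

    encoder_logits: (B, T, V) frame-level encoder outputs
    decoder_logits: (B, S, V) autoregressive decoder outputs
    targets:        (B, S)    padded target token ids
    blank_idx, pad_idx, ctc_weight are illustrative values only.
    """
    # CTC expects (T, B, V) log-probabilities
    log_probs = F.log_softmax(encoder_logits, dim=-1).transpose(0, 1)
    ctc = F.ctc_loss(log_probs, targets, src_lengths, target_lengths,
                     blank=blank_idx, zero_infinity=True)
    # Cross-entropy over decoder outputs, ignoring padded positions
    ce = F.cross_entropy(decoder_logits.reshape(-1, decoder_logits.size(-1)),
                         targets.reshape(-1), ignore_index=pad_idx)
    return ctc_weight * ctc + (1.0 - ctc_weight) * ce
```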
| MuST-C (BLEU) | en-de (V2) |
|---|---|
| valid | 31.76 |
| distillation | download |
| vocab | download |
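To sanity-check BLEU numbers such as the one above, sacrebleu (the scorer fairseq commonly reports with) can be run over detokenized outputs. The file names below are placeholders, and this is a generic recipe rather than this repository's exact evaluation pipeline:

```python
import sacrebleu

# Placeholder paths; substitute your detokenized hypotheses and references
with open("hyp.detok.txt") as f:
    hyps = [line.strip() for line in f]
with open("ref.detok.txt") as f:
    refs = [line.strip() for line in f]

# corpus_bleu takes a list of hypotheses and a list of reference streams
bleu = sacrebleu.corpus_bleu(hyps, [refs])
print(f"BLEU = {bleu.score:.2f}")
```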
Please consider citing our paper:
```bibtex
@inproceedings{chang22f_interspeech,
  author={Chih-Chiang Chang and Hung-yi Lee},
  title={{Exploring Continuous Integrate-and-Fire for Adaptive Simultaneous Speech Translation}},
  year=2022,
  booktitle={Proc. Interspeech 2022},
  pages={5175--5179},
  doi={10.21437/Interspeech.2022-10627}
}
```