FBK-fairseq-ST

A repository containing the code for our speech translation papers.

This repository is a fork of https://github.com/pytorch/fairseq containing additional code used for our papers. Most of our code is in the examples/speech_recognition folder.

If you use this code, please consider citing the related paper. The repository contains the code for:

  • M. Gaido et al., "CTC-based Compression for Direct Speech Translation", EACL 2021
  • M. Gaido, B. Savoldi et al., "Breeding Gender-aware Direct Speech Translation Systems", COLING 2020
  • M. Gaido et al., "Contextualized Translation of Automatically Segmented Speech", INTERSPEECH 2020
  • M. Gaido et al., "On Knowledge Distillation for Direct Speech Translation", CLiC-it 2020

@inproceedings{gaido-etal-2020-breeding,
    title = "Breeding Gender-aware Direct Speech Translation Systems",
    author = "Gaido, Marco  and
      Savoldi, Beatrice  and
      Bentivogli, Luisa  and
      Negri, Matteo  and
      Turchi, Marco",
    booktitle = "Proceedings of the 28th International Conference on Computational Linguistics",
    month = dec,
    year = "2020",
    address = "Barcelona, Spain (Online)",
    publisher = "International Committee on Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.coling-main.350",
    pages = "3951--3964",
}
@inproceedings{Gaido2020,
  author={Gaido, Marco and Di Gangi, Mattia A. and Negri, Matteo and Cettolo, Mauro and Turchi, Marco},
  title={{Contextualized Translation of Automatically Segmented Speech}},
  year=2020,
  month=oct,
  booktitle={Proc. of Interspeech 2020},
  pages={1471--1475},
  doi={10.21437/Interspeech.2020-2860},
  url={http://dx.doi.org/10.21437/Interspeech.2020-2860}
}

CTC Compression

The following script was used to run the experiments for the EACL 2021 paper:

#!/bin/bash
# Positional arguments:
#   $1: directory where model checkpoints are written (--save-dir)
#   $2: target language (-t), also used to select the en-$lang data directory
#   $3: source language / transcription side (-s)
#   $4: encoder layer after which CTC compression is applied (--ctc-encoder-layer)
save_dir=$1
lang=$2
source_t=$3
compress_layer=$4

python train.py /datadrive/data/corpora/ctccompress/en-$lang/ \
    -s $source_t -t $lang --skip-normalization --user-dir examples/speech_recognition \
    --clip-norm 20 \
    --ddp-backend=no_c10d \
    --max-sentences 8 \
    --max-tokens 12000 \
    --max-source-positions 2000 --max-target-positions 1000 \
    --save-dir $save_dir \
    --max-epoch 150 \
    --dropout 0.2 \
    --lr 5e-3 --min-lr 1e-07 --reset-optimizer \
    --lr-scheduler inverse_sqrt \
    --warmup-updates 4000 --warmup-init-lr 3e-4 \
    --update-freq 8 \
    --optimizer adam --adam-betas '(0.9, 0.98)' \
    --distance-penalty log \
    --no-attn-2d --encoder-layers 11 --decoder-layers 4 \
    --ctc-compress-out --ctc-encoder-layer $compress_layer \
    --arch conv_transformer_big2 --task speech_translation_with_transcription \
    --input-feat-per-channel 40 \
    --skip-invalid-size-inputs-valid-test \
    --sentence-avg \
    --specaugment --frequency-masking-pars 13 --time-masking-pars 20 --specaugment-rate 0.5 --frequency-masking-num 2 --time-masking-num 2 \
    --criterion ctc_multi_loss --underlying-criterion label_smoothed_cross_entropy --label-smoothing 0.1
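
For reference, the script takes four positional arguments. A hypothetical invocation (the script name, checkpoint directory and layer index below are placeholders, not values prescribed by the paper) could look like:

# Save checkpoints under ./checkpoints/ctc_en-de, translate into German,
# use the English transcriptions as source, and apply CTC compression
# after encoder layer 8. All values are illustrative placeholders.
bash train_ctc_compress.sh ./checkpoints/ctc_en-de de en 8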

The original fairseq README follows below.


Fairseq(-py) is a sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling and other text generation tasks.

Features:

Fairseq provides reference implementations of various sequence-to-sequence models, including convolutional and transformer models.

Additionally:

  • multi-GPU (distributed) training on one machine or across multiple machines
  • fast generation on both CPU and GPU with multiple search algorithms implemented (e.g., beam search and sampling)
  • large mini-batch training even on a single GPU via delayed updates
  • mixed precision training (trains faster with less GPU memory on NVIDIA tensor cores); an example command combining this with delayed updates is sketched after this list
  • extensible: easily register new models, criterions, tasks, optimizers and learning rate schedulers
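
To make the delayed-update and mixed-precision points concrete, a training invocation could combine the two as sketched below; the data directory, architecture and hyper-parameters are placeholders, not recommended settings:

# Hypothetical sketch: emulate a large mini-batch on a single GPU by accumulating
# gradients over 16 steps of 8 sentences each (--update-freq), and train in
# mixed precision (--fp16).
fairseq-train data-bin/my_corpus \
    --arch transformer --optimizer adam --lr 5e-4 \
    --lr-scheduler inverse_sqrt --warmup-updates 4000 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --max-sentences 8 --update-freq 16 \
    --fp16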

We also provide pre-trained models for translation and language modeling with a convenient torch.hub interface:

en2de = torch.hub.load('pytorch/fairseq', 'transformer.wmt19.en-de.single_model')
en2de.translate('Hello world', beam=5)
# 'Hallo Welt'

See the PyTorch Hub tutorials for translation and RoBERTa for more examples.

Requirements and Installation

  • PyTorch version >= 1.4.0
  • Python version >= 3.6
  • For training new models, you'll also need an NVIDIA GPU and NCCL (a quick check is sketched after the apex instructions below)
  • For faster training install NVIDIA's apex library:
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" --global-option="--deprecated_fused_adam" --global-option="--xentropy" --global-option="--fast_multihead_attn" ./
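
A quick, optional way to check the requirements above (a sketch; the exact output depends on your installation):

# Print the Python and PyTorch versions, whether a CUDA GPU is visible,
# and the NCCL version bundled with PyTorch.
python -c "import sys, torch; print(sys.version.split()[0], torch.__version__, torch.cuda.is_available(), torch.cuda.nccl.version())"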

To install fairseq:

pip install fairseq

On MacOS:

CFLAGS="-stdlib=libc++" pip install fairseq

If you use Docker, make sure to increase the shared memory size either with --ipc=host or --shm-size as command line options to nvidia-docker run.
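
For example (the image name below is a placeholder, not an image provided by this repository):

# Give the container access to the host's shared memory, or set an explicit size.
nvidia-docker run --ipc=host -it my-fairseq-image bash
nvidia-docker run --shm-size=8g -it my-fairseq-image bash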

Installing from source

To install fairseq from source and develop locally:

git clone https://github.com/pytorch/fairseq
cd fairseq
pip install --editable .
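
An optional sanity check that the editable install is the one being imported (a sketch, assuming the clone above):

# Should print the fairseq version and the path of the cloned checkout.
python -c "import fairseq; print(fairseq.__version__, fairseq.__file__)"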

Getting Started

The full documentation contains instructions for getting started, training new models and extending fairseq with new model types and tasks.

Pre-trained models and examples

We provide pre-trained models and pre-processed, binarized test sets for several tasks listed below, as well as example training and evaluation commands.

  • Translation: convolutional and transformer models are available (a generic evaluation sketch follows this list)
  • Language Modeling: convolutional and transformer models are available
  • wav2vec: wav2vec large model is available
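
For example, evaluating one of the translation models listed above on its binarized test set generally follows the pattern below; the data directory and checkpoint path are placeholders, not files shipped with this repository:

# Hypothetical sketch: generate translations with beam search from a downloaded
# checkpoint and report BLEU on the binarized test set.
fairseq-generate data-bin/my_test_set \
    --path checkpoints/model.pt \
    --beam 5 --remove-bpe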

We also have more detailed READMEs to reproduce results from specific papers:

Join the fairseq community

License

fairseq(-py) is MIT-licensed. The license applies to the pre-trained models as well.

Citation

Please cite as:

@inproceedings{ott2019fairseq,
  title = {fairseq: A Fast, Extensible Toolkit for Sequence Modeling},
  author = {Myle Ott and Sergey Edunov and Alexei Baevski and Angela Fan and Sam Gross and Nathan Ng and David Grangier and Michael Auli},
  booktitle = {Proceedings of NAACL-HLT 2019: Demonstrations},
  year = {2019},
}