E2E-ASR

PyTorch Implementations for End-to-End Automatic Speech Recognition


Graves 2013 experiments

File description

  • model.py: rnnt joint model
  • model2012.py: graves2012 model
  • train_rnnt.py: rnnt training script
  • train_ctc.py: ctc acoustic model training script
  • eval.py: rnnt & ctc decode
  • DataLoader.py: kaldi feature loader
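For orientation, the output ("joint") network of the RNN transducer in Graves 2013 combines a transcription vector l_t (from the acoustic encoder at frame t) with a prediction vector p_u (from the label predictor after u labels). The following is a sketch of that formulation; the weight names are notational assumptions, and whether model.py follows it term by term is not stated in this README.

```latex
\begin{align}
h_{t,u} &= \tanh\left(W_{lh}\, l_t + W_{ph}\, p_u + b_h\right) \\
y_{t,u} &= W_{hy}\, h_{t,u} + b_y \\
\Pr(k \mid t, u) &= \frac{\exp\left(y_{t,u}[k]\right)}{\sum_{k'} \exp\left(y_{t,u}[k']\right)}
\end{align}
```

Here k ranges over the phone labels plus the blank symbol.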

Run

  • Extract features

      • Link the Kaldi TIMIT example dirs (local, steps, utils).
      • Execute run.sh to extract 40-dim fbank features.
      • Run feature_transform.sh to get the 123-dim features described in Graves 2013.
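The 123-dim figure follows from the setup in Graves 2013: 40 filterbank coefficients plus energy give 41 values per frame, and appending first and second temporal derivatives triples that (41 × 3 = 123). The sketch below only illustrates that arithmetic. The function name `add_deltas` is hypothetical, the central difference is a simplification rather than Kaldi's regression-window deltas, and it is not taken from feature_transform.sh.

```python
def add_deltas(frames):
    """Append first and second temporal differences to each frame.

    frames: list of per-frame feature lists (e.g. 41 values each).
    Uses a simple central difference with edge frames repeated;
    this is an illustrative assumption, not Kaldi's add-deltas.
    """
    def delta(seq):
        n = len(seq)
        out = []
        for t in range(n):
            prev = seq[max(t - 1, 0)]      # repeat first frame at the start
            nxt = seq[min(t + 1, n - 1)]   # repeat last frame at the end
            out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
        return out

    d1 = delta(frames)   # first derivative
    d2 = delta(d1)       # second derivative
    return [f + a + b for f, a, b in zip(frames, d1, d2)]


# Five dummy frames of 41 values (40 fbank + energy).
frames = [[float(t)] * 41 for t in range(5)]
features = add_deltas(frames)
print(len(features), len(features[0]))
```

Each output frame has three times the input dimension, so 41-dim input frames yield 123-dim features.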

  • Train CTC acoustic model

python train_ctc.py --lr 1e-3 --bi --dropout 0.5 --out exp/ctc_bi_lr1e-3 --schedule

  • Train RNNT joint model

python train_rnnt.py --lr 4e-4 --bi --dropout 0.5 --out exp/rnnt_bi_lr4e-4 --schedule

  • Decode

python eval.py <path to best model> [--ctc] --bi
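As a rough illustration of what CTC decoding involves, the simplest strategy is best-path (greedy) decoding: take the most likely label per frame, collapse consecutive repeats, then drop blanks. This is a minimal sketch only. The function name `ctc_greedy_decode` and the blank index 0 are assumptions, and eval.py may use a different strategy such as beam search.

```python
from itertools import groupby


def ctc_greedy_decode(frame_labels, blank=0):
    """Best-path CTC decoding on per-frame argmax label ids.

    frame_labels: sequence of integer label ids, one per frame.
    blank: id of the CTC blank symbol (assumed to be 0 here).
    """
    # Step 1: merge runs of identical consecutive labels.
    collapsed = [label for label, _ in groupby(frame_labels)]
    # Step 2: remove blanks, which only separate repeated labels.
    return [label for label in collapsed if label != blank]


print(ctc_greedy_decode([0, 3, 3, 0, 3, 5, 5, 0]))  # [3, 3, 5]
```

The blank between the two runs of 3 is what lets the decoder emit the same label twice in a row.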

Results

Model    PER (%)
CTC      21.38
RNN-T    20.59
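PER (phone error rate) is conventionally the total edit distance between reference and hypothesized phone sequences, divided by the total number of reference phones. The sketch below shows that computation; the function names are hypothetical, and the exact scoring used to produce the numbers above (for example, any phone-set mapping) is not specified here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two label sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion (reference phone missed)
                curr[j - 1] + 1,         # insertion (extra hypothesis phone)
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = curr
    return prev[-1]


def phone_error_rate(refs, hyps):
    """PER in percent over parallel lists of phone sequences."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return 100.0 * errors / total


refs = [["sil", "k", "ae", "t", "sil"]]
hyps = [["sil", "k", "ah", "t"]]
print(phone_error_rate(refs, hyps))  # 40.0 (one substitution + one deletion over 5 phones)
```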

Requirements

Reference