End-to-end Automatic Speech Recognition Systems - PyTorch Implementation

For complete introdution and usage, please see the original repository Alexander-H-Liu/End-to-end-ASR-Pytorch.

New features

Added layer-wise transfer learning
Supports multiple development sets
Supports FreqCNN (frequency-divided CNN extractor) for whispered speech recognition.
Supports DLHLP corpus for the course Deep Learning for Human Language Processing

Instructions

Training

Modify script/train.sh, script/train_lm.sh, config/librispeech_asr.yaml, and config/librispeech_lm.yaml first. GPU is required.

bash script/train.sh <asr name> <cuda id>
bash script/train_lm.sh <lm name> <cuda id>

Testing

Modify script/test.sh and config/librispeech_test.sh first. Increase the number of --njobs can speed up decoding process, but might cause OOM.

bash script/test.sh <asr name> <cuda id>

LibriSpeech 100hr Baseline

This baseline is composed of a character-based joint CTC-attention ASR model and an RNNLM which were trained on the LibriSpeech train-clean-100. The perplexity of the LM on the dev-clean set is 3.66.

Decoding	DEV WER(%)	TEST WER(%)
Greedy	25.4	25.9

DLHLP Baseline

This baseline is composed of a character-based joint CTC-attention ASR model and an RNN-LM which were trained on the DLHLP training set.

Decoding	DEV CER/WER(%)	TEST CER/WER(%)
SpecAugment + Greedy	1.0 / 3.4	0.8 / 3.1
SpecAugment + Beam=5	0.8 / 2.9	0.7 / 2.6

TODO

CTC beam decoding (testing)
SpecAugment (will be released)
Multiple corpora training (will be released)
Support of WSJ and Switchboard dataset (under construction)
Combination of CTC and RNN-LM: RNN transducer (under construction)

Citation

@inproceedings{liu2019adversarial,
  title={Adversarial Training of End-to-end Speech Recognition Using a Criticizing Language Model},
  author={Liu, Alexander and Lee, Hung-yi and Lee, Lin-shan},
  booktitle={International Conference on Speech RecognitionAcoustics, Speech and Signal Processing (ICASSP)},
  year={2019},
  organization={IEEE}
}

@inproceedings{alex2019sequencetosequence,
    title={Sequence-to-sequence Automatic Speech Recognition with Word Embedding Regularization and Fused Decoding},
    author={Alexander H. Liu and Tzu-Wei Sung and Shun-Po Chuang and Hung-yi Lee and Lin-shan Lee},
    booktitle={International Conference on Speech RecognitionAcoustics, Speech and Signal Processing (ICASSP)},
    year={2020},
    organization={IEEE}
}

@inproceedings{chang2020endtoend,
    title={End-to-end Whispered Speech Recognition with Frequency-weighted Approaches and Pseudo Whisper Pre-training},
    author={Heng-Jui Chang and Alexander H. Liu and Hung-yi Lee and Lin-shan Lee},
    booktitle={Spoken Language Technology Workshop (SLT)},
    year={2021},
    organization={IEEE}
}

faliwang/End-to-end-ASR-Pytorch-DLHLP