This repository contains implementations of end-to-end ASR systems based on LAS, CTC (w/o attention), and RNN-Transducer (w/o attention).
- torch >= 1.5.1
- torchtext >= 0.6.0
- torchaudio
- warp-rnnt
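The dependencies above can be installed along these lines (a sketch, assuming a CUDA-enabled machine; the warp-rnnt repository URL and `pytorch_binding` path refer to the upstream project and may change):

```shell
pip install "torch>=1.5.1" "torchtext>=0.6.0" torchaudio

# warp-rnnt is best built from source (see the acknowledgments below)
git clone https://github.com/1ytic/warp-rnnt
cd warp-rnnt/pytorch_binding
python setup.py install
```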
Model | train/dev loss | train/dev PER | Epoch |
---|---|---|---|
CTC | 0.64/1.03 | 0.20/0.315 | 178 |
Transducer | 12.0/- | -/0.2662 | 13 |
Pretrained Transducer | 0.7/- | -/0.2670 | 195 |
LAS | - | - | - |

Language model train/dev loss: 2.68/2.80, train/dev ppl: 14.5/16.49 (epoch 292).
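The language-model perplexities are just the exponential of the cross-entropy loss; the reported values (14.5/16.49) agree with the logged losses up to rounding:

```python
import math

# Perplexity is exp(cross-entropy loss), so each loss/ppl pair should match.
train_loss, dev_loss = 2.68, 2.80

print(round(math.exp(train_loss), 1))  # 14.6, vs. reported 14.5
print(round(math.exp(dev_loss), 2))    # 16.44, vs. reported 16.49
```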
- A smaller vocabulary (due to phoneme mapping [6]) improves performance.
- A VGG feature extractor [7] (ResNet is even better) helps the model converge faster.
- Transducer converges faster and generalizes better than CTC.
- Weight noise [8] is a useful regularizer for RNN/LSTM.
- Batch normalization helps the model converge faster.
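Weight noise is easy to add to a training loop: perturb the weights for the forward/backward pass, then restore them before the optimizer update. A minimal PyTorch sketch (the helper name, the `std=0.075` default, and the `loss_fn(model, batch)` signature are illustrative, not from this repo):

```python
import torch
import torch.nn as nn

def train_step_with_weight_noise(model, loss_fn, batch, std=0.075):
    """One training step with weight noise: add zero-mean Gaussian noise
    to every parameter, compute loss and gradients at the noisy weights,
    then restore the clean weights (optimizer.step() would follow)."""
    clean = [p.detach().clone() for p in model.parameters()]
    with torch.no_grad():
        for p in model.parameters():
            p.add_(torch.randn_like(p) * std)
    loss = loss_fn(model, batch)
    loss.backward()  # gradients are taken at the noisy weights
    with torch.no_grad():
        for p, c in zip(model.parameters(), clean):
            p.copy_(c)  # restore the clean weights
    return loss
```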
- pretrained transducer
- LAS
- beam search
- hybrid
- add visualization script plot.py
- A Comparison of Sequence-to-Sequence Models for Speech Recognition [Ref]
- Deep Learning for Human Language Processing (2020,Spring) [Ref]
- Alexander-H-Liu/End-to-end-ASR-Pytorch [Ref]
- Open Source Korean End-to-end Automatic Speech Recognition [Ref]
- Language Translation With TorchText [Ref]
- End-to-end automatic speech recognition system implemented in TensorFlow [Ref]
- Advances in Joint CTC-Attention based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM [Ref]
- Speech Recognition with Deep Recurrent Neural Networks [Ref]
- pretrained embedding [Ref]
- Thanks to warp-rnnt, a PyTorch binding for the CUDA-Warp RNN-Transducer loss. Note that it is best installed from source.
- Thanks to warp-transducer, a more general implementation of the RNN transducer. Carefully set the environment variables as described there before running `python setup.py install`.