This repository takes AssemblyAI's end-to-end ASR tutorial by Michael Nguyen as a starting point and converts its training code to work with the latest PyTorch Lightning release (2.0.2 as of May 2023). The conversion mainly follows the official PyTorch Lightning preparation guide and uses the DNN Beamformer example by Zhaoheng Ni in torchaudio as a template.
The motivation for this repository is to have an easily understandable test harness and template for future ASR experiments, hence the choice of a relatively small and simple model: the (adapted) Deep Speech 2 implementation from the AssemblyAI tutorial.
- Verified that the adapted PyTorch Lightning code behaves like the original pure PyTorch code (`max_epochs` was reduced to 30 in runs 2-5 for faster training).

  | seed | epochs | WER (PyTorch Lightning code) | WER (AssemblyAI code) |
  |------|--------|------------------------------|-----------------------|
  | 1    | 100    | 0.2897                       | 0.2922                |
  | 2    | 30     | 0.3373                       | 0.3421                |
  | 3    | 30     | 0.3431                       | 0.3416                |
  | 4    | 30     | 0.3408                       | 0.3445                |
  | 5    | 30     | 0.3464                       | 0.3452                |
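
For readers unfamiliar with the metric, the WER values reported above are word error rates. The sketch below shows the usual definition for a single utterance: the word-level edit distance between reference and hypothesis, divided by the number of reference words. The function name `word_error_rate` is hypothetical, and this is an illustrative implementation, not necessarily the one used in this repository or in the AssemblyAI tutorial. How scores are aggregated over the test set is not covered here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative sketch only; the repository's own WER code may differ.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i] + [0] * len(hyp_words)
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref_words)
```

For example, `word_error_rate("a b c d", "a x c")` counts one substitution and one deletion against four reference words, giving 0.5.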