jinglescode/papers

Improved Noisy Student Training for Automatic Speech Recognition

jinglescode opened this issue

Paper

Link: https://arxiv.org/pdf/2005.09629v1.pdf
Year: 2020

Summary

  • adapt and improve noisy student training for automatic speech recognition (ASR); noisy student training is an iterative self-training method that leverages augmentation to improve network performance
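
The iterative loop described above could be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: `train_fn` and `transcribe_fn` are hypothetical callables standing in for model training (with augmentation) and teacher decoding, and the filtering and balancing steps are left out.

```python
from typing import Any, Callable, List, Tuple

Example = Tuple[str, str]  # (utterance, transcript); placeholder types


def noisy_student_training(
    labeled: List[Example],
    unlabeled: List[str],
    train_fn: Callable[..., Any],
    transcribe_fn: Callable[[Any, str], str],
    generations: int,
) -> Any:
    """Run `generations` rounds of teacher -> student self-training.

    train_fn(data, augment=...) returns a trained model (hypothetical).
    transcribe_fn(model, utterance) returns a transcript (hypothetical).
    """
    # Generation 0: the first teacher sees only the labeled data.
    teacher = train_fn(labeled, augment=True)
    for _ in range(generations):
        # The teacher labels the unlabeled utterances.
        pseudo_labeled = [(u, transcribe_fn(teacher, u)) for u in unlabeled]
        # The student trains on labeled + pseudo-labeled data, with
        # augmentation acting as the "noise" in noisy student training.
        student = train_fn(labeled + pseudo_labeled, augment=True)
        # The student becomes the teacher for the next generation.
        teacher = student
    return teacher
```

Passing the training and decoding steps in as callables keeps the sketch independent of any particular ASR toolkit.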

Methods

  • employ (adaptive) SpecAugment, an augmentation method for ASR that directly acts on the spectrogram of the input audio, for noisy student training
  • use shallow fusion with a language model on the teacher network to generate better transcripts for the student network to train on
  • propose a normalized filtering score for teacher-generated transcripts, defined as a function of the fusion score and the number of tokens
  • use a variant of sub-modular sampling to weight the utterance-transcript pairs generated by the teacher network, balancing the token statistics of the dataset passed on to the student
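
To make the fusion and filtering bullets more concrete, here is a small sketch. The notes above only say that the filtering score is a function of the fusion score and the token count, so the exact form used here is an assumption: the fusion score is taken as acoustic log-probability plus a weighted language-model log-probability, and the normalized score as (S − μℓ − β) / (σ√ℓ), where ℓ is the number of tokens. The parameters `mu`, `beta`, `sigma`, `lm_weight`, and `cutoff` are hypothetical and would have to be fit or tuned elsewhere.

```python
import math
from typing import List, Tuple

# (transcript, fusion score, number of tokens); placeholder structure
Hypothesis = Tuple[str, float, int]


def fusion_score(am_logprob: float, lm_logprob: float, lm_weight: float) -> float:
    """Shallow fusion (assumed form): acoustic score plus weighted LM score."""
    return am_logprob + lm_weight * lm_logprob


def normalized_score(
    score: float, num_tokens: int, mu: float, beta: float, sigma: float
) -> float:
    """Length-normalized filtering score (assumed form).

    Subtracts a linear trend in the token count and rescales by
    sigma * sqrt(num_tokens), so that long and short transcripts
    can be compared against a single cutoff.
    """
    return (score - mu * num_tokens - beta) / (sigma * math.sqrt(num_tokens))


def filter_transcripts(
    hyps: List[Hypothesis], mu: float, beta: float, sigma: float, cutoff: float
) -> List[Hypothesis]:
    """Keep hypotheses whose normalized score exceeds the cutoff."""
    return [
        h for h in hyps
        if h[2] > 0 and normalized_score(h[1], h[2], mu, beta, sigma) > cutoff
    ]
```

Without some normalization of this kind, a raw log-probability cutoff would tend to reject long transcripts simply because their scores accumulate over more tokens.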