
Transformers with convolutional context for ASR


Paper

Link: https://arxiv.org/pdf/1904.11660.pdf
Year: 2019

Summary

  • replaces the sinusoidal positional embeddings in transformers with convolutionally learned input representations
  • trains with a fixed learning rate of 1.0 and no warmup steps (see the optimizer sketch below)
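A minimal sketch of that training setup in PyTorch. The fixed rate of 1.0 and the absence of a warmup schedule come from the summary above; the Adadelta optimizer and the placeholder model are assumptions made here for illustration (Adadelta is a common choice when lr=1.0 is quoted).

```python
import torch

# Placeholder for the actual conv+transformer ASR model (assumption).
model = torch.nn.Linear(80, 32)

# Fixed learning rate of 1.0, no warmup: the convolutional context is
# credited with removing the need for an LR warmup schedule.
# Adadelta here is an assumption, not confirmed by this summary.
optimizer = torch.optim.Adadelta(model.parameters(), lr=1.0)

# No scheduler is attached -- the learning rate stays fixed for the whole run.
```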

Methods

  • 2 parts
    • convolutional layers learn local relationships within a small context
    • transformer layers learn the global sequential structure of the input
  • as the encoder gets deeper, the convolutions act like an acoustic language model over the bag of discovered acoustic units (a minimal encoder sketch follows this list)
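A rough PyTorch sketch of the two-part encoder: a conv front-end for local context (standing in for sinusoidal positional embeddings), followed by transformer layers for global structure. Layer counts, kernel sizes, and dimensions are illustrative guesses, not the paper's exact configuration, and the `ConvContextEncoder` class name is made up.

```python
import torch
import torch.nn as nn

class ConvContextEncoder(nn.Module):
    """Conv layers capture local context (replacing sinusoidal positional
    embeddings); transformer layers then model global sequential structure.
    All hyperparameters below are illustrative assumptions."""

    def __init__(self, n_mels=80, d_model=512, n_layers=6, n_heads=8):
        super().__init__()
        # 2-D convolutions over (time, frequency); stride 2 downsamples both axes.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
        )
        self.proj = nn.Linear(32 * (n_mels // 4), d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, n_layers)

    def forward(self, feats):                 # feats: (batch, time, n_mels)
        x = self.conv(feats.unsqueeze(1))     # -> (batch, 32, time/4, n_mels/4)
        b, c, t, f = x.shape
        x = x.permute(0, 2, 1, 3).reshape(b, t, c * f)
        x = self.proj(x)                      # -> (batch, time/4, d_model)
        # No positional embedding is added: the conv front-end is assumed
        # to supply the ordering information the transformer needs.
        return self.transformer(x)

enc = ConvContextEncoder()
out = enc(torch.randn(2, 100, 80))            # e.g. 100 frames of 80-dim fbank
print(out.shape)                              # torch.Size([2, 25, 512])
```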

code: github.com/pytorch/fairseq/tree/master/examples/speech_recognition