
Transformers with convolutional context for ASR


Paper

Link: https://arxiv.org/pdf/1904.11660.pdf
Year: 2019

Summary

  • replaces the sinusoidal positional embeddings in transformers with convolutionally learned input representations
  • trains with a fixed learning rate of 1.0 and no warmup steps (see the optimizer sketch below)
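A minimal sketch of that training setup in PyTorch. The fixed rate of 1.0 and the absence of a warmup schedule come from the summary above; the Adadelta optimizer and the placeholder model are assumptions made here for illustration (Adadelta is a common choice when lr=1.0 is quoted).

```python
import torch

# Placeholder for the actual conv+transformer ASR model (assumption).
model = torch.nn.Linear(80, 32)

# Fixed learning rate of 1.0, no warmup: the convolutional context is
# credited with removing the need for an LR warmup schedule.
# Adadelta here is an assumption, not confirmed by this summary.
optimizer = torch.optim.Adadelta(model.parameters(), lr=1.0)

# No scheduler is attached -- the learning rate stays fixed for the whole run.
```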

Methods

  • 2 parts
    • convolutional layers learn local relationships within a small context
    • transformer layers learn the global sequential structure of the input
  • as the encoder gets deeper, the convolutions act like an acoustic language model over the bag of discovered acoustic units (a minimal encoder sketch follows this list)
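A rough PyTorch sketch of the two-part encoder: a conv front-end for local context (standing in for sinusoidal positional embeddings), followed by transformer layers for global structure. Layer counts, kernel sizes, and dimensions are illustrative guesses, not the paper's exact configuration, and the `ConvContextEncoder` class name is made up.

```python
import torch
import torch.nn as nn

class ConvContextEncoder(nn.Module):
    """Conv layers capture local context (replacing sinusoidal positional
    embeddings); transformer layers then model global sequential structure.
    All hyperparameters below are illustrative assumptions."""

    def __init__(self, n_mels=80, d_model=512, n_layers=6, n_heads=8):
        super().__init__()
        # 2-D convolutions over (time, frequency); stride 2 downsamples both axes.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
        )
        self.proj = nn.Linear(32 * (n_mels // 4), d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, n_layers)

    def forward(self, feats):                 # feats: (batch, time, n_mels)
        x = self.conv(feats.unsqueeze(1))     # -> (batch, 32, time/4, n_mels/4)
        b, c, t, f = x.shape
        x = x.permute(0, 2, 1, 3).reshape(b, t, c * f)
        x = self.proj(x)                      # -> (batch, time/4, d_model)
        # No positional embedding is added: the conv front-end is assumed
        # to supply the ordering information the transformer needs.
        return self.transformer(x)

enc = ConvContextEncoder()
out = enc(torch.randn(2, 100, 80))            # e.g. 100 frames of 80-dim fbank
print(out.shape)                              # torch.Size([2, 25, 512])
```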

code: github.com/pytorch/fairseq/tree/master/examples/speech_recognition