jinglescode/papers

WaveNet: A Generative Model for Raw Audio


Paper

Link: https://arxiv.org/pdf/1609.03499.pdf
Year: 2016

Summary

  • a deep generative model of audio data that operates directly at the waveform level. WaveNets are autoregressive and combine causal filters with dilated convolutions to allow their receptive fields to grow exponentially with depth, which is important to model the long-range temporal dependencies in audio signals.
  • promising results when applied to music audio modeling and speech recognition
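The two properties named in the summary can be written out explicitly. The first line is the autoregressive factorization of a waveform x = {x_1, ..., x_T}: each sample is conditioned on all previous samples. The second line is a sketch of the receptive field r of L stacked causal convolution layers with kernel size k and dilation d_l in layer l, assuming stride 1 and no pooling. The symbols r, k, L and d_l are our own notation for this note, not taken from the paper.

```latex
p(\mathbf{x}) = \prod_{t=1}^{T} p\left(x_t \mid x_1, \ldots, x_{t-1}\right)

r = 1 + (k - 1) \sum_{l=1}^{L} d_l
\quad\text{with } k = 2,\ d_l = 2^{l-1},\ L = 10:\quad
r = 1 + \sum_{l=1}^{10} 2^{l-1} = 1 + 1023 = 1024
```

With d_l = 1 in every layer the receptive field grows only linearly, r = 1 + (k - 1)L. Doubling the dilation per layer makes it grow exponentially in L.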

Methods

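  • the main ingredient of WaveNet is the causal convolution: the prediction for sample x_t may depend only on x_1, ..., x_{t-1}, never on future samples. Because models with causal convolutions have no recurrent connections, the predictions for all timesteps can be computed in parallel during training (the ground-truth waveform is known), so they are typically faster
    to train than RNNs, especially on very long sequences.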
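  • to model the long-range temporal dependencies needed for raw audio generation,
    the authors develop new architectures based on dilated causal convolutions, which have very
    large receptive fields; the dilation is doubled at every layer up to a limit (1, 2, 4, ..., 512) and the pattern is then repeated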
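  • uses the same gated activation unit as the gated PixelCNN: z = tanh(W_f,k * x) ⊙ σ(W_g,k * x), where * is a convolution, ⊙ element-wise multiplication, σ the sigmoid, k the layer index, and f and g denote the filter and gate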
  • residual and parameterised skip connections are used throughout the network
    to speed up convergence and enable training of much deeper models
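A minimal single-channel sketch of a dilated causal convolution in pure Python, to make the causal and dilated bullets concrete. The function names `dilated_causal_conv1d` and `stack`, the implicit zero left-padding, and the all-ones test kernel are illustrative assumptions, not the paper's implementation; with `dilation=1` the function reduces to a plain causal convolution.

```python
from typing import List, Sequence


def dilated_causal_conv1d(
    x: Sequence[float], weights: Sequence[float], dilation: int = 1
) -> List[float]:
    """Single-channel dilated causal convolution (illustrative sketch).

    y[t] = sum_i weights[i] * x[t - (k - 1 - i) * dilation], so each output
    uses only the current sample and samples spaced `dilation` steps in the
    past. Indices before the start of the signal count as zeros (implicit
    left-padding of (k - 1) * dilation samples).
    """
    k = len(weights)
    y = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(weights):
            idx = t - (k - 1 - i) * dilation  # idx <= t: never a future sample
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y


def stack(x: Sequence[float], weights: Sequence[float], dilations: Sequence[int]) -> List[float]:
    """Apply one dilated causal layer per entry in `dilations` (no nonlinearity)."""
    h = list(x)
    for d in dilations:
        h = dilated_causal_conv1d(h, weights, d)
    return h


# Impulse at t=100 through 10 layers with dilations 1, 2, 4, ..., 512 and kernel size 2
impulse = [0.0] * 2048
impulse[100] = 1.0
out = stack(impulse, [1.0, 1.0], [2 ** i for i in range(10)])
touched = [t for t, v in enumerate(out) if v != 0.0]
print(touched[0], touched[-1], len(touched))  # 100 1123 1024
```

The impulse influences exactly 1024 consecutive outputs and none before t = 100: nothing leaks backwards in time, and ten layers with doubling dilation already span 1024 samples.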
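A simplified single-channel sketch of one residual block that combines the gated activation unit with residual and skip outputs. Real WaveNet layers use multi-channel convolutions with 1×1 convolutions on the residual and skip paths; here those 1×1 convolutions collapse to the hypothetical scalars `res_scale` and `skip_scale`, and every function and parameter name is illustrative rather than taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def causal_conv(x: Sequence[float], weights: Sequence[float], dilation: int) -> List[float]:
    """Single-channel dilated causal convolution with implicit zero left-padding."""
    k = len(weights)
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(weights):
            idx = t - (k - 1 - i) * dilation  # only current and past samples
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def gated_residual_block(
    x: Sequence[float],
    w_filter: Sequence[float],
    w_gate: Sequence[float],
    dilation: int,
    res_scale: float,
    skip_scale: float,
) -> Tuple[List[float], List[float]]:
    """Return (residual_output, skip_output) for one hypothetical WaveNet-style layer."""
    f = causal_conv(x, w_filter, dilation)  # filter branch
    g = causal_conv(x, w_gate, dilation)    # gate branch
    # Gated activation unit: z = tanh(W_f * x) ⊙ sigmoid(W_g * x)
    z = [math.tanh(a) * sigmoid(b) for a, b in zip(f, g)]
    # Residual path: block input plus a (scalar stand-in for a 1x1) projection of z
    residual = [xi + res_scale * zi for xi, zi in zip(x, z)]
    # Skip path: a separate scalar projection of z, collected across layers
    skip = [skip_scale * zi for zi in z]
    return residual, skip


def run_stack(x: Sequence[float], layers) -> Tuple[List[float], List[float]]:
    """Apply blocks in sequence; return the last residual output and the summed skips."""
    h = list(x)
    skip_sum = [0.0] * len(x)
    for params in layers:
        h, skip = gated_residual_block(h, *params)
        skip_sum = [s + k for s, k in zip(skip_sum, skip)]
    return h, skip_sum


# Example: four blocks with dilations 1, 2, 4, 8 (weights chosen arbitrarily)
layers = [([0.5, 1.0], [0.2, 0.8], 2 ** i, 0.5, 1.0) for i in range(4)]
final, skips = run_stack([0.0, 1.0, 0.0, 0.0, 0.0, -1.0, 0.0, 0.0], layers)
```

The residual path lets each block learn a correction to its input, while the skip outputs from all blocks are summed. In the paper that sum is further processed (ReLU, 1×1 convolutions, softmax) to produce the output distribution; that head is omitted here.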