WaveNet: A Generative Model for Raw Audio
jinglescode commented
Paper
Link: https://arxiv.org/pdf/1609.03499.pdf
Year: 2016
Summary
- WaveNet is a deep generative model of audio data that operates directly at the waveform level. WaveNets are autoregressive and combine causal filters with dilated convolutions so that their receptive fields grow exponentially with depth, which is important for modelling the long-range temporal dependencies in audio signals.
- promising results when applied to music audio modeling and speech recognition
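To make the "exponential growth with depth" point concrete, here is a small sketch that counts how many input samples can influence one output sample of a stack of dilated causal convolutions. The function name `receptive_field` and the default kernel size of 2 are illustrative assumptions, not code from the paper.

```python
def receptive_field(dilations, kernel_size=2):
    """Number of input samples that can influence one output sample.

    Each layer with dilation d and kernel size k extends the
    receptive field by (k - 1) * d samples.
    """
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Ten layers with doubling dilations 1, 2, 4, ..., 512.
doubling = [2 ** i for i in range(10)]
print(receptive_field(doubling))   # 1024

# Ten layers without dilation grow only linearly.
print(receptive_field([1] * 10))   # 11
```

With the same number of layers, doubling the dilation at each layer covers 1024 samples, whereas undilated layers cover only 11.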
Methods
- The main ingredient of WaveNet is causal convolutions. Because models with causal convolutions do not have recurrent connections, they are typically faster to train than RNNs, especially when applied to very long sequences.
- To deal with the long-range temporal dependencies needed for raw audio generation, the authors develop new architectures based on dilated causal convolutions, which exhibit very large receptive fields.
- The network uses the same gated activation unit as the gated PixelCNN.
- Residual and parameterised skip connections are used throughout the network to speed up convergence and enable training of much deeper models.
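The ingredients listed above can be sketched together in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`causal_dilated_conv`, `init_block`, `residual_block`, `wavenet_stack`), the random weight initialisation, and the use of separate 1x1 projections for the residual and skip paths are all choices made for this example. The gated unit computes `tanh(filter) * sigmoid(gate)`, and causality is obtained by left-padding the input with zeros.

```python
import numpy as np


def causal_dilated_conv(x, w, dilation):
    """Causal dilated 1-D convolution.

    x: (T, C_in) signal, w: (K, C_in, C_out) kernel.
    Output y has shape (T, C_out); y[t] depends only on
    x[t], x[t - dilation], ..., x[t - (K - 1) * dilation].
    """
    K, T = w.shape[0], x.shape[0]
    pad = (K - 1) * dilation
    # Left-pad with zeros so no output sample sees future inputs.
    x_padded = np.concatenate([np.zeros((pad, x.shape[1])), x], axis=0)
    y = np.zeros((T, w.shape[2]))
    for k in range(K):
        # Tap k looks back (K - 1 - k) * dilation samples.
        start = k * dilation
        y += x_padded[start:start + T] @ w[k]
    return y


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def init_block(rng, channels, skip_channels, kernel_size=2, scale=0.5):
    """Random weights for one block (hypothetical initialisation)."""
    return {
        "w_filter": scale * rng.standard_normal((kernel_size, channels, channels)),
        "w_gate": scale * rng.standard_normal((kernel_size, channels, channels)),
        "w_res": scale * rng.standard_normal((channels, channels)),
        "w_skip": scale * rng.standard_normal((channels, skip_channels)),
    }


def residual_block(x, params, dilation):
    """Gated activation unit followed by residual and skip outputs."""
    f = causal_dilated_conv(x, params["w_filter"], dilation)
    g = causal_dilated_conv(x, params["w_gate"], dilation)
    z = np.tanh(f) * sigmoid(g)      # gated activation unit
    skip = z @ params["w_skip"]      # 1x1 projection for the skip path
    out = x + z @ params["w_res"]    # 1x1 projection plus residual add
    return out, skip


def wavenet_stack(x, blocks, dilations):
    """Run a stack of blocks and sum their skip outputs."""
    h, skip_sum = x, 0.0
    for params, d in zip(blocks, dilations):
        h, skip = residual_block(h, params, d)
        skip_sum = skip_sum + skip
    return h, skip_sum


rng = np.random.default_rng(0)
dilations = [1, 2, 4, 8]
blocks = [init_block(rng, channels=8, skip_channels=16) for _ in dilations]
x = rng.standard_normal((64, 8))
h, skip_sum = wavenet_stack(x, blocks, dilations)
print(h.shape, skip_sum.shape)
```

Because there is no recurrence, every time step of a layer is computed in one pass over the sequence during training, which is the source of the speed advantage over RNNs noted above. Sample-by-sample generation is still sequential, since each new sample is fed back as input.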