WaveNet: A Generative Model for Raw Audio

Question


Paper

WaveNet is a deep generative model of audio data that operates directly at the waveform level. WaveNets are autoregressive and combine causal filters with dilated convolutions so that their receptive fields grow exponentially with depth, which is important for modeling the long-range temporal dependencies in audio signals.
The model shows promising results when applied to music audio modeling and speech recognition.
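
To make "autoregressive" concrete, here is a minimal sketch of the generation loop: each new sample is produced from the samples generated so far. The names `generate`, `predict_next`, and `toy_predictor` are hypothetical and only for illustration; the toy predictor is deterministic, whereas the real model samples from a predicted distribution.

```python
def generate(predict_next, seed, num_samples):
    """Autoregressive generation: every new sample depends on all previous ones."""
    samples = list(seed)
    for _ in range(num_samples):
        # The predictor only ever sees the past, never the future.
        samples.append(predict_next(samples))
    return samples


# Hypothetical stand-in for a trained network: halve the last sample.
def toy_predictor(history):
    return 0.5 * history[-1]


waveform = generate(toy_predictor, [1.0], 3)
```

Because each step needs the previous output, generation of this kind is inherently sequential.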

The main ingredient of WaveNet is causal convolutions. Because models with causal convolutions do not have recurrent connections, they are typically faster to train than RNNs, especially when applied to very long sequences.
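
A minimal pure-Python sketch of a causal convolution, assuming a single channel and zero padding on the left. The function name `causal_conv1d` and the filter values are illustrative assumptions, not taken from the paper.

```python
def causal_conv1d(x, weights):
    """Causal 1-D convolution: out[t] depends only on x[t], x[t-1], ...

    weights[k] multiplies x[t - k]; samples before the start count as zero.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            if t - k >= 0:  # never read from the future or before the start
                acc += w * x[t - k]
        out.append(acc)
    return out


signal = [1.0, 2.0, 3.0, 4.0]
filtered = causal_conv1d(signal, [0.5, 0.5])  # average of current and previous sample
```

Every output position can be computed independently of the others during training, which is where the speed advantage over a recurrent model comes from.
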
To deal with the long-range temporal dependencies needed for raw audio generation, the paper develops new architectures based on dilated causal convolutions, which exhibit very large receptive fields.
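
The effect of dilation on the receptive field can be checked with a small sketch. It assumes a single channel, a filter of size 2, and dilations that double at each layer; the helper names `dilated_causal_conv1d` and `receptive_field` are hypothetical.

```python
def dilated_causal_conv1d(x, weights, dilation):
    """Causal convolution that skips `dilation - 1` samples between filter taps."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:  # positions before the start count as zero
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Number of input samples that can influence one output of the stack."""
    return 1 + (kernel_size - 1) * sum(dilations)


# Dilations doubling per layer: 1, 2, 4, ..., 512.
dilations = [2 ** i for i in range(10)]
rf = receptive_field(2, dilations)  # 1024 samples from only 10 layers

# Empirical check: push an impulse through four layers (dilations 1, 2, 4, 8).
h = [1.0] + [0.0] * 19
for d in [1, 2, 4, 8]:
    h = dilated_causal_conv1d(h, [1.0, 1.0], d)
touched = sum(1 for v in h if v != 0.0)  # number of positions the impulse reaches
```

With undilated layers of the same filter size, the receptive field would grow by only one sample per layer.
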
The network uses the same gated activation unit as the gated PixelCNN.
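
The gated activation unit multiplies a tanh "filter" branch by a sigmoid "gate" branch, element by element. In the sketch below the two branches are passed in as plain lists of pre-activations, standing in for the outputs of two separate convolutions; the function names are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gated_activation(filter_pre, gate_pre):
    """z = tanh(filter) * sigmoid(gate), applied element-wise."""
    return [math.tanh(f) * sigmoid(g) for f, g in zip(filter_pre, gate_pre)]


# A gate pre-activation of 0 lets half of the filter signal through.
z = gated_activation([1.0, -1.0], [0.0, 0.0])
```

The sigmoid branch acts as a soft switch between 0 and 1 that decides how much of the tanh branch is passed on.
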
Residual and parameterised skip connections are used throughout the network to speed up convergence and enable training of much deeper models.
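
A rough sketch of how residual and skip connections might be wired. It is simplified on purpose: scalar weights `res_weight` and `skip_weight` stand in for the learned projections, and `transform` stands in for the convolution plus gated activation of a layer. All names are hypothetical.

```python
def residual_block(x, transform, res_weight=1.0, skip_weight=1.0):
    """Return (input to the next layer, contribution to the skip path)."""
    z = transform(x)
    residual_out = [xi + res_weight * zi for xi, zi in zip(x, z)]  # x + f(x)
    skip_out = [skip_weight * zi for zi in z]
    return residual_out, skip_out


def run_stack(x, transforms):
    """Chain blocks through the residual path and sum all skip outputs."""
    skip_total = [0.0] * len(x)
    h = x
    for transform in transforms:
        h, skip = residual_block(h, transform)
        skip_total = [s + k for s, k in zip(skip_total, skip)]
    return h, skip_total


# Toy layer transform used only for illustration: halve every value.
def halve(v):
    return [0.5 * vi for vi in v]


h, skips = run_stack([2.0, 4.0], [halve, halve])
```

The residual path gives every layer a direct route to its input, while the summed skip outputs give the final layers direct access to every block.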