This is a Keras implementation of a variational version of the baseline spectral autoencoder described in the Google DeepMind paper.
You can have a look at our results here:
We used a subset of the public NSynth dataset consisting of brass and flute sounds. We computed the log-magnitude spectrum of each audio clip and used it as both input and target during training. As in the original article, we used the Griffin-Lim algorithm to reconstruct the phase of each signal.
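The preprocessing and reconstruction steps above can be sketched roughly as follows. This is a minimal NumPy/SciPy illustration, not the repository's actual code; the window size `nperseg`, the iteration count `n_iter`, and the `log1p` compression are illustrative choices.

```python
import numpy as np
from scipy.signal import stft, istft

def log_magnitude(x, nperseg=512):
    # STFT, then log-compressed magnitude; the phase is discarded here
    # and must later be re-estimated (e.g. with Griffin-Lim).
    _, _, Z = stft(x, nperseg=nperseg)
    return np.log1p(np.abs(Z))

def griffin_lim(log_mag, n_iter=50, nperseg=512):
    # Griffin-Lim: start from a random phase and alternate between
    # time domain and frequency domain, keeping the target magnitude
    # fixed and updating only the phase estimate at each iteration.
    mag = np.expm1(log_mag)  # undo the log1p compression
    phase = np.exp(2j * np.pi * np.random.rand(*mag.shape))
    for _ in range(n_iter):
        _, x = istft(mag * phase, nperseg=nperseg)
        _, _, Z = stft(x, nperseg=nperseg)
        phase = np.exp(1j * np.angle(Z))
    _, x = istft(mag * phase, nperseg=nperseg)
    return x
```

In a pipeline like this the network only ever sees (and predicts) log-magnitude spectrograms; Griffin-Lim is applied once, as a post-processing step, to turn a predicted spectrogram back into a waveform.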
We implemented a variational version of the baseline autoencoder to see whether meaningful audio generation was possible in this setting. To reduce the very large number of parameters of the original model, we shrank the number of filters in each layer relative to the baseline. In this case too, the phase was reconstructed with the Griffin-Lim algorithm.
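A variational autoencoder of this kind can be sketched in Keras roughly as below. This is a minimal illustration, not the repository's model: `SPEC_DIM`, `LATENT_DIM`, and the layer sizes are hypothetical, and dense layers stand in for the baseline's convolutional stacks for brevity.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

SPEC_DIM = 513    # hypothetical number of frequency bins per spectral frame
LATENT_DIM = 16   # hypothetical latent dimensionality

# Encoder: maps a spectral frame to the mean and log-variance of a Gaussian.
enc_in = keras.Input(shape=(SPEC_DIM,))
h = layers.Dense(256, activation="relu")(enc_in)
z_mean = layers.Dense(LATENT_DIM)(h)
z_log_var = layers.Dense(LATENT_DIM)(h)
encoder = keras.Model(enc_in, [z_mean, z_log_var])

# Decoder: reconstructs the log-magnitude frame from a latent sample.
dec_in = keras.Input(shape=(LATENT_DIM,))
hd = layers.Dense(256, activation="relu")(dec_in)
dec_out = layers.Dense(SPEC_DIM)(hd)
decoder = keras.Model(dec_in, dec_out)

optimizer = keras.optimizers.Adam()

@tf.function
def train_step(x):
    with tf.GradientTape() as tape:
        mu, log_var = encoder(x)
        # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I)
        eps = tf.random.normal(tf.shape(mu))
        z = mu + tf.exp(0.5 * log_var) * eps
        x_hat = decoder(z)
        # VAE objective: reconstruction error plus KL divergence
        # between the approximate posterior and the unit Gaussian prior.
        recon = tf.reduce_mean(tf.reduce_sum(tf.square(x - x_hat), axis=-1))
        kl = -0.5 * tf.reduce_mean(
            tf.reduce_sum(1 + log_var - tf.square(mu) - tf.exp(log_var),
                          axis=-1))
        loss = recon + kl
    variables = encoder.trainable_variables + decoder.trainable_variables
    optimizer.apply_gradients(zip(tape.gradient(loss, variables), variables))
    return loss
```

After training, new spectrograms can be generated by sampling `z` from the unit Gaussian prior and running it through the decoder; the waveform is then recovered with Griffin-Lim.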
The code is released under the terms of the MIT license.