Audio-generation-with-VAE

Spectral Autoencoder

This is a Keras implementation of a variational version of the baseline spectral autoencoder described in the Google DeepMind paper.

You can have a look at our results here:

  • Audio generation with VAE: https://www.youtube.com/watch?v=I7eWJuqg3zU

Dataset

We used a subset of the public NSynth dataset composed of brass and flute samples. We computed the log-magnitude spectrogram of each audio clip and used it as both input and target during training. As mentioned in the original article, we used the Griffin-Lim algorithm to reconstruct the phase of each signal.
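The preprocessing and phase-reconstruction steps above can be sketched with SciPy's STFT utilities. This is a minimal illustration, not the repository's exact pipeline: the frame size, iteration count, and `log1p` compression are assumptions.

```python
import numpy as np
from scipy.signal import stft, istft


def log_magnitude(audio, fs=16000, nperseg=1024):
    """Log-magnitude spectrogram used as autoencoder input/target (sketch)."""
    _, _, Z = stft(audio, fs=fs, nperseg=nperseg)
    return np.log1p(np.abs(Z))


def griffin_lim(log_mag, fs=16000, nperseg=1024, n_iter=50):
    """Recover a waveform from a log-magnitude spectrogram by iteratively
    re-estimating the phase (Griffin & Lim, 1984)."""
    mag = np.expm1(log_mag)  # undo the log1p compression
    # Start from a random phase and alternate inverse/forward STFTs,
    # keeping the known magnitude and only updating the phase.
    rng = np.random.default_rng(0)
    phase = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, mag.shape))
    for _ in range(n_iter):
        _, x = istft(mag * phase, fs=fs, nperseg=nperseg)
        _, _, Z = stft(x, fs=fs, nperseg=nperseg)
        phase = np.exp(1j * np.angle(Z))
    _, x = istft(mag * phase, fs=fs, nperseg=nperseg)
    return x
```

In practice the autoencoder predicts only the magnitude, so Griffin-Lim (or any phase-retrieval method) is needed to turn the network's output back into audio.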

Implementation

We implemented a variational version of the baseline autoencoder to test whether meaningful audio generation is possible in this setting. To reduce the huge number of parameters of the original model, we reduced the number of filters with respect to it. In this case too, the phase was reconstructed with the Griffin-Lim algorithm.
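The variational part can be sketched in Keras as follows. This is a toy dense version, not the repository's convolutional architecture: the layer sizes and latent dimension are hypothetical, and it only illustrates the reparameterization trick and the KL-divergence term that distinguish a VAE from the plain baseline autoencoder.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical sizes: a flattened log-magnitude spectrogram frame in,
# a small latent code in the middle.
INPUT_DIM = 1024
LATENT_DIM = 32


class Sampling(layers.Layer):
    """Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I).
    The KL divergence to the unit Gaussian prior is added as a layer loss."""

    def call(self, inputs):
        z_mean, z_log_var = inputs
        kl = -0.5 * tf.reduce_mean(
            1.0 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var))
        self.add_loss(kl)
        eps = tf.random.normal(tf.shape(z_mean))
        return z_mean + tf.exp(0.5 * z_log_var) * eps


inputs = keras.Input(shape=(INPUT_DIM,))
h = layers.Dense(256, activation="relu")(inputs)
z_mean = layers.Dense(LATENT_DIM)(h)
z_log_var = layers.Dense(LATENT_DIM)(h)
z = Sampling()([z_mean, z_log_var])
h_dec = layers.Dense(256, activation="relu")(z)
outputs = layers.Dense(INPUT_DIM)(h_dec)

vae = keras.Model(inputs, outputs)
# Total loss = spectrogram reconstruction error (MSE) + KL regularizer.
vae.compile(optimizer="adam", loss="mse")
```

To generate new audio, one samples z from the prior, decodes it to a log-magnitude spectrogram, and runs Griffin-Lim on the result.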

License

The code is released under the terms of the MIT License.