TensorFlow implementation of a Variational Autoencoder with Inverse Autoregressive Flows for encoding spectrograms.
This is the main model I used for my NeuralFunk project.
This code was not really intended to be shared and is quite messy. I might improve it at some point in the future, but for now be aware that everything is quite hacky and badly documented.
- The preprocessing as well as the encoder architecture were heavily inspired by this iPython Notebook.
- A lot of the data-feeding code and many other bits and pieces were adapted from this Wavenet implementation.
- The Griffin-Lim algorithm was taken from the Magenta NSynth utils.
Some random experiments, as well as the creation of the dataset for the VAE, can be found in Preprocessing and Experiments.ipynb.
The dataset pickle file has to be a dictionary of the form
{
'filenames' : list_of_filenames,
'melspecs' : list_of_spectrogram_arrays,
'actual_lengths' : list_of_audio_len_in_sec
}
and be stored as dataset.pkl in the root directory.
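As a rough sketch of how such a file could be written and read back, the helpers below bundle three lists into the dictionary layout described above and pickle it. The function names are hypothetical and not part of this repository, and the check that all three lists have one entry per audio file is an assumption about the intended layout rather than something the training code is known to enforce.

```python
import pickle

# Keys required by the dataset layout described above.
REQUIRED_KEYS = ("filenames", "melspecs", "actual_lengths")


def build_dataset(filenames, melspecs, actual_lengths):
    """Bundle three lists into the expected dictionary layout.

    Assumption: the lists are parallel, i.e. entry i of each list
    describes the same audio file.
    """
    if not (len(filenames) == len(melspecs) == len(actual_lengths)):
        raise ValueError("all three lists must have one entry per audio file")
    return {
        "filenames": list(filenames),
        "melspecs": list(melspecs),              # spectrogram arrays
        "actual_lengths": list(actual_lengths),  # audio lengths in seconds
    }


def save_dataset(dataset, path="dataset.pkl"):
    """Pickle the dataset dictionary after checking its keys."""
    missing = [key for key in REQUIRED_KEYS if key not in dataset]
    if missing:
        raise KeyError(f"dataset is missing keys: {missing}")
    with open(path, "wb") as f:
        pickle.dump(dataset, f)


def load_dataset(path="dataset.pkl"):
    """Load the pickle and verify that it has the expected keys."""
    with open(path, "rb") as f:
        dataset = pickle.load(f)
    missing = [key for key in REQUIRED_KEYS if key not in dataset]
    if missing:
        raise KeyError(f"dataset is missing keys: {missing}")
    return dataset
```

Calling `save_dataset(build_dataset(names, specs, lengths))` from the root directory would then produce dataset.pkl in the place the scripts expect it.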
python train.py
Generation can be based on:
- Sampling from latent space:
python generate.py
- Single input file:
python generate.py --file_in filename
- Multiple input files:
python generate.py --file_in list_of_filenames
- Single file:
python encode_and_reconstruct.py --audio_file filename
- Full dataset:
python encode_and_reconstruct.py --encode_full true
python find_similar.py --target target_audio_file --sample_dirs list_of_dirs_to_search
All of the above scripts have other options and uses as well; look into the code for more details.