We are going to explore the amplitude component of the music signal (a signal has two components: amplitude and frequency) and predict the amplitude of a song's notes based on the previous notes.
For this task we use Recurrent Neural Networks (RNNs), and in particular LSTMs, which work well on sequential data. They make predictions based on the current input as well as the previous ones instead of treating every beat independently. This fits music naturally: each beat depends on the beats before it, which makes music sequential data and an LSTM a good model for the task.
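To make the sequential framing concrete, here is a minimal sketch (not the repo's exact code) of how raw amplitude samples can be windowed into input/target pairs; the function name `create_dataset` and the default `look_back` value are illustrative assumptions.

```python
# Minimal sketch: frame a 1-D amplitude signal as supervised sequences where
# each window of `look_back` past samples predicts the next sample.
import numpy as np

def create_dataset(signal, look_back=3):
    """Return (X, y) arrays shaped for an LSTM from a 1-D amplitude array."""
    X, y = [], []
    for i in range(len(signal) - look_back):
        X.append(signal[i:i + look_back])
        y.append(signal[i + look_back])
    X = np.array(X).reshape(-1, look_back, 1)  # (samples, timesteps, features)
    return X, np.array(y)
```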
You can follow these steps to reproduce the same output:
- Clone the repository or download the .ipynb file, run each cell to get each output, and don't forget to set the paths to your music files.
- The repo contains the IPython Notebook for the prediction task, example music files to learn from, and a README file.
- Run the notebook to see the results. (If you can, run it in a GPU environment to speed up the computation; Google Colab is a very good option and was used for this task.) To learn more about Colab: https://colab.research.google.com/notebooks/welcome.ipynb
- The files we will feed to the model are Numb.wav and Maid_with_Flaxon_Hair.mp3 (the corresponding wav file was too big for GitHub, so the mp3 is provided and you can convert it to wav yourself). Another file, Kalimba.mp3, is included in the repo if you want to experiment with a different song. A loading sketch is shown after the dependency list below.
The following tools and libraries are required:

- Python
- Pandas
- matplotlib
- numpy
- scipy
- TensorFlow
- Keras
- Pydub
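For reference, here is a hedged sketch of one way the two input files can be read into amplitude arrays with scipy and pydub; the file names match the repo, but the variable names and the 16-bit PCM assumption are mine.

```python
# Illustrative loading code, not necessarily the notebook's exact approach.
import numpy as np
from scipy.io import wavfile
from pydub import AudioSegment

rate, numb = wavfile.read("Numb.wav")        # sample rate + int16 amplitude samples
if numb.ndim > 1:
    numb = numb[:, 0]                        # keep a single channel if the file is stereo

maid_seg = AudioSegment.from_mp3("Maid_with_Flaxon_Hair.mp3")   # pydub decodes via ffmpeg
maid = np.array(maid_seg.get_array_of_samples())

# Assume 16-bit PCM and scale both songs to [-1, 1] so they share one amplitude range
numb = numb.astype(np.float32) / np.iinfo(np.int16).max
maid = maid.astype(np.float32) / np.iinfo(np.int16).max
```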
We trained our two LSTM models by feeding them training data from the two songs. The generated output (in .wav format) predicted the music fairly well, but it was somewhat distorted. To overcome this I tried tuning hyperparameters, from increasing the number of epochs to adding more layers, without any radical change in the outcome.
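The model and the knobs that were tuned looked roughly like the sketch below; the layer sizes, the ReLU on the output, and the epoch/batch values are assumptions about the first (distorted) attempt, not a copy of the notebook.

```python
# Hedged sketch of the first LSTM attempt; X_train, y_train come from
# create_dataset() applied to the amplitude arrays loaded above.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

look_back = 3
X_train, y_train = create_dataset(numb, look_back)

model = Sequential([
    LSTM(64, input_shape=(look_back, 1), return_sequences=True),
    LSTM(32),                     # adding/removing layers was one of the tuned knobs
    Dense(1, activation="relu"),  # ReLU output: the suspected source of the distortion
])
model.compile(loss="mean_squared_error", optimizer="adam")
model.fit(X_train, y_train, epochs=20, batch_size=64)  # the epoch count was also tuned
```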
Then I looked at the activation function (or transfer function). Activation functions introduce nonlinearity, which is what lets deep learning models learn nonlinear prediction boundaries, and different activations have different output ranges. In our case the music data has negative values while ReLU's range is [0, +inf), which explains why our predictions were always positive; ReLU also suffers from the related "dying ReLU" problem, where units get stuck outputting zero. As an attempt to solve this, I switched to the LeakyReLU variant, whose range is (-inf, +inf), so the predictions can also be negative.
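A minimal sketch of the fix, assuming the same architecture as above: replace the ReLU on the output with a LeakyReLU layer so negative amplitudes can be predicted (the leaky slope here is Keras' default, not necessarily what the notebook uses).

```python
# Same model, but the output activation is LeakyReLU so predictions can go negative.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, LeakyReLU

model = Sequential([
    LSTM(64, input_shape=(look_back, 1), return_sequences=True),
    LSTM(32),
    Dense(1),          # linear pre-activation ...
    LeakyReLU(),       # ... followed by a leaky slope that lets negative values through
])
model.compile(loss="mean_squared_error", optimizer="adam")
model.fit(X_train, y_train, epochs=20, batch_size=64)
```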
This was a great win, as you can tell from the output: the distortion has disappeared. Further reading: https://keras.io/activations/ and https://stackoverflow.com/questions/46620286/artificial-neural-network-relu-activation-function-and-gradients
Output files: pred_song.wav (the first predicted song) and pred_song_updated.wav (the predicted song after retraining with the updated activation). original.wav is the original song.
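For completeness, here is a hedged sketch of how a predicted amplitude array can be written back to a .wav file with scipy; the scaling assumes the [-1, 1] range used in the loading sketch, and `rate` is the sample rate read from the original file.

```python
# Write the model's predictions back out as 16-bit PCM audio.
import numpy as np
from scipy.io import wavfile

predictions = model.predict(X_train).flatten()                # predicted amplitudes in ~[-1, 1]
predictions = np.clip(predictions, -1.0, 1.0)                 # guard against out-of-range values
pcm = (predictions * np.iinfo(np.int16).max).astype(np.int16)
wavfile.write("pred_song.wav", rate, pcm)                     # rate comes from the original wav
```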
These plots give an idea of how similar the two signals' shapes are, as well as of the music signal data itself (amplitude and beat frequency). Be aware of the slight delay between them introduced during training; the output music is still representative of the original!
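The comparison plots can be reproduced with a few lines of matplotlib; this is an illustrative sketch using the variable names from the earlier sketches, not the notebook's exact plotting code.

```python
# Overlay the first few thousand samples of the original and predicted signals.
import matplotlib.pyplot as plt

plt.figure(figsize=(12, 4))
plt.plot(numb[look_back:5000 + look_back], label="original")
plt.plot(predictions[:5000], label="predicted", alpha=0.7)
plt.xlabel("sample index")
plt.ylabel("amplitude")
plt.title("Original vs. predicted amplitude")
plt.legend()
plt.show()
```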
In this task, I only generated samples from two songs. This work can be pushed further by building a multi-song model and by increasing the look_back parameter to give the model more input features per prediction, provided you have enough computing power.
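One possible (untested) way to set up such a multi-song dataset with a larger look_back, reusing `create_dataset` from the earlier sketch:

```python
# Assumed extension, not implemented in the repo: stack windowed data from
# several songs and enlarge the context window per prediction.
import numpy as np

look_back = 50                                  # longer context per prediction
X_parts, y_parts = [], []
for song in (numb, maid):                       # amplitude arrays from the loading sketch
    X_s, y_s = create_dataset(song, look_back)
    X_parts.append(X_s)
    y_parts.append(y_s)

X_multi = np.concatenate(X_parts)
y_multi = np.concatenate(y_parts)
```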