This project classifies the genre of a song using a machine learning approach. The neural network in this code combines a CNN and an LSTM. Compared with a CNN-GRU model, the CNN-LSTM approach performed better than the traditional CNN-GRU approach. Testing was done on the GTZAN dataset.
For this project I used the GTZAN dataset. It contains 1000 audio tracks, each 30 seconds long, spread across 10 genres. Download GTZAN here. It has the following genres:
- blues
- classical
- country
- disco
- hiphop
- jazz
- metal
- pop
- reggae
- rock
- Python3
- Keras (running with TensorFlow as the backend)
First, I take each song from each genre in turn. To build a training set from the audio files, I convert each file to its mel-spectrogram. The mel-spectrogram of an audio file may look like this:
I divided my dataset into three parts:
dataset = training set + test set + validation set
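The three-way split can be sketched as below. This is a minimal illustration, not the project's actual code: the 80/10/10 ratio, the per-genre (stratified) shuffling, the function name `stratified_split`, and the placeholder file names are all assumptions, since the README does not state how the split was made.

```python
import random

GENRES = ["blues", "classical", "country", "disco", "hiphop",
          "jazz", "metal", "pop", "reggae", "rock"]


def stratified_split(items_by_genre, train=0.8, valid=0.1, seed=0):
    """Split each genre separately so every split keeps the genre balance.

    The 80/10/10 ratio is an assumption; whatever is left after the
    training and validation portions goes to the test set.
    """
    rng = random.Random(seed)
    splits = {"train": [], "valid": [], "test": []}
    for genre, items in items_by_genre.items():
        items = list(items)
        rng.shuffle(items)
        n_train = round(len(items) * train)
        n_valid = round(len(items) * valid)
        splits["train"] += [(x, genre) for x in items[:n_train]]
        splits["valid"] += [(x, genre) for x in items[n_train:n_train + n_valid]]
        splits["test"] += [(x, genre) for x in items[n_train + n_valid:]]
    return splits


# Placeholder file names (hypothetical): 100 tracks per genre, 1000 in total.
tracks = {g: [f"{g}.{i:05d}.wav" for i in range(100)] for g in GENRES}
splits = stratified_split(tracks)
print({name: len(rows) for name, rows in splits.items()})
```

Splitting per genre, rather than shuffling all 1000 tracks at once, keeps each genre equally represented in the training, validation, and test sets.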
After conversion to a mel-spectrogram, the result is fed into the CNN-LSTM neural network. The model summary is shown below:
Model: "sequential_115"
Layer (type)                    Output Shape           Param #
conv2d_223 (Conv2D)             (None, 60, 169, 20)    520
max_pooling2d_109 (MaxPoolin    (None, 30, 84, 20)     0
conv2d_224 (Conv2D)             (None, 26, 80, 50)     25050
max_pooling2d_110 (MaxPoolin    (None, 13, 40, 50)     0
flatten_103 (Flatten)           (None, 26000)          0
dense_127 (Dense)               (None, 20)             520020
lambda_50 (Lambda)              (None, 20, 1)          0
lstm_101 (LSTM)                 (None, 512)            1052672
dense_128 (Dense)               (None, 10)             5130
Total params: 1,603,392
Trainable params: 1,603,392
Non-trainable params: 0
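The parameter counts in the summary can be reproduced by hand, which also hints at the layer settings. The sketch below uses the standard parameter formulas for convolutional, dense, and LSTM layers. The 5x5 kernel size and the single input channel are assumptions inferred from the counts and from the shape change 30x84 to 26x80; they are not stated in the README. The helper names are illustrative.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # One weight per kernel cell per input channel, plus one bias, per filter.
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(n_in, n_out):
    # Weight matrix plus one bias per output unit.
    return (n_in + 1) * n_out


def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights, and a bias.
    return 4 * (units * (input_dim + units) + units)


counts = {
    # Assumed 5x5 kernel on a single-channel input (inferred, not stated).
    "conv2d_223": conv2d_params(5, 5, 1, 20),
    # Assumed 5x5 kernel on the 20 channels produced by the first block.
    "conv2d_224": conv2d_params(5, 5, 20, 50),
    # Flattened 13 x 40 x 50 feature map feeding 20 units.
    "dense_127": dense_params(13 * 40 * 50, 20),
    # The lambda layer yields 20 time steps with 1 feature each.
    "lstm_101": lstm_params(1, 512),
    # 512 LSTM outputs mapped to the 10 genres.
    "dense_128": dense_params(512, 10),
}
total = sum(counts.values())
print(counts, total)
```

Most of the weights sit in the LSTM and in the dense layer that follows the flatten step; the two convolutional layers together account for under 2% of the total.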
- librosa -> details here.
- csv
- pandas
- numpy
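The mel-spectrogram conversion described earlier could be sketched with librosa roughly as follows. This is an illustration under stated assumptions, not the project's actual preprocessing: the function names, the sample rate of 22050 Hz, `n_fft=2048`, `hop_length=512`, and `n_mels=64` are all assumed. The value 64 is merely consistent with the model summary if the first convolution uses a 5x5 kernel without padding.

```python
def expected_frames(n_samples, hop_length=512):
    """Number of spectrogram frames librosa produces with its default
    centered framing: one frame per hop, plus one."""
    return 1 + n_samples // hop_length


def audio_to_mel_db(path, sr=22050, n_mels=64, n_fft=2048, hop_length=512):
    """Load an audio file and return its mel-spectrogram in decibels.

    All parameter defaults are assumptions made for this sketch.
    The imports are local so the helper above works without librosa.
    """
    import numpy as np
    import librosa

    y, sr = librosa.load(path, sr=sr, mono=True)
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels
    )
    # Convert power to a log (dB) scale, relative to the loudest bin.
    return librosa.power_to_db(mel, ref=np.max)


# A 4-second excerpt at 22050 Hz has 88200 samples.
print(expected_frames(4 * 22050))
```

Calling `audio_to_mel_db("some_track.wav")` (a hypothetical path) would return an array of shape `(n_mels, frames)`, which can then be cut into fixed-width windows before being fed to the network.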
CNN-GRU accuracy: 50.30%
CNN-LSTM accuracy: ~61%
The plot below compares CNN-LSTM with CNN-GRU:
- Recommending music on Spotify with deep learning
- K. Choi, G. Fazekas, K. Cho, and M. Sandler, “A tutorial on deep learning for music information retrieval,” arXiv preprint arXiv:1709.04396, 2017.
- Music Genre Recognition by Deep Sound
- Using CNN and RNN for genre recognition (Medium)
- K. Choi, G. Fazekas, M. Sandler, and K. Cho, “Convolutional recurrent neural networks for music classification,” in Proc. Int. Conf. Acoust., Speech, Signal Process., 2017.
- Librosa on GitHub