
Music Genre Classification

Video demonstration of live music genre classification: https://www.youtube.com/watch?v=OqOR4L5_XtM&feature=youtu.be&ab_channel=Engineer

About The Project

This project uses several models to classify musical genres from audio samples, along with different visualisation techniques to better understand the data.

This project was inspired by the code at https://data-flair.training/blogs/python-project-music-genre-classification/, which implements a K-Nearest Neighbours approach to the problem and served as a starting point.

Dataset: the GTZAN genre collection, available at http://marsyas.info/downloads/datasets.html

Notebooks

Extracted Mel Frequency Cepstral Coefficients (MFCCs) from the audio samples. Includes a K-Nearest Neighbours approach to classifying genres (from https://data-flair.training/blogs/python-project-music-genre-classification/). Compared the accuracy of models with different K values.
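A minimal sketch of that pipeline, assuming the GTZAN clips sit in a genres/&lt;genre&gt;/ folder layout and using librosa with scikit-learn; the paths, the MFCC-mean feature summary, and the K values are illustrative, not the notebook's exact code:

```python
import glob, os
import numpy as np
import librosa
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

def mfcc_features(path, n_mfcc=13):
    """Summarise one clip as the mean of its MFCCs over time."""
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, n_frames)
    return mfcc.mean(axis=1)  # one fixed-length vector per clip

# Assumed layout: genres/<genre>/<clip>.wav (folder name = genre label)
X, labels = [], []
for path in glob.glob(os.path.join("genres", "*", "*.wav")):
    X.append(mfcc_features(path))
    labels.append(os.path.basename(os.path.dirname(path)))
X = np.array(X)

X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.3, random_state=0)

# Compare accuracy for a few K values
for k in (1, 3, 5, 7):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    print(f"K={k}: {accuracy_score(y_test, knn.predict(X_test)):.3f}")
```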

Visualised Mel Frequency Cepstral Coefficients using colormaps to better understand the data and gain a more intuitive perspective on MFCCs. Compared MFCCs across different genres.
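A small sketch of this kind of visualisation; the clip path and colormap choice are illustrative assumptions:

```python
import librosa
import librosa.display
import matplotlib.pyplot as plt

# Hypothetical clip path; any clip from the dataset works the same way
y, sr = librosa.load("genres/jazz/jazz.00000.wav", sr=None)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

fig, ax = plt.subplots(figsize=(10, 4))
img = librosa.display.specshow(mfcc, x_axis="time", sr=sr, cmap="viridis", ax=ax)
fig.colorbar(img, ax=ax)  # colour scale for coefficient values
ax.set(title="MFCCs over time", ylabel="MFCC coefficient")
plt.show()
```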

Converted MFCC mean and covariance matrix features into a pandas DataFrame. Trained a logistic regression model to classify music genres using these features. Tuned the model to prevent overfitting by increasing the strength of regularisation and shuffling the data. Explored the impact of using PCA to reduce the number of features.
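A hedged sketch of this stage (the notebook stores the features in a pandas DataFrame; the sketch works on a NumPy array directly, and the regularisation strength C and PCA dimensionality are illustrative):

```python
import glob, os
import numpy as np
import librosa
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def mean_cov_features(path, n_mfcc=13):
    """One row per clip: MFCC means plus the upper triangle of the covariance."""
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    cov = np.cov(mfcc)  # (n_mfcc, n_mfcc)
    return np.concatenate([mfcc.mean(axis=1), cov[np.triu_indices_from(cov)]])

X, y = [], []
for path in glob.glob(os.path.join("genres", "*", "*.wav")):  # assumed layout
    X.append(mean_cov_features(path))
    y.append(os.path.basename(os.path.dirname(path)))
X = np.array(X)

# shuffle=True randomises the data; in sklearn, smaller C = stronger regularisation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, shuffle=True, random_state=0)
clf = make_pipeline(StandardScaler(),
                    LogisticRegression(C=0.1, max_iter=1000)).fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))

# PCA to cut the feature count before the classifier
pca_clf = make_pipeline(StandardScaler(), PCA(n_components=20),
                        LogisticRegression(C=0.1, max_iter=1000)).fit(X_train, y_train)
print("with PCA:", pca_clf.score(X_test, y_test))
```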

Used Librosa to extract Mel spectrograms from the audio samples. This unstructured representation works better for deep learning models such as Convolutional Neural Networks. The CNN model achieves 90% accuracy on the test data. At the end of the notebook, one can listen to an audio sample, see the actual genre of the music, and view a bar plot of the model's scores for each genre.
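The README does not show the trained architecture, so the following is only an illustrative Keras CNN over log-Mel inputs; the layer sizes, the 3-second window, and the input shape (128 Mel bands × ~130 frames at librosa's default hop length) are assumptions:

```python
import numpy as np
import librosa
import tensorflow as tf

def mel_input(path, duration=3.0, n_mels=128):
    """Log-scaled Mel spectrogram of one 3-second window, shaped for a CNN."""
    y, sr = librosa.load(path, sr=22050, duration=duration)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    mel_db = librosa.power_to_db(mel, ref=np.max)
    return mel_db[..., np.newaxis]  # (n_mels, n_frames, 1) channel axis for Conv2D

n_genres = 10  # GTZAN has 10 genres
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(128, 130, 1)),  # ~3 s at 22050 Hz, hop 512
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(n_genres, activation="softmax"),  # per-genre scores
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",  # integer genre labels
              metrics=["accuracy"])
# model.fit(...) is then called on labelled 3-second windows
```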

Uses microphone input to take 3-second audio samples and classify them with the CNN model to predict the music genre. A matplotlib plot shows the scores the model gives each genre for the latest 3-second sample; the plot is updated in real time as classifications are made (every 3 seconds). The code can also be run through the Python file genre_classify_mic.py.
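A sketch of the live loop in the spirit of genre_classify_mic.py; the sounddevice library, the saved-model filename, and the genre ordering are assumptions, and the preprocessing must match whatever the CNN was trained on:

```python
import numpy as np
import sounddevice as sd
import librosa
import tensorflow as tf

GENRES = ["blues", "classical", "country", "disco", "hiphop",
          "jazz", "metal", "pop", "reggae", "rock"]  # assumed GTZAN order
SR = 22050
model = tf.keras.models.load_model("genre_cnn.h5")  # hypothetical saved model

while True:  # stop with Ctrl-C
    # Record a 3-second mono sample from the default microphone
    audio = sd.rec(int(3 * SR), samplerate=SR, channels=1, dtype="float32")
    sd.wait()
    mel = librosa.feature.melspectrogram(y=audio.ravel(), sr=SR, n_mels=128)
    mel_db = librosa.power_to_db(mel, ref=np.max)
    scores = model.predict(mel_db[np.newaxis, ..., np.newaxis], verbose=0)[0]
    print(GENRES[int(scores.argmax())], f"{scores.max():.2f}")
    # the notebook instead redraws a matplotlib bar plot of `scores` here
```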

It is actually very interesting to play a song into the microphone and watch how the model classifies different 3-second segments. The predictions often vary with the instruments and sounds: during silence or violin passages the Classical genre scores highly, while drum-heavy sections are classified as Rock. This variability comes from classifying each 3-second segment independently. If more reliable predictions are needed, the entire song could be classified to produce a single prediction of its genre (although this would need either a CNN model trained on 30-second samples to have the correct input shape, or the logistic regression model using MFCCs or other features).
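A third option, not implemented in this repo, would be to keep the existing 3-second CNN and average its per-genre scores over consecutive windows of the song. A minimal sketch, reusing the assumed window length and Mel parameters from the sketches above:

```python
import numpy as np
import librosa

def predict_song(path, model, sr=22050, window_s=3.0):
    """Average the CNN's per-genre scores over back-to-back 3 s windows."""
    y, _ = librosa.load(path, sr=sr)
    hop = int(window_s * sr)
    scores = []
    for start in range(0, len(y) - hop + 1, hop):
        mel = librosa.feature.melspectrogram(y=y[start:start + hop], sr=sr, n_mels=128)
        mel_db = librosa.power_to_db(mel, ref=np.max)
        scores.append(model.predict(mel_db[np.newaxis, ..., np.newaxis], verbose=0)[0])
    return np.mean(scores, axis=0)  # one averaged score vector for the whole song
```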