Music Genre Classification

  • The aim is to build a machine learning model that classifies music samples into genres.
  • The model predicts the genre from an audio signal given as input.
  • Automating music classification makes song selection quicker and less time-consuming.
  • It can also surface useful information, such as popular genres, trends, and artists.

🌐 About the Dataset

  • We use the GTZAN music classification dataset, which consists of 30-second audio tracks in .wav format.
  • The dataset contains 10 genres of music: Classical, Jazz, Country, Rock, Blues, Reggae, Metal, Disco, Hip-hop, and Pop.
  • GTZAN Dataset

🔗 Detailed Project Documentation

⚡Feature Extraction

  • To extract features from the audio files, we use the librosa library.

  • librosa is a library for music and audio analysis that provides the building blocks needed to create a music information retrieval system.

  • librosa helps visualize audio signals and perform feature extraction using different signal processing techniques.

  • Signals :

    • A signal is a variation in a certain quantity over time.
  • The Fourier Transform :

    • An audio signal is composed of several single-frequency sound waves. When sampling the signal over time, we capture only the resulting amplitudes.
    • The Fourier transform converts the signal from the time domain into the frequency domain. The result is called a spectrum.
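To make the time-to-frequency idea concrete, here is a minimal sketch of a naive discrete Fourier transform in plain Python. The function name `dft_magnitudes`, the toy sample rate, and the two test tones are assumptions chosen for illustration; they are not taken from the project notebook, which would normally rely on a library FFT.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Return DFT magnitudes for the non-negative frequency bins (0..N/2)."""
    n = len(samples)
    magnitudes = []
    for k in range(n // 2 + 1):
        # Correlate the signal with a complex sinusoid at bin k.
        total = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(samples)
        )
        magnitudes.append(abs(total))
    return magnitudes


SAMPLE_RATE = 64  # Hz, hypothetical toy value
N = 64            # one second of samples

# Time domain: a 5 Hz sine plus a weaker 12 Hz sine.
signal = [
    math.sin(2 * math.pi * 5 * t / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 12 * t / SAMPLE_RATE)
    for t in range(N)
]

# Frequency domain: the spectrum has peaks at the two component frequencies.
spectrum = dft_magnitudes(signal)
bin_hz = SAMPLE_RATE / N
strongest = sorted(range(len(spectrum)), key=lambda k: spectrum[k], reverse=True)[:2]
peak_freqs = sorted(k * bin_hz for k in strongest)
print(peak_freqs)
```

The sampled amplitudes alone do not show which frequencies are present; the spectrum recovers them as its two largest bins.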
  • Zero-Crossing Rate :

    • The zero-crossing rate is the rate of sign changes along a signal, i.e., the rate at which the signal changes from positive to negative or back.
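The definition above can be sketched in a few lines of plain Python. The helper name `zero_crossing_rate` and the toy inputs are assumptions for illustration; librosa provides `librosa.feature.zero_crossing_rate`, which computes the rate per frame rather than over the whole signal as this sketch does.

```python
def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(samples) < 2:
        return 0.0
    crossings = sum(
        1
        for a, b in zip(samples, samples[1:])
        if (a >= 0) != (b >= 0)  # treat zero as non-negative
    )
    return crossings / (len(samples) - 1)


# A signal that flips sign at every step crosses zero as often as possible.
rate_alternating = zero_crossing_rate([1, -1, 1, -1])
# A signal that never changes sign has no crossings.
rate_constant_sign = zero_crossing_rate([1, 2, 3, 4])
# Two sign changes across four adjacent pairs.
rate_mixed = zero_crossing_rate([1, -1, -1, 1, 1])
print(rate_alternating, rate_constant_sign, rate_mixed)
```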
  • Spectral Centroid :

    • It indicates where the "center of mass" of a sound is located and is calculated as the weighted mean of the frequencies present in the sound. Consider two songs, one blues and one metal: the one with more energy at high frequencies has the higher spectral centroid.
  • Spectral RollOff :

    • It is a measure of the shape of the signal. It represents the frequency below which a specified percentage of the total spectral energy (e.g., 85%) lies.
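The spectral centroid and spectral rolloff described above can both be computed from one magnitude spectrum. The following is a rough sketch, assuming a naive DFT, a toy 64 Hz sample rate, and an 85% rolloff threshold; all function names are hypothetical. It accumulates magnitudes for the rolloff, while accumulating squared magnitudes (energy) is an equally common convention. In practice, `librosa.feature.spectral_centroid` and `librosa.feature.spectral_rolloff` compute these per frame.

```python
import cmath
import math


def magnitude_spectrum(samples, sample_rate):
    """Return (frequencies in Hz, magnitudes) for bins 0..N/2 via a naive DFT."""
    n = len(samples)
    freqs, mags = [], []
    for k in range(n // 2 + 1):
        total = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(samples)
        )
        freqs.append(k * sample_rate / n)
        mags.append(abs(total))
    return freqs, mags


def spectral_centroid(freqs, mags):
    """Magnitude-weighted mean frequency (the spectrum's 'center of mass')."""
    total = sum(mags)
    return sum(f * m for f, m in zip(freqs, mags)) / total if total else 0.0


def spectral_rolloff(freqs, mags, roll_percent=0.85):
    """Lowest frequency at which the cumulative magnitude reaches roll_percent."""
    threshold = roll_percent * sum(mags)
    running = 0.0
    for f, m in zip(freqs, mags):
        running += m
        if running >= threshold:
            return f
    return freqs[-1]


RATE, N = 64, 64  # hypothetical toy values
dull = [math.sin(2 * math.pi * 4 * t / RATE) for t in range(N)]
bright = [s + math.sin(2 * math.pi * 20 * t / RATE) for t, s in enumerate(dull)]

dull_freqs, dull_mags = magnitude_spectrum(dull, RATE)
bright_freqs, bright_mags = magnitude_spectrum(bright, RATE)

centroid_dull = spectral_centroid(dull_freqs, dull_mags)
centroid_bright = spectral_centroid(bright_freqs, bright_mags)
rolloff_dull = spectral_rolloff(dull_freqs, dull_mags)
rolloff_bright = spectral_rolloff(bright_freqs, bright_mags)
print(centroid_dull, centroid_bright, rolloff_dull, rolloff_bright)
```

Adding a high-frequency component moves both the centroid and the rolloff upward, which is why these features help separate brighter-sounding tracks from duller ones.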
  • Chroma Frequencies :

    • Chroma features are an interesting and powerful representation for music audio in which the entire spectrum is projected onto 12 bins representing the 12 distinct semitones (or chroma) of the musical octave.
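As a simplified illustration of the 12-bin projection, the sketch below folds a handful of spectral peaks into pitch classes. It assumes equal temperament with A4 = 440 Hz and takes a short list of (frequency, magnitude) peaks as input instead of a full spectrogram; the names `pitch_class` and `chroma_vector` are hypothetical. For real audio, `librosa.feature.chroma_stft` produces chroma features per frame.

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pitch_class(freq_hz, a4_hz=440.0):
    """Map a frequency to its nearest semitone class (0 = C, ..., 11 = B)."""
    midi_note = 69 + 12 * math.log2(freq_hz / a4_hz)  # 69 is the MIDI number of A4
    return round(midi_note) % 12


def chroma_vector(freqs, mags):
    """Sum magnitudes per pitch class across octaves, scaled so the peak is 1."""
    chroma = [0.0] * 12
    for f, m in zip(freqs, mags):
        if f > 0:  # skip the DC bin, which has no pitch
            chroma[pitch_class(f)] += m
    peak = max(chroma)
    return [c / peak for c in chroma] if peak else chroma


# Three A notes in different octaves plus a weaker middle C.
freqs = [220.0, 440.0, 880.0, 261.63]
mags = [1.0, 1.0, 1.0, 0.5]
chroma = chroma_vector(freqs, mags)
strongest = PITCH_CLASSES[chroma.index(max(chroma))]
print(strongest, [round(c, 3) for c in chroma])
```

All three A notes land in the same bin regardless of octave, which is the property that makes chroma useful for describing harmony.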

📚 Applied Models

  1. Naive Bayes Classifier

  2. Decision Trees

  3. K Nearest Neighbors

  4. Neural Networks

    We first create a function that takes a model as input, fits it to the training dataset, and predicts the genres of the tracks in the test dataset. It then prints the confusion matrix along with the accuracy of the model.
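A helper like the one described above might look like the following sketch. It assumes the model exposes `fit` and `predict` methods, the convention scikit-learn estimators follow. The names `evaluate` and `TinyKNN`, along with the toy features and labels, are hypothetical stand-ins so the example runs without external libraries; they are not the notebook's actual code or data.

```python
import math
from collections import Counter


class TinyKNN:
    """Hypothetical stand-in classifier with fit/predict methods."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        self.X_, self.y_ = list(X), list(y)
        return self

    def predict(self, X):
        predictions = []
        for row in X:
            # Take the k training points closest to this row.
            nearest = sorted(
                zip(self.X_, self.y_), key=lambda pair: math.dist(row, pair[0])
            )[: self.k]
            votes = Counter(label for _, label in nearest)
            predictions.append(votes.most_common(1)[0][0])
        return predictions


def evaluate(model, X_train, y_train, X_test, y_test):
    """Fit the model, predict test genres, print confusion matrix and accuracy."""
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    labels = sorted(set(y_train) | set(y_test))
    index = {label: i for i, label in enumerate(labels)}
    # Rows are true genres, columns are predicted genres.
    matrix = [[0] * len(labels) for _ in labels]
    for true, pred in zip(y_test, y_pred):
        matrix[index[true]][index[pred]] += 1

    accuracy = sum(t == p for t, p in zip(y_test, y_pred)) / len(y_test)
    print("labels:", labels)
    for row in matrix:
        print(row)
    print(f"accuracy: {accuracy:.1%}")
    return matrix, accuracy


# Toy two-feature data, made up for illustration only.
X_train = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15), (0.9, 0.8), (0.8, 0.9), (0.85, 0.85)]
y_train = ["classical"] * 3 + ["metal"] * 3
X_test = [(0.12, 0.18), (0.88, 0.82), (0.3, 0.3)]
y_test = ["classical", "metal", "metal"]

matrix, accuracy = evaluate(TinyKNN(k=3), X_train, y_train, X_test, y_test)
```

Because the helper depends only on `fit` and `predict`, the same function can be reused for each of the four model types listed above.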

| Model                  | Accuracy |
|------------------------|----------|
| Naive Bayes Classifier | 51.9%    |
| Decision Trees         | 64.4%    |
| K Nearest Neighbors    | 80.5%    |
| Neural Networks        | 67.1%    |

Confusion matrix

KNN has the highest accuracy, so the final model is built on it. I then compute the feature importance for this model.

📱 Referenced Papers