

About The Project

The goal of this project is to develop a model that accurately classifies music tracks by genre in order to improve song recommendation systems.

Data Set

The dataset used for this project is available at the link below:

FMA: A Dataset For Music Analysis Data Set

The dataset we used is “FMA: A Dataset For Music Analysis Data Set” from the UCI Machine Learning Repository. It contains 106,574 music tracks (instances) and 518 features. The features for each instance include fields such as ID, title, artist, genres, tags, and play counts.


  • scikit-learn (sklearn)
  • pandas, NumPy


cd Music-Genre-Classification
pip install -r requirements.txt
jupyter notebook

Run the Python notebooks in the following order:

  1. Preprocessing_Part1 - initial preprocessing of data from website
  2. Data Augmentation - augments the initially preprocessed data, adding ~110,000 instances
    • Takes approximately 10 hours to run
  3. Preprocessing_Part2 - final preprocessing of the initial data and the augmented data
  4. Models_and_Analysis_milestone - training and evaluation of models on a subset of the dataset
  5. Models and Analysis - modeling and analysis on the full dataset
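
For orientation, the modeling step can be sketched roughly as below. This is a minimal illustration, not the code from the notebooks: the helper name `train_genre_classifier`, the choice of logistic regression, the 80/20 split, and the synthetic data are all assumptions made for the example. The synthetic matrix only mirrors the 518 feature columns mentioned in the dataset description.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def train_genre_classifier(X, y, seed=0):
    """Hypothetical helper: fit a scaled logistic-regression baseline.

    Returns the fitted pipeline and its accuracy on a held-out 20% split.
    """
    # Stratify so every genre appears in both the train and test splits.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    # Scaling first helps the linear model converge on features with
    # very different ranges.
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    return model, accuracy


# Synthetic stand-in data (assumption): 518 columns to mirror the feature
# count of the dataset, with 4 made-up genre classes.
X, y = make_classification(
    n_samples=600, n_features=518, n_informative=20, n_classes=4, random_state=0
)
model, accuracy = train_genre_classifier(X, y)
print(f"held-out accuracy: {accuracy:.3f}")
```

To try this on the real data, replace `X` and `y` with the feature matrix and genre labels produced by Preprocessing_Part2.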

Note: training the models takes a significant amount of time.


Argyro Major

Divya Gaddipati

Conrad Pereira


