Genre classification

This is Exercise 14 of the Udacity class on Machine Learning workflows. Its objective is to bring the lesson's contents together in a complete ML pipeline that produces a trained Random Forest model.

Dataset

The dataset used in this exercise is a modified version of the original songs dataset: here

Model

We use a Random Forest classifier to predict the genre of each song (NLP).

Results

The model was evaluated using a test set from the dataset, with an AUC (Area Under the Curve) of 0.95326. The following figures show the feature importance and the confusion matrix computed during the evaluation step:
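As a reminder of what the AUC score measures, here is a minimal sketch in plain Python. It covers only the binary case, where AUC is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. This README does not state how the score was aggregated across genres, so this is not a reproduction of the evaluation step; the function name `binary_auc` and the toy data are illustrative assumptions.

```python
from itertools import product


def binary_auc(labels, scores):
    """Binary AUC: chance that a random positive outranks a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    # Each (positive, negative) pair counts 1 for a win and 0.5 for a tie.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(positives, negatives)
    )
    return wins / (len(positives) * len(negatives))


# Toy data for illustration only, not taken from the songs dataset.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(binary_auc(toy_labels, toy_scores))  # 0.75
```

A score of 1.0 means every positive is ranked above every negative, and 0.5 is what random scoring would give. The pairwise loop is quadratic in the number of examples, so it is meant for explanation rather than for large test sets.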

Pipeline

You can run this pipeline at version 1.0.0 with MLflow and Weights & Biases (W&B). You need to be logged in to W&B first (wandb login):

mlflow run -v 1.0.0 git@github.com:AMergy/genre_classification.git

For instance, to override a configuration value through hydra_options:

mlflow run https://github.com/AMergy/genre_classification.git \
             -v 1.0.0 \
             -P hydra_options="main.project_name=remote_execution"
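If you prefer to launch the run from a script, the same command can be assembled in Python. The sketch below only builds the argument list and prints it; the helper name `build_mlflow_run_command` is hypothetical, and joining several overrides with spaces inside `hydra_options` is an assumption about how this project forwards them to Hydra, not something this README confirms.

```python
import shlex


def build_mlflow_run_command(uri, version, hydra_overrides=None):
    """Build the `mlflow run` argument list for a given project URI and version."""
    command = ["mlflow", "run", uri, "-v", version]
    if hydra_overrides:
        # Assumption: multiple overrides are passed as one space-separated string.
        joined = " ".join(f"{key}={value}" for key, value in hydra_overrides.items())
        command += ["-P", f"hydra_options={joined}"]
    return command


cmd = build_mlflow_run_command(
    "https://github.com/AMergy/genre_classification.git",
    "1.0.0",
    {"main.project_name": "remote_execution"},
)
# Print a shell-safe version of the command for inspection.
print(shlex.join(cmd))
```

To actually start the pipeline, pass the list to `subprocess.run(cmd, check=True)` in an environment where `mlflow` is installed and `wandb login` has already been done.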
