
Final project on Udacity Machine Learning workflows


Genre classification

Exercise 14 of the Udacity class on Machine Learning workflows. The objective of this exercise is to bring the lesson's contents together in a complete ML pipeline that produces a trained Random Forest model.

Dataset

The dataset used in this exercise is a modified version of the original songs dataset: here

Model

We use a Random Forest classifier to predict the genre of each song (NLP).

Results

The model was evaluated on a test set held out from the dataset and reached an AUC (Area Under the Curve) of 0.95326. The following figures show the feature importance and the confusion matrix computed during the evaluation step:
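
To make the AUC figure easier to interpret, here is a minimal sketch of how the metric can be computed in the binary case: it is the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one. This is only an illustration and not the evaluation code of this project. The function name binary_auc and the toy labels and scores are assumptions made for the example, and the genre task itself has more than two classes, so the reported value presumably comes from a multi-class variant of the metric.

```python
def binary_auc(labels, scores):
    """Return the AUC for binary labels (0/1) and real-valued scores.

    Computed as the fraction of (positive, negative) pairs in which the
    positive example has the higher score; ties count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data, not taken from the songs dataset.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(binary_auc(toy_labels, toy_scores))
```

A value of 1.0 means every positive example is ranked above every negative one, while 0.5 corresponds to random ranking.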

Pipeline

You can run version 1.0.0 of this pipeline with MLflow and Weights & Biases (you need to be logged in first: wandb login):

mlflow run -v 1.0.0 git@github.com:AMergy/genre_classification.git

For instance, to set the project name through the Hydra options:

mlflow run https://github.com/AMergy/genre_classification.git \
             -v 1.0.0 \
             -P hydra_options="main.project_name=remote_execution"
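
If you prefer to launch the pipeline from a script, the same command can be assembled in Python. The sketch below is one possible way to do it and is not part of this repository: the helper name build_mlflow_command is hypothetical, and only the mlflow run arguments shown above (-v and -P hydra_options=...) are used. The call that actually starts the run is left commented out because it requires mlflow to be installed, network access, and a prior wandb login.

```python
import shlex


def build_mlflow_command(uri, version, hydra_options=None):
    """Build the argument list for `mlflow run` (hypothetical helper).

    uri: Git URL of the project.
    version: Git tag, branch, or commit passed to -v.
    hydra_options: optional string forwarded as -P hydra_options=...
    """
    command = ["mlflow", "run", uri, "-v", version]
    if hydra_options:
        command += ["-P", f"hydra_options={hydra_options}"]
    return command


command = build_mlflow_command(
    "https://github.com/AMergy/genre_classification.git",
    "1.0.0",
    hydra_options="main.project_name=remote_execution",
)

# Show the command as it would be typed in a shell.
print(shlex.join(command))

# To actually launch the run, uncomment the two lines below:
# import subprocess
# subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems when the Hydra options contain spaces or special characters.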