sound-recognition

A template for your own music recognition machine learning project.

Overview

This repository is a template for a machine learning project for multi-class classification of music files.

It is designed so that you can try different experimental conditions (model type, dataset creation, and so on) by editing the configuration file.

In addition, MLflow is used to track the experimental results.

The repository currently provides only basic functionality and will be expanded in the future.

The currently implemented approach is as follows.

Clip an arbitrary time interval from a music file.
Apply augmentation to the signal data.
- Time Stretch
- Additive White Noise
- Pitch Shift
- Change Volume
From the signal data, generate Mel-spectrograms with three different parameter settings and save them as an image.
Train ResNet or another model on these images, treating the task as an image classification problem.
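
The clipping step and two of the augmentations above can be sketched with NumPy alone. This is a minimal illustration, not the code used in this repository: the function names `clip_interval`, `add_white_noise`, and `change_volume`, the parameter values, and the synthetic sine wave standing in for a loaded music file are all assumptions. Time stretch and pitch shift are usually delegated to an audio library (for example `librosa.effects.time_stretch` and `librosa.effects.pitch_shift`) and are left out here.

```python
import numpy as np


def clip_interval(signal, sample_rate, start_sec, duration_sec):
    """Return the samples in [start_sec, start_sec + duration_sec)."""
    start = int(start_sec * sample_rate)
    end = start + int(duration_sec * sample_rate)
    return signal[start:end]


def add_white_noise(signal, noise_std, rng):
    """Add zero-mean Gaussian noise with standard deviation noise_std."""
    return signal + rng.normal(0.0, noise_std, size=signal.shape)


def change_volume(signal, gain_db):
    """Scale the amplitude by a gain given in decibels."""
    return signal * (10.0 ** (gain_db / 20.0))


# A synthetic 3-second 440 Hz tone stands in for a loaded music file (assumption).
sample_rate = 22050
t = np.arange(3 * sample_rate) / sample_rate
signal = 0.5 * np.sin(2 * np.pi * 440.0 * t)

rng = np.random.default_rng(0)
clip = clip_interval(signal, sample_rate, start_sec=1.0, duration_sec=1.0)
noisy = add_white_noise(clip, noise_std=0.005, rng=rng)
quieter = change_volume(noisy, gain_db=-6.0)
```

Passing the random generator in explicitly keeps the augmentation reproducible, which makes it easier to compare runs recorded under different experiment names.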

$ conda create -n {env_name} python=3.7.2
$ conda activate {env_name}
$ pip install -r requirements.txt

The environment can also be built with nvidia-docker on WSL2.

See the Dockerfile for details.

Create a dataset CSV

file_path,label_code
/path/to/example.wav,0
/path/to/example2.wav,1
...
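
One way to produce a CSV in this format is sketched below using only the standard library. It assumes a hypothetical layout in which each class has its own folder (for example `data/jazz/a.wav`) and assigns `label_code` values by sorted folder name. The helper names `build_rows` and `write_dataset_csv` are illustrative and are not part of this repository.

```python
import csv
import io
from pathlib import Path


def build_rows(audio_paths):
    """Assign an integer label_code to each path based on its parent folder name."""
    class_names = sorted({Path(p).parent.name for p in audio_paths})
    code_of = {name: code for code, name in enumerate(class_names)}
    return [(str(p), code_of[Path(p).parent.name]) for p in audio_paths]


def write_dataset_csv(rows, fileobj):
    """Write the header and one (file_path, label_code) row per audio file."""
    writer = csv.writer(fileobj, lineterminator="\n")
    writer.writerow(["file_path", "label_code"])
    writer.writerows(rows)


# Hypothetical paths; in practice they could come from
# sorted(Path("data").glob("*/*.wav")).
paths = ["data/jazz/a.wav", "data/rock/b.wav", "data/jazz/c.wav"]
rows = build_rows(paths)

# An in-memory buffer is used here; pass an opened file to write to disk.
buffer = io.StringIO()
write_dataset_csv(rows, buffer)
csv_text = buffer.getvalue()
```

To write to disk instead, open the target with `open("dataset.csv", "w", newline="")` and pass the file object in place of the buffer.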

Run the training script

$ python train.py settings.yaml {experiment name}