General information

This repository contains a complete chord recognition system. The system takes an audio file as input and returns the timings of each chord according to the MIREX categories (see the example below). The implementation is based on a recurrent neural network with LSTM cells.
output example
In addition, the repo contains a realtime demo and a simple web app that lets a user upload a song and get its chords. A demo video of realtime chord recognition is available here.
The project was implemented as part of a Master's thesis; the full text is available here.

Datasets

Structure of repo:

  • data
    • audio - default folder for raw audio files
    • converted - default folder for saving preprocessed data in CSV format
    • gt - contains .lab files with chords
    • tracklists - contains lists of paths to audio files starting with 'audio_root' parameter
    • predicted - contains predictions for TheBeatles180 and JayChou29 datasets, reports generated by MusOOEvaluator and pretrained models
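
The .lab files in gt can be read with a few lines of code. The sketch below assumes the widely used layout of one segment per line with start time, end time and chord label separated by whitespace; the function name parse_lab and the sample annotation are illustrative and not taken from this repo.

```python
from typing import List, Tuple


def parse_lab(text: str) -> List[Tuple[float, float, str]]:
    """Parse .lab content into (start_sec, end_sec, chord_label) tuples."""
    segments = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        start, end, label = float(parts[0]), float(parts[1]), parts[2]
        segments.append((start, end, label))
    return segments


# Hypothetical annotation, used only to exercise the parser.
sample = """0.000 2.612 N
2.612 11.459 E
11.459 12.921 A:min
"""

segments = parse_lab(sample)
for start, end, label in segments:
    print(f"{start:8.3f} {end:8.3f} {label}")
```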

Installation

Download raw and converted datasets

bash ./data/download.sh

Preprocessing

The system computes notegrams (252 bins per sample) as described in Mauch 2010 (p. 98), with hop_length=512, sample_rate=11025, window_size=4096.
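
The time resolution implied by hop_length=512 and sample_rate=11025 follows from simple arithmetic. The sketch below assumes one feature frame per hop and centered framing, where the frame count is 1 + n_samples // hop_length; the helper names are illustrative.

```python
SAMPLE_RATE = 11025  # Hz
HOP_LENGTH = 512     # samples between consecutive frames
WINDOW_SIZE = 4096   # samples per analysis window


def frame_duration(sample_rate: int = SAMPLE_RATE, hop_length: int = HOP_LENGTH) -> float:
    """Seconds of audio between two consecutive frames."""
    return hop_length / sample_rate


def num_frames(seconds: float, sample_rate: int = SAMPLE_RATE,
               hop_length: int = HOP_LENGTH) -> int:
    """Frame count for a clip, assuming centered framing."""
    n_samples = int(seconds * sample_rate)
    return 1 + n_samples // hop_length


print(round(frame_duration() * 1000, 2))     # milliseconds per frame: 46.44
print(round(WINDOW_SIZE / SAMPLE_RATE, 3))   # window length in seconds: 0.372
print(num_frames(40))                        # frames in a 40-second part: 862
```

Each frame therefore covers roughly 46 ms of audio, so predicted chord boundaries cannot be finer than that step.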

How to use:

python preprocess.py --songs_list data/tracklists/TheBeatles180List

Parameters:

--songs_list, required, examples can be found in data/tracklists
--audio_root, default: data/audio/
--gt_root, default: data/gt/
--conv_root, default: data/converted/, folder where converted datasets are saved
--subsong_len, default: 40, length in seconds of the parts each song is split into during preprocessing
--song_len, default: 180; if subsong_len is not specified, the song is cut or zero-padded to song_len
--use_librosa, default: True, use librosa.cqt for preprocessing
--num_bins, default: 84, number of bins used for the CQT
--modulation_steps, list, default: [0], number of modulation steps applied during CQT preprocessing
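
The interaction between --subsong_len and --song_len can be sketched as follows. This illustrates the behavior described in the parameter list and is not the code in preprocess.py: it assumes features are a list of per-frame vectors, that lengths are already converted from seconds to frames, and that a short final part is zero-padded. Both helper names are hypothetical.

```python
from typing import List, Optional

Frame = List[float]


def pad_or_cut(frames: List[Frame], target_len: int) -> List[Frame]:
    """Cut to target_len frames, or append all-zero frames up to target_len."""
    n_bins = len(frames[0]) if frames else 0
    cut = frames[:target_len]
    padding = [[0.0] * n_bins for _ in range(target_len - len(cut))]
    return cut + padding


def split_or_pad(frames: List[Frame], subsong_frames: Optional[int],
                 song_frames: int) -> List[List[Frame]]:
    """Split into fixed-length parts, or fall back to one fixed-length song."""
    if subsong_frames is None:
        return [pad_or_cut(frames, song_frames)]
    parts = [frames[i:i + subsong_frames]
             for i in range(0, len(frames), subsong_frames)]
    # Assumption: the last, shorter part is zero-padded rather than dropped.
    return [pad_or_cut(part, subsong_frames) for part in parts]


# Toy "song" with 5 frames of 2 bins each.
song = [[float(i), float(i)] for i in range(5)]
parts = split_or_pad(song, subsong_frames=2, song_frames=8)
whole = split_or_pad(song, subsong_frames=None, song_frames=8)
print(len(parts), [len(p) for p in parts])
print(len(whole), len(whole[0]))
```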

Models:

LSTM

How to use:

python train_rnn.py --model LSTM --conv_list TheBeatles180List_converted_librosa.txt

Optional parameters:

--num_epochs, default: 2
--learning_rate, default: 0.01
--weight_decay, default: 1e-5
--songs_list, default: data/tracklists/TheBeatles180List
--audio_root, default: data/audio/
--gt_root, default: data/gt/
--conv_root, default: data/converted/, folder where converted datasets are saved
--conv_list, if specified, converted audio from the list is used to fit the model and the conversion step is skipped
--category, default: MirexRoot
--subsong_len, default: 40, length in seconds of the parts each song is split into during preprocessing
--song_len, default: 180; if subsong_len is not specified, the song is cut or zero-padded to song_len
--hidden_dim, default: 200
--num_layers, default: 2
--batch_size, default: 4
--sch_step_size, default: 100, scheduler step size
--sch_gamma, default: 100, scheduler's gamma
--, default: 10, test the model on the train and validation datasets every n iterations
--use_librosa, default: True
--save_model_as, if specified, the model is saved in the pretrained folder
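
The --sch_step_size and --sch_gamma options suggest a step learning-rate schedule. The sketch below shows that rule in its usual form, lr = base_lr * gamma ** (epoch // step_size), which is the form PyTorch's StepLR uses. Whether train_rnn.py uses exactly this scheduler is an assumption, and the gamma value below is illustrative, not the repo default.

```python
def step_lr(base_lr: float, epoch: int, step_size: int, gamma: float) -> float:
    """Learning rate after `epoch` epochs under a step decay schedule."""
    return base_lr * gamma ** (epoch // step_size)


# Illustrative values: halve the learning rate every 100 epochs.
for epoch in (0, 99, 100, 250):
    print(epoch, step_lr(0.01, epoch, step_size=100, gamma=0.5))
```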

Test

python test_nn.py --model pretrained/LSTM_MirexRoot_TheBeatles180_librosa.pkl --conv_root data/converted/librosa --conv_list TheBeatles180List_converted_librosa.txt

Results

Isophonic 2009
Model                   MirexRoot  MirexMajMin  MirexMajMinBass  MirexSevenths  MirexSeventhsBass
BiLSTM                  95.58%     94.59%       94.27%           90.86%         90.6%
BiLSTM with modulation  95.31%     93.43%       94.98%           91.67%         91.25%
BiGRU with modulation   93.69%     91.30%       90.75%           89.46%         87.43%
JayChou
Model                   MirexRoot  MirexMajMin  MirexMajMinBass  MirexSevenths  MirexSeventhsBass
BiLSTM                  67.7%      61.42%       59%              41%            39.33%
BiLSTM with modulation  72.04%     69.58%       66.09%           48.97%         44.98%
BiGRU with modulation   68.59%     65.27%       62.59%           45.08%         40.37%
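
As a rough illustration of what a percentage of this kind measures, here is a simplified duration-weighted overlap score between reference and predicted segments. It assumes labels are already mapped to the evaluated category, compares them by plain string equality, and expects predicted segments not to overlap each other. It is not MusOOEvaluator's implementation, and overlap_score is a hypothetical name.

```python
from typing import List, Tuple

Segment = Tuple[float, float, str]  # (start_sec, end_sec, label)


def overlap_score(reference: List[Segment], predicted: List[Segment]) -> float:
    """Fraction of reference duration covered by predictions with the same label."""
    total = sum(end - start for start, end, _ in reference)
    if total <= 0:
        return 0.0
    correct = 0.0
    for r_start, r_end, r_label in reference:
        for p_start, p_end, p_label in predicted:
            if r_label != p_label:
                continue
            overlap = min(r_end, p_end) - max(r_start, p_start)
            if overlap > 0:
                correct += overlap
    return correct / total


# Toy example: the prediction switches to G one second too early.
reference = [(0.0, 2.0, "C"), (2.0, 4.0, "G")]
predicted = [(0.0, 1.0, "C"), (1.0, 4.0, "G")]
score = overlap_score(reference, predicted)
print(f"{score:.2%}")  # 75.00%
```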

Random forest


Accuracy on the test set (MirexRoot): 55%

How to use:

Train

python train_rf.py
Optional parameters:

--songs_list, default: data/tracklists/TheBeatles180List
--audio_root, default: data/audio/
--gt_root, default: data/gt/
--conv_root, default: data/converted/, folder where converted datasets are saved
--conv_list, if specified, converted audio from the list is used to fit the model and the conversion step is skipped
--category, default: MirexRoot
--subsong_len, default: 40, length in seconds of the parts each song is split into during preprocessing
--song_len, default: 180; if subsong_len is not specified, the song is cut or zero-padded to song_len
--criterion, default: entropy
--max_features, default: log2
--n_estimators, default: 1

Test

python test_rf.py --model pretrained/RF_MirexRoot_TheBeatles180_librosa.pkl --conv_root data/converted/librosa --conv_list TheBeatles180List_converted_librosa.txt
Optional parameters:

--songs_list, default: data/tracklists/TheBeatles180List
--audio_root, default: data/audio/
--gt_root, default: data/gt/
--conv_root, default: data/converted/, folder where converted datasets are saved
--conv_list, if specified, converted audio from the list is used to fit the model and the conversion step is skipped
--category, default: MirexRoot
--subsong_len, default: 40, length in seconds of the parts each song is split into during preprocessing
--song_len, default: 180; if subsong_len is not specified, the song is cut or zero-padded to song_len
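
The --category option appears to select the chord vocabulary used for labels. For MirexRoot only the chord root matters. The sketch below shows that reduction, assuming labels in the common root:quality/bass syntax (for example A:min7/b3) with N for "no chord" and X for "unknown"; to_root is a hypothetical helper, not a function from this repo.

```python
def to_root(label: str) -> str:
    """Reduce a chord label to its root; keep the special labels N and X."""
    if label in ("N", "X"):
        return label
    # Drop the quality after ':' and the bass note after '/'.
    # Enharmonic spellings (e.g. Db vs C#) are not merged here.
    return label.split(":")[0].split("/")[0]


for label in ("A:min7/b3", "C/5", "F#:maj", "G", "N"):
    print(label, "->", to_root(label))
```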

Applications

Realtime demo

python realtime.py --category MirexRoot

Web-application for chords recognition

python web_app/server.py