This repository contains a complete chord recognition system. The system takes an audio file as input and returns timings for each chord according to MIREX categories (see example below). The implementation is based on a recurrent neural network with LSTM cells.
In addition, the repo contains a real-time demo and a simple web app that lets the user upload a song and get chords for it.
A demo video of real-time chord recognition is available here.
The project was implemented as part of a Master's thesis; the full text is available here.
- TheBeatles180 Isophonics dataset
- audio - extract in data/audio
- gt - extract in data
- converted - extract in data/converted
- converted with librosa
- converted_list - extract to root folder
- CaroleKingQueen Isophonics dataset
- USPop2002
- JayChou29
- Full datasets
- data
- audio - default folder for raw audio files
- converted - default folder for saving preprocessed data in CSV format
- gt - contains .lab files with chords
- tracklists - contains lists of paths to audio files starting with 'audio_root' parameter
- predicted - contains predictions for TheBeatles180 and JayChou29 datasets, reports generated by MusOOEvaluator and pretrained models
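The .lab files in gt can be read with a few lines of Python. The sketch below assumes the common layout of one segment per line, written as start time, end time, and chord label separated by whitespace. The function name `parse_lab` and the sample annotation are hypothetical and are not taken from this repository or its datasets.

```python
from typing import List, Tuple


def parse_lab(text: str) -> List[Tuple[float, float, str]]:
    """Parse .lab annotation text into (start, end, chord) tuples."""
    segments = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        start, end, chord = float(parts[0]), float(parts[1]), parts[2]
        segments.append((start, end, chord))
    return segments


# Hypothetical annotation content, invented for illustration only
example = """0.0 1.5 N
1.5 4.0 C:maj
4.0 6.25 A:min
"""

segments = parse_lab(example)
for start, end, chord in segments:
    print(f"{start:7.3f} {end:7.3f} {chord}")
```

Each tuple holds the segment boundaries in seconds and the chord label, which is the same shape of information the system returns for an input song.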
Download raw and converted datasets
bash ./data/download.sh
The system computes notegrams (252 bins per sample) as described in Mauch 2010 (p. 98), with hop_length=512, sample_rate=11025, and window_size=4096.
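To get a feel for these numbers, the arithmetic below converts the stated hop_length, sample_rate, and window_size into time resolution and approximate frame counts. It is a back-of-the-envelope sketch: the helper `frames_for` is hypothetical, and the exact frame count in practice depends on how the analysis library pads the signal.

```python
SAMPLE_RATE = 11025   # samples per second
HOP_LENGTH = 512      # samples between consecutive frames
WINDOW_SIZE = 4096    # samples per analysis window

frames_per_second = SAMPLE_RATE / HOP_LENGTH   # about 21.5 frames per second
hop_seconds = HOP_LENGTH / SAMPLE_RATE         # about 0.046 s between frames
window_seconds = WINDOW_SIZE / SAMPLE_RATE     # about 0.37 s per window


def frames_for(duration_seconds: float) -> int:
    """Approximate number of whole hops in a clip (padding is ignored)."""
    return int(duration_seconds * SAMPLE_RATE) // HOP_LENGTH


print(f"{frames_per_second:.2f} frames/s, hop {hop_seconds * 1000:.1f} ms, "
      f"window {window_seconds * 1000:.1f} ms")
print("frames in 40 s:", frames_for(40))
print("frames in 180 s:", frames_for(180))
```

With these settings a 40-second segment corresponds to roughly 861 frames and a 180-second song to roughly 3875 frames, each frame carrying 252 bins.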
python preprocess.py --songs_list data/tracklists/TheBeatles180List
--songs_list, required, examples can be found in data/tracklists
--audio_root, default: data/audio/
--gt_root, default: data/gt/
--conv_root, default: data/converted/, folder where converted datasets will be saved
--subsong_len, default: 40, length in seconds of the parts each song is split into during preprocessing
--song_len, default: 180, if subsong_len is not specified, each song is cut or zero-padded to song_len
--use_librosa, default: True, use librosa.cqt for preprocessing
--num_bins, default: 84, number of bins used for the CQT
--modulation_steps, list, default: [0], modulation steps applied during CQT preprocessing
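The effect of subsong_len and song_len can be illustrated with a small sketch. It is not the code from preprocess.py: the function names are hypothetical, frames are represented by plain numbers instead of feature vectors, and zero-padding the last part of a split song is an assumption made for the example.

```python
from typing import List


def fit_to_length(frames: List[float], target_len: int,
                  pad_value: float = 0.0) -> List[float]:
    """Cut a sequence to target_len frames, or pad it with pad_value."""
    if len(frames) >= target_len:
        return frames[:target_len]
    return frames + [pad_value] * (target_len - len(frames))


def split_into_parts(frames: List[float], part_len: int) -> List[List[float]]:
    """Split a sequence into consecutive parts of part_len frames.

    The last part is padded so that all parts have equal length
    (an assumption made for this sketch).
    """
    parts = []
    for start in range(0, len(frames), part_len):
        parts.append(fit_to_length(frames[start:start + part_len], part_len))
    return parts


song = [1.0, 2.0, 3.0, 4.0, 5.0]        # stand-in for five feature frames
print(split_into_parts(song, 2))         # behaviour in the style of subsong_len
print(fit_to_length(song, 3))            # song longer than song_len: cut
print(fit_to_length(song, 7))            # song shorter than song_len: padded
```

Equal-length sequences make it straightforward to stack songs into batches, which is one plausible reason for cutting or padding to a fixed length.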
python train_rnn.py --model LSTM --conv_list TheBeatles180List_converted_librosa.txt
--num_epochs, default: 2
--learning_rate, default: 0.01
--weight_decay, default: 1e-5
--songs_list, default: data/tracklists/TheBeatles180List
--audio_root, default: data/audio/
--gt_root, default: data/gt/
--conv_root, default: data/converted/, folder where converted datasets will be saved
--conv_list, if specified, converted audio from the list will be used for fitting the model and the conversion step will be skipped
--category, default: MirexRoot
--subsong_len, default: 40, length in seconds of the parts each song is split into during preprocessing
--song_len, default: 180, if subsong_len is not specified, each song is cut or zero-padded to song_len
--hidden_dim, default: 200
--num_layers, default: 2
--batch_size, default: 4
--sch_step_size, default: 100, scheduler step size
--sch_gamma, default: 100, scheduler's gamma
--, default: 10, test the model on the train and validation datasets every n iterations
--use_librosa, default: True
--save_model_as, if specified, model will be saved in pretrained folder
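The scheduler step size and gamma can be read as parameters of a step-decay schedule, in which the learning rate is multiplied by gamma once every step_size epochs. The sketch below assumes that behaviour (as in PyTorch's StepLR); the README does not name the scheduler, the function `step_decay_lr` is hypothetical, and the gamma of 0.1 is an illustrative value rather than a default from this repository.

```python
def step_decay_lr(initial_lr: float, epoch: int,
                  step_size: int, gamma: float) -> float:
    """Learning rate after `epoch` epochs of step decay.

    The rate is multiplied by `gamma` once every `step_size` epochs.
    """
    return initial_lr * gamma ** (epoch // step_size)


# Illustrative values only: learning rate 0.01, step size 100, gamma 0.1
for epoch in (0, 99, 100, 250):
    lr = step_decay_lr(0.01, epoch, step_size=100, gamma=0.1)
    print(f"epoch {epoch:3d}: lr = {lr:.6f}")
```

With a step size larger than the number of epochs, the learning rate never changes, so the scheduler only matters for runs that exceed sch_step_size epochs.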
python test_nn.py --model pretrained/LSTM_MirexRoot_TheBeatles180_librosa.pkl --conv_root data/converted/librosa --conv_list TheBeatles180List_converted_librosa.txt
Model | MirexRoot | MirexMajMin | MirexMajMinBass | MirexSevenths | MirexSeventhsBass |
---|---|---|---|---|---|
BiLSTM | 95.58% | 94.59% | 94.27% | 90.86% | 90.6% |
BiLSTM with modulation | 95.31% | 93.43% | 94.98% | 91.67% | 91.25% |
BiGRU with modulation | 93.69% | 91.30% | 90.75% | 89.46% | 87.43% |
Model | MirexRoot | MirexMajMin | MirexMajMinBass | MirexSevenths | MirexSeventhsBass |
---|---|---|---|---|---|
BiLSTM | 67.7% | 61.42% | 59% | 41% | 39.33% |
BiLSTM with modulation | 72.04% | 69.58% | 66.09% | 48.97% | 44.98% |
BiGRU with modulation | 68.59% | 65.27% | 62.59% | 45.08% | 40.37% |
Accuracy on the test set: Mirex_Root: 55%
python train_rf.py
--songs_list, default: data/tracklists/TheBeatles180List
--audio_root, default: data/audio/
--gt_root, default: data/gt/
--conv_root, default: data/converted/, folder where converted datasets will be saved
--conv_list, if specified, converted audio from the list will be used for fitting the model and the conversion step will be skipped
--category, default: MirexRoot
--subsong_len, default: 40, length in seconds of the parts each song is split into during preprocessing
--song_len, default: 180, if subsong_len is not specified, each song is cut or zero-padded to song_len
--criterion, default: entropy
--max_features, default: log2
--n_estimators, default: 1
python test_rf.py --model pretrained/RF_MirexRoot_TheBeatles180_librosa.pkl --conv_root data/converted/librosa --conv_list TheBeatles180List_converted_librosa.txt
--songs_list, default: data/tracklists/TheBeatles180List
--audio_root, default: data/audio/
--gt_root, default: data/gt/
--conv_root, default: data/converted/, folder where converted datasets will be saved
--conv_list, if specified, converted audio from the list will be used for fitting the model and the conversion step will be skipped
--category, default: MirexRoot
--subsong_len, default: 40, length in seconds of the parts each song is split into during preprocessing
--song_len, default: 180, if subsong_len is not specified, each song is cut or zero-padded to song_len
python realtime.py --category MirexRoot
python web_app\server.py