Three models for musical genre classification are discussed: one CNN model and two CRNN models. This project is based on PyTorch. Project report here.
- We use an enhanced GTZAN dataset, which contains 72 full songs and 1000 30-second audio tracks.
- There are 10 genres:
{0: 'pop',
1: 'metal',
2: 'disco',
3: 'blues',
4: 'reggae',
5: 'classical',
6: 'rock',
7: 'hiphop',
8: 'country',
9: 'jazz'}
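As a minimal sketch of how this mapping might be used, the snippet below decodes a vector of 10 class scores into a genre name. The names `GENRE_TO_INDEX` and `decode_prediction` are hypothetical helpers introduced for illustration; they are not taken from the project code.

```python
# Index-to-genre mapping, copied from the list above.
GENRES = {0: 'pop', 1: 'metal', 2: 'disco', 3: 'blues', 4: 'reggae',
          5: 'classical', 6: 'rock', 7: 'hiphop', 8: 'country', 9: 'jazz'}

# Reverse lookup (hypothetical helper): genre name -> class index.
GENRE_TO_INDEX = {name: idx for idx, name in GENRES.items()}


def decode_prediction(scores):
    """Return the genre whose class score is highest.

    `scores` is assumed to be a sequence of 10 numbers, one per class,
    for example the output of a model's final layer for one chunk.
    """
    if len(scores) != len(GENRES):
        raise ValueError(f"expected {len(GENRES)} scores, got {len(scores)}")
    best = max(range(len(scores)), key=lambda i: scores[i])
    return GENRES[best]
```

For example, `decode_prediction([0, 0, 0, 0, 0, 1, 0, 0, 0, 0])` selects index 5, which maps to `'classical'`.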
We offer three pre-processed datasets; you can also generate datasets using Build Dataset Handmade.ipynb or Build Dataset.ipynb. Download here.
- Pure GTZAN Dataset (128×128 chunks, 7000 in total)
- Mixed Dataset I (128×128 chunks, 12370 in total)
- Mixed Dataset II (256×256 chunks, 4533 in total)
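To illustrate what a square chunk is, here is a rough sketch of cutting a spectrogram into non-overlapping `size × size` pieces. It assumes the spectrogram is a 2-D grid with `size` frequency bands as rows and time frames as columns, and that leftover frames are dropped. The function name `split_into_chunks` is hypothetical, and the notebooks may use different settings such as overlap or another band count.

```python
def split_into_chunks(spectrogram, size=128):
    """Split a 2-D spectrogram into non-overlapping size x size chunks.

    `spectrogram` is a list of rows (frequency bands), each row a list of
    per-frame values. Trailing frames that cannot fill a whole chunk are
    dropped. This is an illustrative assumption, not the project's method.
    """
    if len(spectrogram) != size:
        raise ValueError(f"expected {size} frequency bands, got {len(spectrogram)}")
    n_frames = len(spectrogram[0])
    chunks = []
    # Step through the time axis one full chunk width at a time.
    for start in range(0, n_frames - size + 1, size):
        chunks.append([row[start:start + size] for row in spectrogram])
    return chunks


# Usage: a dummy spectrogram with 128 bands and 300 frames gives 2 chunks.
dummy = [[float(frame) for frame in range(300)] for _ in range(128)]
chunks = split_into_chunks(dummy)
print(len(chunks), len(chunks[0]), len(chunks[0][0]))
```

Under these assumptions, a track needs 896 frames to produce 7 chunks, which is the per-track average implied by 7000 chunks if they come from the 1000 GTZAN tracks.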
- Define parameters in Paras.py
- Use train.py for training
- Training logs are saved in the log folder (loss/accuracy vs. epoch on the training set and validation set)
- Use music_dealer.py to predict the genre components of a full song; see genre_predictor.ipynb and music_dealer.py for details
- Test results are saved in the log folder
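One plausible way to turn per-chunk predictions into the genre components of a full song is simple voting: classify each chunk, then report the share of chunks assigned to each genre. The sketch below assumes this approach; `genre_components` is a hypothetical name, and music_dealer.py may instead aggregate differently (for example, by averaging class probabilities).

```python
from collections import Counter

# Index-to-genre mapping used by this project.
GENRES = {0: 'pop', 1: 'metal', 2: 'disco', 3: 'blues', 4: 'reggae',
          5: 'classical', 6: 'rock', 7: 'hiphop', 8: 'country', 9: 'jazz'}


def genre_components(chunk_predictions):
    """Return {genre: fraction of chunks}, most frequent genre first.

    `chunk_predictions` is assumed to be a list of predicted class
    indices, one per chunk of the song.
    """
    total = len(chunk_predictions)
    if total == 0:
        raise ValueError("no chunk predictions given")
    counts = Counter(chunk_predictions)
    return {GENRES[idx]: n / total for idx, n in counts.most_common()}


# Usage: three chunks predicted as rock (6) and one as metal (1).
print(genre_components([6, 6, 1, 6]))
```

Voting discards the model's confidence in each chunk; averaging softmax outputs would keep that information at the cost of being more sensitive to a few very confident chunks.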
- Accuracy
| | CNN Model | CRNN-I Model | CRNN-II Model |
|---|---|---|---|
| Test Set | 88.05% | 85.08% | 88.45% |
| Validation Set | 86.89% | 83.05% | 82.67% |
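Accuracy and a confusion matrix can both be computed from the same pairs of true and predicted labels. The following is a minimal sketch in plain Python; the function names are hypothetical and this is not the project's evaluation code.

```python
def confusion_matrix(y_true, y_pred, n_classes=10):
    """Build an n_classes x n_classes matrix of counts.

    Rows are true classes and columns are predicted classes, so
    matrix[t][p] counts samples of class t that were predicted as p.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


# Usage with three classes: one sample of class 2 is mistaken for class 1.
m = confusion_matrix([0, 1, 2, 2], [0, 1, 2, 1], n_classes=3)
print(m, accuracy(m))
```

Off-diagonal entries show which genre pairs are confused with each other, which a single accuracy figure hides.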
- Confusion Matrix
30 songs are used for testing. Samples: