This project attempted to perform a genre classification task on album cover images. Data was gathered with the last.fm API.
To reproduce the results:
$ pip install -r requirements.txt
- Get the dataset
- For training from scratch: `from_scratch.py`
- For fine-tuning: `fine_tuning.py`
Hyperparameter search is not included because it was done manually and not very thoroughly.
Generating the dataset:
- Get an API key from last.fm and set the environment variables as specified in `api.py`.
- Download the data using `api.py`. Experiments were done with the tags listed there and 1000 albums per tag.
- Run `cleanup.py` and `verify_images.py` to remove duplicate filenames and invalid images.
- Run `splits.py` to split into train/test data. The experiment was done with the default of 0.8.
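
For readers who want to see what the split step amounts to, here is a minimal sketch. It is not the code in `splits.py`: the function name `split_train_test`, the `seed` parameter, and the reading of 0.8 as the training fraction are all assumptions made for illustration.

```python
import random


def split_train_test(filenames, train_fraction=0.8, seed=0):
    """Shuffle filenames reproducibly and return (train, test) lists.

    Hypothetical helper; train_fraction=0.8 assumes the README's
    default of 0.8 refers to the share of data used for training.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    # Sort first so the result does not depend on the input order.
    shuffled = sorted(filenames)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    # Made-up filenames standing in for the albums of a single tag.
    albums = [f"album_{i:04d}.jpg" for i in range(1000)]
    train, test = split_train_test(albums)
    print(len(train), len(test))
```

Applying a split like this separately to each tag would keep the class proportions the same in both sets.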
Areas for dataset improvement:
- Deduplicate images using hash or similar (image-based)
- Collect a larger sample set
- Filter incorrect albums (e.g. Plague Mass)
Generally, the data quality from the source is not great. The API is great to use though.
Labeling is highly ambiguous for many of the albums.
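
As a starting point for the content-based deduplication suggested above, the sketch below hashes the raw bytes of each file with SHA-256. This only catches byte-identical copies stored under different filenames; a perceptual image hash would be needed to catch resized or re-encoded covers. The function names and the directory layout are assumptions, not part of this repository.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return (duplicate_path, first_seen_path) pairs for files under root.

    Hypothetical helper: files are visited in sorted order, so the
    first path with a given digest is treated as the original.
    """
    seen = {}
    duplicates = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        digest = file_digest(path)
        if digest in seen:
            duplicates.append((path, seen[digest]))
        else:
            seen[digest] = path
    return duplicates
```

Calling `find_duplicates` on the dataset directory returns the pairs to review before deleting anything, which matters here because the same cover can legitimately appear under several tags.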
Areas for training improvement:
- Use fewer classes and more data
- Rigorous hyper-parameter search
- Pretrain on OMACIR (unlabeled data, e.g. with Autoencoder)
- Try a more powerful network
Feel free to pick up this project. It was fun to work on, but you can do a lot better than me.