Multi-class genre classification and singers' embedding space analysis, plus the largest open-source music lyrics dataset

Own music recommendation system and singers' embeddings analysis

The project was completed in September 2019.

I wanted to find out whether a good music recommendation system could be built using only statistics of song lyrics, without NLP and without the audio itself. To test this, I decided to build a simple neural network with a few layers and find data to train it on.
Unfortunately, I could not find a suitable dataset, so I collected my own; as of September 2019 it was the largest open music lyrics dataset with a variety of metadata. I then open-sourced the dataset on Kaggle. Related work and analysis will be published there as well.
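
To make "statistics of song lyrics" more concrete, here is a minimal sketch of the kind of surface features such a model could consume. The function name `lyric_stats` and the specific statistics are assumptions chosen for illustration; the README does not state which statistics the project actually uses.

```python
import re
from collections import Counter


def lyric_stats(lyrics: str) -> dict:
    """Compute a few simple surface statistics for one song's lyrics.

    Hypothetical feature set, for illustration only.
    """
    # Non-empty lines of the song.
    lines = [ln.strip() for ln in lyrics.splitlines() if ln.strip()]
    # Lowercased word tokens (letters and apostrophes only).
    words = re.findall(r"[a-z']+", lyrics.lower())
    counts = Counter(words)
    n_words = len(words)
    return {
        "n_lines": len(lines),
        "n_words": n_words,
        # Share of distinct words: low values suggest heavy repetition.
        "unique_ratio": len(counts) / n_words if n_words else 0.0,
        "avg_words_per_line": n_words / len(lines) if lines else 0.0,
        # Share of the text taken up by the single most frequent word.
        "top_word_share": counts.most_common(1)[0][1] / n_words if n_words else 0.0,
    }


sample = "La la la\nSing it loud\nLa la la"
stats = lyric_stats(sample)
print(stats)
```

Each song then becomes a short numeric vector that a small dense network can take as input, with no language model or audio processing involved.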

The dataset consists of:

  • songs_dataset.csv contains 253k+ songs (different genres, decades and so on) with 10 features:
    |Singer|Album|Song|Date|Featuring|Genre|Lyrics|Tags|Producers|Writers|;
  • parts_dataset.csv contains songs with lyrics split into parts (verse, chorus, hook, etc.).
    The split is not always clean because the data is dirty (real-world).
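
As a rough illustration of how lyrics might be split into parts, the sketch below assumes that sections are marked by bracketed headers on their own line, such as `[Verse 1]` or `[Chorus: Someone]`. That header format, the `split_into_parts` helper, and the `"unlabeled"` fallback label are all assumptions; the parser actually used to build parts_dataset.csv is not described here.

```python
import re
from typing import List, Tuple

# Assumption: a section header is a whole line of the form "[Label]" or "[Label: Artist]".
HEADER = re.compile(r"^\[([^\]]+)\]\s*$")


def split_into_parts(lyrics: str) -> List[Tuple[str, str]]:
    """Split lyrics into (label, text) pairs using bracketed section headers."""
    parts = []
    label, buf = "unlabeled", []  # text before the first header gets a fallback label
    for raw in lyrics.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            # Close the previous section; sections with no text are dropped.
            if buf:
                parts.append((label, "\n".join(buf)))
            # Keep only the part name, e.g. "Chorus: Someone" -> "chorus".
            label, buf = match.group(1).split(":")[0].strip().lower(), []
        elif line:
            buf.append(line)
    if buf:
        parts.append((label, "\n".join(buf)))
    return parts


sample = (
    "Intro words\n"
    "[Verse 1]\nFirst line\nSecond line\n\n"
    "[Chorus: Someone]\nHook line\n"
    "[Outro]\n"
)
parts = split_into_parts(sample)
for name, text in parts:
    print(name, "->", repr(text))
```

Real-world lyrics often have missing, misspelled, or inconsistent headers, which is why the fallback label and the empty-section check are there.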

Currently available notebooks:

  • Data analysis and Plotly visualizations can be found here;
  • Creation of a simple dense NN and further exploration of the resulting singers' embeddings can be found here.
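
One common way to explore an embedding space like the one mentioned above is to look up each singer's nearest neighbours by cosine similarity. The sketch below is only an illustration: the `nearest` helper is hypothetical, and the three-dimensional vectors are toy placeholders, not values taken from the trained network or the notebook.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def nearest(name, embeddings, k=2):
    """Return the k singers most similar to `name`, best match first."""
    query = embeddings[name]
    scored = [
        (other, cosine(query, vec))
        for other, vec in embeddings.items()
        if other != name
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy placeholder vectors; real embeddings would come from the trained model.
toy_embeddings = {
    "singer_a": [1.0, 0.0, 0.0],
    "singer_b": [0.9, 0.1, 0.0],
    "singer_c": [0.0, 1.0, 0.0],
}

neighbours = nearest("singer_a", toy_embeddings, k=2)
print(neighbours)
```

With real embeddings, the top neighbours of a singer serve as candidate recommendations, and checking whether they share a genre is a quick sanity test of the learned space.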

UPD: the dataset on Kaggle was deleted, so it is now available via Google Drive.

Thanks to Ilya Liyasov for helping develop the songs parser.