Own music recommendation system and singers' embeddings analysis

The project was completed in September 2019.

I wanted to find out whether a good music recommendation system can be built from the statistics of song lyrics alone, without NLP and without the sound waves themselves. To test this, I decided to build a simple neural network with a few layers and find data to train it on.
Unfortunately, I did not find a suitable dataset, so I collected my own. As far as I know, it was the largest open dataset with this variety of metadata as of September 2019. I then open-sourced the dataset on Kaggle, where all related work and analysis will be published as well.
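
To make the "statistics of song lyrics" idea more concrete, here is a minimal sketch of how a few surface statistics could be computed for one song. The function name `lyric_stats` and the chosen features are assumptions for illustration only; they are not necessarily the features used in the project.

```python
import re
from collections import Counter


def lyric_stats(lyrics: str) -> dict:
    """Compute a few simple surface statistics for one song's lyrics.

    The feature set is hypothetical, chosen only to illustrate the idea.
    """
    # Non-empty lines only, so blank separators do not skew the averages.
    lines = [ln for ln in lyrics.splitlines() if ln.strip()]
    # Very rough tokenization: lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", lyrics.lower())
    counts = Counter(words)
    n_words = len(words)
    return {
        "n_lines": len(lines),
        "n_words": n_words,
        "n_unique_words": len(counts),
        # Share of distinct words; lower values suggest more repetition.
        "unique_ratio": len(counts) / n_words if n_words else 0.0,
        "avg_words_per_line": n_words / len(lines) if lines else 0.0,
    }


example = "Hello hello world\nGoodbye world"
print(lyric_stats(example))
```

A vector of such numbers per song (or aggregated per singer) is the kind of input a small dense network could be trained on.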

The dataset consists of:

songs_dataset.csv contains 253k+ songs (different genres, decades and so on) with 10 features:
|Singer|Album|Song|Date|Featuring|Genre|Lyrics|Tags|Producers|Writers|;
parts_dataset.csv contains songs with lyrics split into parts (verse, chorus, hook, etc.).
Splitting is not that straightforward because of the dirty (real-world) data.
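
As an illustration of how lyrics might be split into parts, the sketch below assumes that section headers appear on their own line in square brackets, for example `[Verse 1]` or `[Chorus: Artist]`. That header format, the `HEADER` pattern, and the `split_into_parts` helper are all assumptions; the actual parser may work differently, and real-world lyrics often break such a convention.

```python
import re

# Assumed header format: a whole line such as "[Verse 1]" or "[Chorus: Artist]".
HEADER = re.compile(r"^\[([^\]]+)\]\s*$")


def split_into_parts(lyrics: str) -> list[tuple[str, str]]:
    """Split lyrics into (part_name, text) pairs using bracketed headers."""
    parts = []
    # Text that appears before any header is filed under "unknown".
    name, buf = "unknown", []
    for line in lyrics.splitlines():
        stripped = line.strip()
        match = HEADER.match(stripped)
        if match:
            if buf:
                parts.append((name, "\n".join(buf)))
            # Keep only the label before ":" or a digit, e.g. "Verse 1" -> "verse".
            label = re.split(r"[:\d]", match.group(1))[0].strip().lower()
            name = label or "unknown"
            buf = []
        elif stripped:
            buf.append(stripped)
    if buf:
        parts.append((name, "\n".join(buf)))
    return parts


sample = "spoken intro\n[Verse 1]\nline a\nline b\n[Chorus: X]\nline c\n[Hook]\n"
for part_name, text in split_into_parts(sample):
    print(part_name, "->", repr(text))
```

Headers with no text after them (like `[Hook]` in the sample) are dropped, and songs without any headers end up as a single `unknown` part, which is one way dirty data shows up in practice.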

Currently available notebooks:

Data analysis and Plotly visualizations can be found here;
Creation of the simple dense NN and further exploration of the resulting singers' embeddings can be found here.
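
One common way to explore embeddings is to look up nearest neighbours by cosine similarity. The sketch below is not taken from the notebook: the `cosine` and `nearest` helpers, the singer names, and the 3-dimensional vectors are all made up for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(name, embeddings, k=2):
    """Return the k singers most similar to `name`, best match first."""
    query = embeddings[name]
    scored = [
        (other, cosine(query, vec))
        for other, vec in embeddings.items()
        if other != name
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical embeddings; real ones would come from the trained network.
toy = {
    "singer_a": [1.0, 0.0, 0.0],
    "singer_b": [0.9, 0.1, 0.0],
    "singer_c": [0.0, 1.0, 0.0],
}
print(nearest("singer_a", toy, k=1))
```

With real embeddings, the same lookup could serve as a simple "similar singers" recommendation.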

UPD: the dataset on Kaggle was deleted, so it is now available via Google Drive.

Thanks to Ilya Liyasov for helping develop the songs parser.

detkov/Own-Music-Recommendation-System
