This directory contains the lyrics data and the code for lyrical feature modeling from our paper, which has been accepted to the 23rd International Society for Music Information Retrieval Conference (ISMIR 2022). Due to privacy concerns, we share only part of the implementation discussed in the paper.
To reuse this repo, install the required libraries:
pip install -r requirements.txt
This directory also contains the names of the artist pages that LikeYouth participants initially liked on Facebook (read more about this data here). We also provide the best-performing LDA model file and the topic visualisation (.html) file.
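The LDA model file stores a fitted topic model. As a rough illustration of what such a model estimates (per-document topic proportions and per-topic word distributions), here is a minimal collapsed Gibbs sampler for LDA in plain Python. This is not the code used in our notebooks, and the `lda_gibbs` name and the `alpha`, `beta` and `iterations` defaults are illustrative assumptions.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenized docs with collapsed Gibbs sampling (illustrative sketch).

    Returns (vocab, theta, phi): theta[d][k] is the estimated share of topic k
    in document d, phi[k][v] the estimated probability of word vocab[v] in topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    # Count tables that the collapsed sampler keeps up to date.
    n_dk = [[0] * K for _ in docs]      # tokens in doc d assigned to topic k
    n_kv = [[0] * V for _ in range(K)]  # times word v is assigned to topic k
    n_k = [0] * K                       # total tokens assigned to topic k
    z = []                              # current topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kv[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    topics = range(K)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Take the token out of the counts before resampling its topic.
                n_dk[d][k] -= 1
                n_kv[k][v] -= 1
                n_k[k] -= 1
                # p(z = t | rest) ∝ (n_dt + alpha) * (n_tv + beta) / (n_t + V * beta)
                weights = [
                    (n_dk[d][t] + alpha) * (n_kv[t][v] + beta) / (n_k[t] + V * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kv[k][v] += 1
                n_k[k] += 1

    # Smoothed point estimates computed from the final sample.
    theta = [[(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in topics]
             for d, doc in enumerate(docs)]
    phi = [[(n_kv[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)]
           for k in topics]
    return vocab, theta, phi
```

On real lyrics you would use a library implementation after stopword removal; the sketch only shows that each row of `theta` and `phi` is a probability distribution over topics and words, respectively.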
To access the artists' lyrics and the lyrics' content features, download the Lyrics_annotated_data folder hosted in the OSF directory as a zip file. After cloning or downloading this repo, place the .csv files in this directory to re-run the experiments described in the paper. Starting from artist_lyrics_initial_dt.csv and re-running the scripts provided here, you should be able to reproduce the artist_lyrics_annot_vader_nrc_moralStrength_lda_final_dt.csv file we shared.
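To check whether a re-run reproduces the shared file, a simple cell-by-cell comparison is often enough. The sketch below is a hypothetical helper (`csv_tables_match` is not part of this repo); it assumes both files have the same row and column order and compares numeric cells with a small tolerance to absorb float-formatting differences. LDA-based columns may still differ if the topic model is retrained with a different random seed.

```python
import csv
import math


def csv_tables_match(path_a, path_b, rel_tol=1e-6):
    """Compare two CSV files cell by cell; numeric cells use a tolerance.

    Returns (True, "") on a match, otherwise (False, description of the first difference).
    """
    with open(path_a, newline="", encoding="utf-8") as fa, \
         open(path_b, newline="", encoding="utf-8") as fb:
        rows_a = list(csv.reader(fa))
        rows_b = list(csv.reader(fb))

    if len(rows_a) != len(rows_b):
        return False, f"row count differs: {len(rows_a)} vs {len(rows_b)}"

    for r, (row_a, row_b) in enumerate(zip(rows_a, rows_b)):
        if len(row_a) != len(row_b):
            return False, f"row {r}: column count differs"
        for c, (a, b) in enumerate(zip(row_a, row_b)):
            if a == b:
                continue
            try:
                # Treat "0.5" and "0.50" (or tiny float noise) as equal.
                if math.isclose(float(a), float(b), rel_tol=rel_tol, abs_tol=1e-9):
                    continue
            except ValueError:
                pass  # non-numeric text that differs
            return False, f"row {r}, column {c}: {a!r} != {b!r}"
    return True, ""
```

Usage: `ok, msg = csv_tables_match("my_rerun.csv", "artist_lyrics_annot_vader_nrc_moralStrength_lda_final_dt.csv")`, where `my_rerun.csv` stands for whatever file your run produced.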
The notebooks directory contains the Jupyter notebooks for extracting lyrics features: sentiment analysis with VADER, emotion associations with the NRC lexicon, moral scores with the MoralStrength lexicon, and lyrics topics using LDA topic modeling.
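Lexicon-based features such as the NRC emotion associations work by looking up each word of the lyrics in a word-to-category table and aggregating the hits. The sketch below shows that idea with a tiny hypothetical lexicon (`TOY_LEXICON` is made up; the real NRC Emotion Lexicon is much larger and is distributed separately). It uses the NRC categories (eight emotions plus positive and negative) and, as an assumption, normalizes counts by the number of tokens; the notebooks may tokenize or normalize differently.

```python
import re
from collections import Counter

# Hypothetical toy entries shaped like a word-emotion association lexicon.
TOY_LEXICON = {
    "love": {"joy", "trust", "positive"},
    "smile": {"joy", "positive"},
    "cry": {"sadness", "negative"},
    "fight": {"anger", "fear", "negative"},
}

# The eight NRC emotions plus the two sentiment categories.
NRC_CATEGORIES = ["anger", "anticipation", "disgust", "fear", "joy",
                  "sadness", "surprise", "trust", "positive", "negative"]


def emotion_profile(text, lexicon, categories=NRC_CATEGORIES):
    """Share of tokens in `text` associated with each category in `lexicon`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for tok in tokens:
        for category in lexicon.get(tok, ()):
            counts[category] += 1
    n = len(tokens)
    return {c: (counts[c] / n if n else 0.0) for c in categories}
```

For example, `emotion_profile("I love to smile, I never cry", TOY_LEXICON)` finds 7 tokens, so `joy` is 2/7 and `sadness` is 1/7.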
The py_scripts folder contains the Python code for scraping lyrics with the Genius API, cleaning and preprocessing them, and detecting their language with spaCy.
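Lyrics pages on Genius typically mark song sections with bracketed labels such as [Verse 1] or [Chorus], which should not count as lyrics when extracting features. A minimal cleaning step might look like the sketch below; `clean_lyrics` is a hypothetical name, and the repo's own preprocessing may apply further steps.

```python
import re

# Bracketed section labels, e.g. "[Chorus]" or "[Verse 1: Artist]".
SECTION_LABEL = re.compile(r"\[[^\]]*\]")


def clean_lyrics(raw):
    """Drop section labels, collapse whitespace and remove empty lines."""
    text = SECTION_LABEL.sub(" ", raw)
    lines = [" ".join(line.split()) for line in text.splitlines()]
    return "\n".join(line for line in lines if line)
```

Keeping line breaks preserves the verse structure, so later steps (for example language detection) can still work line by line if needed.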