This project aims to provide a high-quality text dataset of rap music lyrics. Such datasets are then fed to a neural network to build a lyrics-generation model. The resulting word-to-word lyrics-generative model is served on
Feel free to tweak this scraper to fit your needs. Kudos to open source.
First, you will need to create a Genius API key to be able to call the Genius API. Once done, copy your …
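Genius API calls are authenticated by sending the access token as a bearer token in the Authorization header. The sketch below uses only the standard library to build (not send) such a request against the /search endpoint. It assumes the token is exposed through a hypothetical GENIUS_ACCESS_TOKEN environment variable, which is not necessarily where this project reads it from.

```python
import os
import urllib.parse
import urllib.request

API_ROOT = "https://api.genius.com"


def build_search_request(query, token=None):
    """Build (but do not send) an authenticated Genius /search request.

    GENIUS_ACCESS_TOKEN is a hypothetical variable name used for illustration.
    """
    token = token or os.environ.get("GENIUS_ACCESS_TOKEN")
    if not token:
        raise RuntimeError("No Genius access token provided")
    url = f"{API_ROOT}/search?{urllib.parse.urlencode({'q': query})}"
    # The Genius API expects the access token as a bearer token.
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
```

Passing the returned request to urllib.request.urlopen performs the actual call; keeping construction separate makes it easy to inspect the URL and headers without hitting the network.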
Get the repo - clone from GitHub
$ git clone
Set up a virtualenv
This project is built on Python 3; I recommend using a virtual environment.
`which python3` -m venv RapLyrics-Scraper
source RapLyrics-Scraper/bin/activate
pip install -r requirements.txt
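To double-check that the environment created above is actually active before installing the requirements, one option is to compare sys.prefix with sys.base_prefix: inside a venv they differ. This is a small generic sketch, not part of the project.

```python
import sys


def in_virtualenv():
    """Return True when running inside a venv created with `python3 -m venv`."""
    # venv points sys.prefix at the environment directory, while
    # sys.base_prefix keeps pointing at the base interpreter.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtualenv active:", in_virtualenv())
```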
Update the list of artists you want to get lyrics from and the number of songs to get per artist. To do so, directly edit the artist list defined in …
To run the script, be sure to set the following arguments:
- Specify the directory in which the scraped lyrics should be saved.
- Specify the number of songs to scrape per artist.
Run the script with --help for more information on the available arguments.
For example, to scrape 2 songs per artist and save them in the folder my_lyrics_folder with verbose output, run:
python --verbose --lyrics_dir='my_lyrics_folder' --songs_per_artists=2
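The flags in the command above (--verbose, --lyrics_dir, --songs_per_artists) could be declared with argparse roughly as follows. This is a hypothetical sketch: the real script's parser, defaults, and help strings may differ, and the default values here are invented for illustration. argparse also generates the --help output automatically.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags used in the example command."""
    parser = argparse.ArgumentParser(description="Scrape rap lyrics from Genius.")
    # Defaults below are illustrative assumptions, not the project's values.
    parser.add_argument("--lyrics_dir", default="lyrics",
                        help="directory where the scraped lyrics are saved")
    parser.add_argument("--songs_per_artists", type=int, default=10,
                        help="number of songs to scrape per artist")
    parser.add_argument("--verbose", action="store_true",
                        help="print progress information")
    return parser
```

Calling build_parser().parse_args() with the example flags yields lyrics_dir="my_lyrics_folder", songs_per_artists=2 and verbose=True.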
Once the scraping is done, one lyrics file is generated per scraped artist. Merge the files with:
cat *_lyrics.txt > merged_lyrics.txt
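Note that merged_lyrics.txt itself matches the *_lyrics.txt pattern, so re-running the cat command in the same folder would pick up the previous merge as an input. A minimal Python alternative that skips the output file is sketched below; the function name merge_lyrics is hypothetical and not part of the project.

```python
from pathlib import Path


def merge_lyrics(lyrics_dir, output_name="merged_lyrics.txt"):
    """Concatenate every *_lyrics.txt file in lyrics_dir into one file.

    The output file also matches *_lyrics.txt, so it is excluded to avoid
    feeding a previous merge back into the new one.
    """
    lyrics_dir = Path(lyrics_dir)
    output = lyrics_dir / output_name
    # Sorted for a deterministic order, like the shell glob.
    parts = sorted(p for p in lyrics_dir.glob("*_lyrics.txt") if p.name != output_name)
    with output.open("w", encoding="utf-8") as out:
        for part in parts:
            out.write(part.read_text(encoding="utf-8"))
    return output
```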
A toolbox is also provided to analyze some of the dataset properties.
To run a quick analysis of any .txt file, update the path of the file to analyze in pre_processing/… and then run:
python pre_processing/
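The project's own toolbox lives under pre_processing/; the sketch below is not that toolbox, only an illustration of the kind of quick statistics one might compute on a lyrics file (non-empty lines, word tokens, vocabulary size, most frequent words). The tokenization rule is an assumption chosen for simplicity.

```python
import re
from collections import Counter


def quick_stats(text, top_n=5):
    """Return simple corpus statistics for a lyrics text (illustrative only)."""
    lines = [line for line in text.splitlines() if line.strip()]
    # Lowercase word tokens; apostrophes are kept so "don't" stays one token.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    return {
        "lines": len(lines),
        "tokens": len(tokens),
        "vocabulary": len(counts),
        "top_words": counts.most_common(top_n),
    }
```

For a real file, pass Path("merged_lyrics.txt").read_text(encoding="utf-8") as the text argument.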
Currently, songs are fetched in decreasing order of popularity.
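The README does not say which endpoint is used, but the Genius API's GET /artists/:id/songs accepts sort=popularity together with per_page and page parameters. Assuming that endpoint, collecting the N most popular songs of an artist is a pagination loop that stops once enough songs are gathered or next_page is null. In this sketch fetch_page is a hypothetical callable (returning the parsed "response" object) injected so the logic can be tested offline.

```python
import urllib.parse


def songs_url(artist_id, page, per_page=20):
    """URL for one page of an artist's songs, most popular first."""
    query = urllib.parse.urlencode({"sort": "popularity", "per_page": per_page, "page": page})
    return f"https://api.genius.com/artists/{artist_id}/songs?{query}"


def top_songs(fetch_page, artist_id, n_songs, per_page=20):
    """Collect the first n_songs of an artist across result pages.

    fetch_page(artist_id, page, per_page) is hypothetical: it should return
    a dict with a "songs" list and a "next_page" value (None on the last page).
    """
    songs, page = [], 1
    while page is not None and len(songs) < n_songs:
        response = fetch_page(artist_id, page, per_page)
        songs.extend(response["songs"])
        page = response.get("next_page")
    return songs[:n_songs]
```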
This project was used extensively to generate high-quality text datasets, which were consumed by:
RapLyrics-Back, to train and serve a lyrics-generative model.
RapLyrics-Front consumes the model trained and served by RapLyrics-Back enabling users to generate unique and inspirational lyrics.