ILYrics is a minimal lyrics search engine with Django as the frontend and ElasticSearch as the backend, working with a PostgreSQL database hosted on AWS and deployed on Heroku.
This project is part of Tsinghua's 2021 WebIR Course taught by Prof. Min ZHANG (张敏) and inspired by Genius. In addition to the present code, the report and presentation slides are available online.
ILYrics is composed of the following:
- Django (2.2.5)
- ElasticSearch (7.12.1), hosted on a free-tier AWS instance
- PostgreSQL online database with 20 GB of SSD storage, also hosted on a free-tier AWS instance
- Heroku, used to deploy the application as a website available 24/7
ILYrics can find songs based on a song name or an artist name. Thanks to the power of ElasticSearch (based on Lucene), it can also find songs based on a fragment of their lyrics.
Feel free to experiment on https://www.ilyrics.herokuapp.com with queries like "la vie en rose", "bruno mars", or even "she's just a girl who claims that I am the one".
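A single ElasticSearch `multi_match` query is one way to cover all three cases (song name, artist name, lyrics fragment). The sketch below is illustrative only and is not taken from the project's code: the field names `title`, `artist` and `lyrics`, the boost values, and the helper name `build_song_query` are assumptions.

```python
def build_song_query(text, size=10):
    """Build an ElasticSearch request body searching title, artist and lyrics.

    Field names and boosts are hypothetical; adapt them to the real index mapping.
    """
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": text,
                # "^3" and "^2" weight title and artist matches above lyrics matches.
                "fields": ["title^3", "artist^2", "lyrics"],
                # best_fields scores each document by its best-matching field.
                "type": "best_fields",
            }
        },
    }


body = build_song_query("la vie en rose")
```

With the official Python client (7.x), such a dictionary would typically be passed as the `body` argument of `search()`, together with whatever index name the deployment actually uses.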
As long as you have access tokens to each online component (e.g. within a credentials.json file), all you need to do is:
- Make sure you have git and pip installed
- Clone this repository using
  git clone https://github.com/smeelock/ilyrics
- Install required dependencies using
  pip install -r requirements.txt
- Run the application using
  python manage.py runserver
- Navigate to localhost:8000
- Enjoy :)
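Since the application depends on access tokens for each online component, it can help to validate the credentials file before starting the server. The following is a minimal sketch under stated assumptions: the layout of credentials.json is not documented here, so the key names `elasticsearch` and `postgres` and the helper `load_credentials` are hypothetical.

```python
import json
from pathlib import Path

# Hypothetical component names; the real credentials.json layout may differ.
REQUIRED_KEYS = ("elasticsearch", "postgres")


def load_credentials(path="credentials.json"):
    """Read the credentials file and fail early if a component is missing."""
    file = Path(path)
    if not file.is_file():
        raise FileNotFoundError(f"credentials file not found: {file}")
    credentials = json.loads(file.read_text(encoding="utf-8"))
    # Collect every missing component so the error message lists them all at once.
    missing = [key for key in REQUIRED_KEYS if key not in credentials]
    if missing:
        raise KeyError(f"missing credentials for: {', '.join(missing)}")
    return credentials
```

Failing at load time gives a clearer error than a connection failure later inside Django or the ElasticSearch client.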
Possible improvements:
- Scoring function (currently Lucene's BM25; experimenting with other scoring functions may reveal more effective methods)
- Parsing (no text processing is currently applied; removing stopwords, stemming words, POS tagging, etc. could greatly improve performance. spaCy would definitely be the go-to library.)
- Query understanding (e.g. using pretrained language models such as BERT or GPT-3)
- Security (e.g. with fully integrated IAM connection on AWS)
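As a starting point for the scoring and parsing items, ElasticSearch itself can remove stopwords, stem words, and expose BM25's parameters through index settings. The sketch below uses ElasticSearch's built-in `stop` and `porter_stem` token filters rather than spaCy, and is not the project's current configuration: the names `lyrics_english`, `tuned_bm25`, `build_index_body` and the `lyrics` field are assumptions.

```python
def build_index_body(k1=1.2, b=0.75):
    """Build index settings and mappings with a custom analyzer and tunable BM25.

    k1 and b default to ElasticSearch's own BM25 defaults (1.2 and 0.75).
    """
    return {
        "settings": {
            "index": {
                "similarity": {
                    # Named similarity so k1 and b can be tuned per field.
                    "tuned_bm25": {"type": "BM25", "k1": k1, "b": b}
                }
            },
            "analysis": {
                "analyzer": {
                    "lyrics_english": {
                        "type": "custom",
                        "tokenizer": "standard",
                        # Lowercase, drop English stopwords, then apply Porter stemming.
                        "filter": ["lowercase", "stop", "porter_stem"],
                    }
                }
            },
        },
        "mappings": {
            "properties": {
                # Hypothetical field name for the lyrics text.
                "lyrics": {
                    "type": "text",
                    "analyzer": "lyrics_english",
                    "similarity": "tuned_bm25",
                }
            }
        },
    }


index_body = build_index_body(k1=1.5, b=0.6)
```

Changing the analyzer or similarity of an existing field generally requires creating a new index and reindexing the documents, so experiments of this kind are easiest to run against a copy of the data.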