A complete machine learning pipeline from scraping music data to modelling a music genre classifier and deployting it.
The deployed web app is live at https://jihun-kpop-western-classifier.herokuapp.com/ .
-
Data Collection (Spotify API for Music Features and Genius API for Song Lyrics)
-
Data Cleaning (Treating Null Values Inside the Dataset, etc.)
-
Exploratory Data Analysis
-
Machine Learning Modelling and Testing (Classification Model)
-
Model Deployment
-
Technologies Used
-
Data Collection & Cleaning
- Song Features
- Spotipy API (For Collecting Music Features)
- Pandas (Inserting the data into a dataframe, Dropping Duplicate Rows and Exporting the data as CSV)
- Song Lyrics
- Requests (Send request to Genius API for song info)
- Googlesearch (If requests is unable to get the song info, a google search will be performed to find the link of the music in Genius.)
- BeautifulSoup (With the URLs returned from either Requests or Googlesearch, BS will scrape the lyrics of the song.)
- Pandas (Used to insert the lyrics into a dataframe)
- Song Features
-
Exploratory Data Analysis
- Pandas (Analyzing, Filtering, Summary)
- Matplotlib, Seaborn (Visualization)
- Statsmodels (ANOVA)
- Scipy (Mannwhitneyu)
- Itertools (For different Group Combinations)
-
Machine Learning Modelling
- Sci-Kit Learn
- XGBoost
- Light GBM
-
Model Deployment
- Streamlit