A complete machine learning pipeline, from scraping music data to modelling a music genre classifier and deploying it.
The deployed web app is live at https://jihun-kpop-western-classifier.herokuapp.com/
Data Collection (Spotify API for Music Features and Genius API for Song Lyrics)
Data Cleaning (Treating Null Values Inside the Dataset, etc.)
Exploratory Data Analysis
Machine Learning Modelling and Testing (Classification Model)
Model Deployment
Technologies Used
Data Collection & Cleaning
- Song Features
- Spotipy (Python client for the Spotify Web API, used to collect music features)
- Pandas (Inserting the data into a dataframe, Dropping Duplicate Rows and Exporting the data as CSV)
- Song Lyrics
- Requests (Sends a request to the Genius API for song info)
- Googlesearch (If Requests is unable to get the song info, a Google search is performed to find the song's page on Genius.)
- BeautifulSoup (Using the URL returned by either Requests or Googlesearch, BeautifulSoup scrapes the lyrics of the song.)
- Pandas (Used to insert the lyrics into a dataframe)
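The lookup-with-fallback step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names `genius_search` and `find_lyrics_url` are hypothetical, the Genius access token is assumed to be supplied by the caller, and the web search is passed in as a plain callable because the exact Googlesearch call used in the project is not shown here. Scraping the lyrics from the resulting page with BeautifulSoup is left out.

```python
from typing import Callable, Iterable, Optional

GENIUS_SEARCH_URL = "https://api.genius.com/search"


def genius_search(query: str, token: str) -> Optional[str]:
    """Return the Genius page URL of the top search hit, or None if no hit."""
    # Imported here so the fallback logic below can run without Requests installed.
    import requests

    resp = requests.get(
        GENIUS_SEARCH_URL,
        params={"q": query},
        headers={"Authorization": f"Bearer {token}"},
        timeout=10,
    )
    resp.raise_for_status()
    hits = resp.json()["response"]["hits"]
    return hits[0]["result"]["url"] if hits else None


def find_lyrics_url(
    title: str,
    artist: str,
    api_lookup: Callable[[str], Optional[str]],
    web_search: Callable[[str], Iterable[str]],
) -> Optional[str]:
    """Try the API lookup first; fall back to a web search for a genius.com link."""
    query = f"{title} {artist}"
    try:
        url = api_lookup(query)
    except Exception:
        # Any API failure (network error, bad status, ...) triggers the fallback.
        url = None
    if url:
        return url

    # Fallback: take the first search result that points at Genius.
    for candidate in web_search(f"{query} genius lyrics"):
        if "genius.com" in candidate:
            return candidate
    return None
```

In use, `api_lookup` could be something like `lambda q: genius_search(q, token)` and `web_search` a thin wrapper around whichever Google search package is installed; the returned URL is then what gets handed to BeautifulSoup.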
Exploratory Data Analysis
- Pandas (Analyzing, Filtering, Summary)
- Matplotlib, Seaborn (Visualization)
- Statsmodels (ANOVA)
- SciPy (Mann-Whitney U test)
- Itertools (For different Group Combinations)
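As a rough sketch of how Itertools can drive the pairwise tests listed above: every unordered pair of groups is generated with `itertools.combinations`, and a two-sample test is applied to each pair. The helper name `pairwise_tests` and the group labels in the usage note are hypothetical; the test itself is passed in as a callable so the pairing logic stays independent of any statistics library.

```python
from itertools import combinations
from typing import Callable, Dict, Sequence, Tuple


def pairwise_tests(
    groups: Dict[str, Sequence[float]],
    test: Callable[[Sequence[float], Sequence[float]], float],
) -> Dict[Tuple[str, str], float]:
    """Apply `test` to every unordered pair of groups.

    Returns a dict mapping (group_a, group_b) to whatever `test` returns,
    typically a p-value.
    """
    results = {}
    # combinations(..., 2) yields each pair of group names exactly once,
    # in the order the groups appear in the dict.
    for name_a, name_b in combinations(groups, 2):
        results[(name_a, name_b)] = test(groups[name_a], groups[name_b])
    return results
```

With SciPy installed, the Mann-Whitney U test can be plugged in as `pairwise_tests(groups, lambda x, y: mannwhitneyu(x, y, alternative="two-sided").pvalue)` after `from scipy.stats import mannwhitneyu`, where `groups` might map labels such as `"kpop"` and `"western"` to the values of one audio feature.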
Machine Learning Modelling
- scikit-learn
- XGBoost
- LightGBM
Model Deployment
- Streamlit