Please refer to the python notebook and project presentation to get detailed and summarized version of the project respectively
With the advent of technology, societies have become more efficient with their lives. At the same time, however, individual human lives have also become more fast-paced and distracted, leaving little time to explore artistic pursuits. Also, technology has made significant advancements in the ability to coexist with art and general entertainment. It has in fact made it easier for humans with a shortage of time to find and consume good content.
Almost every internet-based company's revenue relies on the time consumers spend on it's platform. These companies need to be able to figure out what kind of content is needed in order to increase customer time spent and make their experience better. Therefore, one of the key challenges for these companies is figuring out what kind of content their customers are most likely to consume.
Also most people enjoy listening to music and the challenge of recommending music to a user is easy to understand for a non technical audience.
The objective of this project is to build a recommendation system to predict the top_n songs for a user based on the likelihood of listening to those songs.
This project showcases my ability to rapidly learn and develop machine learning solutions while laying the foundation for building a robust production-grade tool that demonstrates the complete end-to-end machine learning process. In the future, I aim to deploy an interactive tool as a demonstration for prospective employers.
The core data is the Taste Profile Subset released by the Echo Nest as part of the Million Song Dataset. There are two files in this dataset. The first file contains the details about the song id, titles, release, artist name, and the year of release. The second file contains the user id, song id, and the play count of users.
song_id - A unique id given to every song
title - Title of the song
Release - Name of the released album
Artist_name - Name of the artist
year - Year of release
user _id - A unique id given to the user
song_id - A unique id given to the song
play_count - Number of times the song was played
- It is freely available to the public.
- It is a large enough dataset for the purpose of this project.
-
Loading and understanding the data
-
Data cleaning and feature engineering, some steps taken include:
- I combined the datasets to create a final dataset for our analysis
- For easier analysis I encoded user_id and song_id columns
- I filtered the data such that the data for analysis contains users who have listened to a good count of songs
- I also filtered the data for songs that have been listened to a good number of times
-
Exploratory Data Analysis
-
I built recommendation systems using 6 different algorithms:
- Rank/Popularity - based recommendation system
- User-User similarity-based collaborative filtering
- Item-Item similarity-based collaborative filtering
- Model based collaborative filtering / Matrix factorization
- Clustering - based recommendation system
- Content based recommendation system
-
To demonstrate clustering-based recommendation systems, the surprise library was used.
-
Grid search cross-validation was used to tune hyperparameters and improve model perfomance.
-
I used RMSE, precision@k, recall@k and F_1 score to evaluate model perfomance.
-
In the future I hope to improve the performance of these models using hyperparameter tuning.