This project aims to develop and deploy an advanced recommender system. It involves data gathering, algorithm selection and optimization, performance evaluation, and deployment on AWS with Hadoop and Spark integration. The goal is to improve user experience and increase sales by providing personalized product recommendations.
- Installation
- Dataset
- Content-Based Filtering
- Collaborative Filtering
- Hybrid Recommendation System
- Evaluation
- Usage
- Results
- How It Works
- Clone the repository:
git clone https://github.com/saurabh4269/recommender_system.git
cd recommender_system
- Install the required packages:
pip install -r requirements.txt
- Movies Dataset: Contains movie metadata such as movie titles and genres.
- Ratings Dataset: Contains user ratings for various movies.
Download the dataset from MovieLens and extract the files into the project directory.
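A minimal loading sketch, assuming the extracted files are named `movies.csv` and `ratings.csv` and sit in the project directory. The file names, the `genres` column name, and the `load_movielens` helper are assumptions for illustration, so adjust them to the MovieLens release you downloaded.

```python
from pathlib import Path

import pandas as pd


def load_movielens(data_dir="."):
    """Load the movies and ratings tables from data_dir.

    The file names below are assumptions about the extracted archive.
    """
    data_dir = Path(data_dir)
    movies = pd.read_csv(data_dir / "movies.csv")
    ratings = pd.read_csv(data_dir / "ratings.csv")
    # Fill missing genres so later text vectorization does not fail on NaN.
    movies["genres"] = movies["genres"].fillna("")
    return movies, ratings
```

Calling `movies, ratings = load_movielens()` from the project directory would then give two DataFrames to pass to the steps below.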
Implemented a content-based recommendation system using TfidfVectorizer and cosine_similarity from sklearn.
- Data Preparation: Load the movies dataset and preprocess the genres column by filling missing values.
- TF-IDF Vectorization: Convert the genres into a TF-IDF matrix, which quantifies the importance of each genre in each movie.
- Dimensionality Reduction: Apply Truncated SVD to reduce the dimensionality of the TF-IDF matrix for more efficient similarity calculations.
- Cosine Similarity: Compute the cosine similarity between movies based on their reduced TF-IDF vectors.
- Recommendation Function: Define a function that takes a movie title as input and returns the top 10 most similar movies.
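The steps above can be sketched roughly as follows. This is an illustration under assumptions, not the notebook's code: the six movies and their pipe-separated genres are a hypothetical in-memory stand-in for the movies dataset, and `n_components=3` is chosen only to suit that tiny sample.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical sample standing in for the movies dataset.
titles = [
    "Toy Story (1995)",
    "Antz (1998)",
    "Jumanji (1995)",
    "Heat (1995)",
    "Seven (a.k.a. Se7en) (1995)",
    "Casino (1995)",
]
genres = [
    "Adventure|Animation|Children|Comedy|Fantasy",
    "Adventure|Animation|Children|Comedy|Fantasy",
    "Adventure|Children|Fantasy",
    "Action|Crime|Thriller",
    "Mystery|Thriller",
    None,  # missing value, filled in the data preparation step
]

# Data preparation: replace missing genres with an empty string.
cleaned = [g if g else "" for g in genres]

# TF-IDF vectorization: treat each pipe-separated genre as one token.
tfidf_matrix = TfidfVectorizer(token_pattern=r"[^|]+").fit_transform(cleaned)

# Dimensionality reduction: keep a few latent components.
reduced = TruncatedSVD(n_components=3, random_state=0).fit_transform(tfidf_matrix)

# Cosine similarity between every pair of movies in the reduced space.
similarity = cosine_similarity(reduced)


def get_recommendations(title, top_n=10):
    """Return up to top_n titles most similar to the given title."""
    idx = titles.index(title)
    ranked = np.argsort(-similarity[idx])  # highest similarity first
    return [titles[i] for i in ranked if i != idx][:top_n]
```

One design note: after the SVD step, movies whose genres overlap heavily can end up with near-identical reduced vectors, so ties in the ranking are possible on small samples.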
# Example usage
print(get_recommendations('Toy Story (1995)'))
Implemented a collaborative filtering recommendation system using the Surprise library and its SVD algorithm.
- Data Preparation: Load the ratings dataset and prepare it for the Surprise library by specifying the rating scale.
- Train-Test Split: Split the data into training and testing sets.
- SVD Algorithm: Use Surprise's SVD algorithm, a matrix-factorization model that learns latent user and item factors from the observed ratings in the user-item interaction matrix.
- Model Training: Train the SVD model on the training set.
- Recommendation Function: Define a function that takes a user ID as input and returns the top 10 movie recommendations for that user.
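To make the latent-factor idea concrete, here is a from-scratch NumPy sketch. It does not use Surprise. It is a stand-in that assumes a biased matrix factorization trained with stochastic gradient descent, which is the general family Surprise's SVD belongs to. The toy ratings, the hyperparameters, the 0.5 to 5.0 rating scale, and the helper names `train_mf` and `predict` are all assumptions for illustration.

```python
import numpy as np

# Hypothetical (user_id, movie_id, rating) triples standing in for the ratings dataset.
ratings = [
    (1, 10, 5.0), (1, 20, 4.5), (1, 30, 1.0),
    (2, 10, 4.5), (2, 20, 5.0), (2, 40, 1.5),
    (3, 30, 5.0), (3, 40, 4.5), (3, 10, 1.0),
    (4, 30, 4.5), (4, 40, 5.0), (4, 20, 1.5),
]


def train_mf(ratings, n_factors=2, n_epochs=300, lr=0.02, reg=0.02, seed=0):
    """Fit rating ~ mu + b_user + b_item + p_user . q_item by SGD."""
    rng = np.random.default_rng(seed)
    u_idx = {u: k for k, u in enumerate(sorted({u for u, _, _ in ratings}))}
    i_idx = {i: k for k, i in enumerate(sorted({i for _, i, _ in ratings}))}
    mu = float(np.mean([r for _, _, r in ratings]))
    bu, bi = np.zeros(len(u_idx)), np.zeros(len(i_idx))
    P = rng.normal(0, 0.1, (len(u_idx), n_factors))
    Q = rng.normal(0, 0.1, (len(i_idx), n_factors))
    for _ in range(n_epochs):
        for u, i, r in ratings:
            a, b = u_idx[u], i_idx[i]
            err = r - (mu + bu[a] + bi[b] + P[a] @ Q[b])
            bu[a] += lr * (err - reg * bu[a])
            bi[b] += lr * (err - reg * bi[b])
            # Both right-hand sides are computed from the old factors.
            P[a], Q[b] = (P[a] + lr * (err * Q[b] - reg * P[a]),
                          Q[b] + lr * (err * P[a] - reg * Q[b]))
    return {"mu": mu, "bu": bu, "bi": bi, "P": P, "Q": Q,
            "u_idx": u_idx, "i_idx": i_idx}


def predict(model, user_id, movie_id):
    """Predicted rating, clipped to the assumed 0.5-5.0 scale."""
    a, b = model["u_idx"][user_id], model["i_idx"][movie_id]
    est = model["mu"] + model["bu"][a] + model["bi"][b] + model["P"][a] @ model["Q"][b]
    return float(np.clip(est, 0.5, 5.0))


model = train_mf(ratings)


def get_collaborative_recommendations(user_id, top_n=10):
    """Rank the movies the user has not rated by predicted rating."""
    seen = {i for u, i, _ in ratings if u == user_id}
    unseen = [i for i in model["i_idx"] if i not in seen]
    return sorted(unseen, key=lambda i: predict(model, user_id, i), reverse=True)[:top_n]
```

With the real dataset, the train-test split step would hold out part of the ratings before calling the training function, so that predictions can be scored on data the model has not seen.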
# Example usage
print(get_collaborative_recommendations(1))
# Evaluate the model on the held-out test set
from surprise import accuracy

predictions = algo.test(testset)
rmse = accuracy.rmse(predictions)  # prints the value as well as returning it
mae = accuracy.mae(predictions)
print(f"Collaborative Filtering RMSE: {rmse}")
print(f"Collaborative Filtering MAE: {mae}")
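For reference, both error metrics reduce to short formulas over pairs of actual and predicted ratings. The sketch below is pure Python with made-up numbers; the helper names `rmse_of` and `mae_of` are illustrative and are not part of Surprise.

```python
import math


def rmse_of(pairs):
    """Root mean squared error over (actual, predicted) pairs."""
    return math.sqrt(sum((a - p) ** 2 for a, p in pairs) / len(pairs))


def mae_of(pairs):
    """Mean absolute error over (actual, predicted) pairs."""
    return sum(abs(a - p) for a, p in pairs) / len(pairs)


# Made-up example ratings, not results from this project.
example_pairs = [(4.0, 3.5), (3.0, 3.0), (5.0, 4.0), (2.0, 3.0)]
print(rmse_of(example_pairs))  # 0.75
print(mae_of(example_pairs))   # 0.625
```

RMSE penalizes large misses more heavily than MAE because errors are squared before averaging, which is why it is never smaller than MAE on the same predictions.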
Combined content-based and collaborative filtering recommendations to create a more robust system.
- Combine Recommendations: Get recommendations from both the content-based and collaborative filtering systems.
- Merge Results: Combine the results from both systems while removing duplicates.
- Final Recommendations: Return the top 10 combined recommendations.
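One possible way to merge the two lists is sketched below. The README does not say how the results are ordered when combined, so the round-robin interleaving here is an assumption, as is the helper name `merge_recommendations`.

```python
from itertools import zip_longest


def merge_recommendations(content_recs, collab_recs, top_n=10):
    """Interleave two ranked lists, drop duplicates, keep the first top_n."""
    merged, seen = [], set()
    for pair in zip_longest(content_recs, collab_recs):
        for item in pair:
            # zip_longest pads the shorter list with None.
            if item is not None and item not in seen:
                seen.add(item)
                merged.append(item)
    return merged[:top_n]


print(merge_recommendations(["A", "B", "C"], ["B", "D"], top_n=3))  # ['A', 'B', 'D']
```

Interleaving gives both systems a chance to contribute. Plain concatenation followed by deduplication is simpler, but it lets the first list fill every slot whenever it already holds ten items.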
# Example usage
print(hybrid_recommendations('Toy Story (1995)', 1))
Evaluated the collaborative filtering model using RMSE and MAE, and the overall recommendations using precision and recall.
- Collaborative Filtering Evaluation: Calculate RMSE and MAE using the predictions from the collaborative filtering model.
- Precision and Recall: Define a function to calculate precision and recall based on ground truth and predicted recommendations.
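A minimal sketch of such a function, assuming both inputs are plain lists of movie titles or IDs and that order does not matter. The set-based definition is an assumption; the notebook may compute the metrics differently.

```python
def evaluate_recommendations(ground_truth, predicted):
    """Return (precision, recall) for a predicted list against ground truth."""
    truth, pred = set(ground_truth), set(predicted)
    hits = len(truth & pred)
    # Precision: share of predicted items that are relevant.
    precision = hits / len(pred) if pred else 0.0
    # Recall: share of relevant items that were predicted.
    recall = hits / len(truth) if truth else 0.0
    return precision, recall
```

Under this definition, precision and recall are equal whenever the two lists contain the same number of distinct items.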
# Evaluation metrics
precision, recall = evaluate_recommendations(ground_truth_recommendations, predicted_recommendations)
print(f"Precision: {precision}")
print(f"Recall: {recall}")
- Run Content-Based Filtering: Open the notebook and run the cells for Content-Based Filtering.
- Run Collaborative Filtering: Open the notebook and run the cells for Collaborative Filtering.
- Run Hybrid Recommendation System: Open the notebook and run the cells for Hybrid Recommendation System.
The hybrid recommendation system combines the strengths of the content-based and collaborative filtering approaches. Sample output and the measured evaluation metrics follow.
Sample Recommendations for "Toy Story (1995)":
- Antz (1998)
- Toy Story 2 (1999)
- Adventures of Rocky and Bullwinkle, The (2000)
- Emperor's New Groove, The (2000)
- Monsters, Inc. (2001)
- DuckTales: The Movie - Treasure of the Lost Lamp (1990)
- Wild, The (2006)
- Shrek the Third (2007)
- Tale of Despereaux, The (2008)
- Asterix and the Vikings (Astérix et les Vikings) (2006)
Collaborative Filtering Evaluation:
- RMSE: 0.7779
- MAE: 0.5868
Evaluation Metrics:
- Precision: 0.375
- Recall: 0.375
Content-Based Filtering:
- TF-IDF Vectorizer: Converts the text data (genres) into numerical features.
- Cosine Similarity: Measures the similarity between movies based on their genre features.
- Recommendation: Finds movies with the highest similarity scores to a given movie.
Collaborative Filtering:
- SVD Algorithm: Decomposes the user-item interaction matrix into latent factors.
- Prediction: Predicts user ratings for unseen movies based on learned latent factors.
- Recommendation: Recommends movies with the highest predicted ratings for a given user.
Hybrid Recommendation System:
- Combination: Merges recommendations from both content-based and collaborative filtering systems.
- Deduplication: Ensures no duplicates in the final recommendation list.
- Final Output: Provides a diverse set of recommendations leveraging both systems.