Recommender System Development

Project Overview

This project aims to develop and deploy an advanced recommender system. It involves data gathering, algorithm selection and optimization, performance evaluation, and deployment on AWS with Hadoop and Spark integration. The goal is to improve user experience and increase sales by providing personalized product recommendations.

Installation
Dataset
Content-Based Filtering
Collaborative Filtering
Hybrid Recommendation System
Evaluation
Usage
Results
How It Works

Installation

Clone the repository:

```
git clone https://github.com/saurabh4269/recommender_system.git
cd recommender_system
```

Install the required packages:
```
pip install -r requirements.txt
```

Dataset

Movies Dataset: Contains movie metadata such as movie titles and genres.
Ratings Dataset: Contains user ratings for various movies.

Download the dataset from MovieLens and extract the files into the project directory.
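The sketch below shows one way to load the data. It assumes the extracted files follow the standard MovieLens CSV layout: `movies.csv` with `movieId,title,genres` (genres pipe-separated) and `ratings.csv` with `userId,movieId,rating,timestamp`. The helper name `load_movielens` and its `data_dir` argument are hypothetical and not part of the repository.

```python
import os

import pandas as pd


def load_movielens(data_dir="."):
    """Load the MovieLens movies and ratings tables (hypothetical helper).

    Assumes the MovieLens CSV layout:
      movies.csv  -> movieId, title, genres (genres are pipe-separated)
      ratings.csv -> userId, movieId, rating, timestamp
    """
    movies = pd.read_csv(os.path.join(data_dir, "movies.csv"))
    ratings = pd.read_csv(os.path.join(data_dir, "ratings.csv"))
    # Fill missing genres so the later TF-IDF step never sees NaN
    movies["genres"] = movies["genres"].fillna("")
    return movies, ratings
```

After extracting the files into the project directory, call `movies, ratings = load_movielens(".")`.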

Content-Based Filtering

Implemented a content-based recommendation system using TfidfVectorizer, TruncatedSVD, and cosine_similarity from scikit-learn (sklearn).

How It Works

Data Preparation: Load the movies dataset and preprocess the genres column by filling missing values.
TF-IDF Vectorization: Convert the genres into a TF-IDF matrix, which quantifies the importance of each genre in each movie.
Dimensionality Reduction: Apply Truncated SVD to reduce the dimensionality of the TF-IDF matrix for more efficient similarity calculations.
Cosine Similarity: Compute the cosine similarity between movies based on their reduced TF-IDF vectors.
Recommendation Function: Define a function that takes a movie title as input and returns the top 10 most similar movies.
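Here is a minimal scikit-learn sketch of these steps. It assumes a `movies` DataFrame with `title` and pipe-separated `genres` columns, as in MovieLens. The README's `get_recommendations` takes only a title. This version instead passes the similarity matrix and title index explicitly. The helper `build_content_model`, the `n_components=20` default, and the `|`-splitting `token_pattern` are illustrative choices, not the project's confirmed settings.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def build_content_model(movies, n_components=20, seed=42):
    """Return a movie-by-movie cosine similarity matrix and a title -> row lookup."""
    # Data preparation: fill missing genres with an empty string
    genres = movies["genres"].fillna("")
    # TF-IDF: treat each pipe-separated genre (e.g. "Sci-Fi") as one token
    tfidf_matrix = TfidfVectorizer(token_pattern=r"[^|]+").fit_transform(genres)
    # Truncated SVD: keep k components, strictly below both matrix dimensions
    k = min(n_components, min(tfidf_matrix.shape) - 1)
    reduced = TruncatedSVD(n_components=k, random_state=seed).fit_transform(tfidf_matrix)
    # Cosine similarity between every pair of reduced movie vectors
    similarity = cosine_similarity(reduced)
    # Map each title to its row position (first occurrence wins for duplicate titles)
    indices = pd.Series(np.arange(len(movies)), index=movies["title"])
    indices = indices[~indices.index.duplicated(keep="first")]
    return similarity, indices


def get_recommendations(title, movies, similarity, indices, top_n=10):
    """Return the top_n titles most similar to `title`, excluding the movie itself."""
    idx = indices[title]
    # Sort all movies by similarity to the query, highest first
    order = np.argsort(-similarity[idx], kind="stable")
    top = [i for i in order if i != idx][:top_n]
    return movies["title"].iloc[top].tolist()
```

Typical usage is `similarity, indices = build_content_model(movies)` followed by `get_recommendations('Toy Story (1995)', movies, similarity, indices)`. Splitting on `|` keeps genres such as `Sci-Fi` or `Film-Noir` as single features. The default tokenizer would break them into separate words. The full similarity matrix grows quadratically with the number of movies. For larger catalogs, computing one row at a time with `cosine_similarity(reduced[idx:idx + 1], reduced)` avoids storing the whole matrix.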

```python
# Example usage
print(get_recommendations('Toy Story (1995)'))
```

Collaborative Filtering

Implemented a collaborative filtering recommendation system using the SVD algorithm from the Surprise library.

How It Works

Data Preparation: Load the ratings dataset and prepare it for the Surprise library by specifying the rating scale.
Train-Test Split: Split the data into training and testing sets.
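SVD Algorithm: Use Surprise's SVD algorithm, a matrix factorization model popularized by Simon Funk during the Netflix Prize, to factorize the user-item rating matrix into latent user and item factors. Despite the name, it is trained by stochastic gradient descent on the observed ratings rather than computed as a classical singular value decomposition.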
Model Training: Train the SVD model on the training set.
Recommendation Function: Define a function that takes a user ID as input and returns the top 10 movie recommendations for that user.

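```python
from surprise import accuracy

# Example usage: top 10 recommendations for user 1
print(get_collaborative_recommendations(1))

# Evaluate the trained model on the held-out test set
# (algo and testset come from the training step described above)
predictions = algo.test(testset)
rmse = accuracy.rmse(predictions)  # also prints "RMSE: ..." by default
mae = accuracy.mae(predictions)    # also prints "MAE: ..." by default

print(f"Collaborative Filtering RMSE: {rmse}")
print(f"Collaborative Filtering MAE: {mae}")
```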

Hybrid Recommendation System

Combined content-based and collaborative filtering recommendations to create a more robust system.

How It Works

Combine Recommendations: Get recommendations from both the content-based and collaborative filtering systems.
Merge Results: Combine the results from both systems while removing duplicates.
Final Recommendations: Return the top 10 combined recommendations.
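Beyond removing duplicates, the README does not say how the two lists are combined. This sketch assumes one simple option: interleave the two ranked lists so that both sources appear near the top. `merge_recommendations` is a hypothetical helper. It would sit inside `hybrid_recommendations` after the content-based and collaborative functions have been called.

```python
def merge_recommendations(content_recs, collab_recs, top_n=10):
    """Interleave two ranked lists, drop duplicates, and keep the first top_n items."""
    merged, seen = [], set()
    # Alternate between the lists so both sources contribute near the top
    for i in range(max(len(content_recs), len(collab_recs))):
        for source in (content_recs, collab_recs):
            if i < len(source) and source[i] not in seen:
                seen.add(source[i])
                merged.append(source[i])
    return merged[:top_n]
```

For example, `merge_recommendations(['A', 'B', 'C'], ['B', 'D'], top_n=3)` returns `['A', 'B', 'D']`. `B` is kept once, at its first appearance. A weighted blend of the two scores is a common alternative when both systems expose comparable scores.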

```python
# Example usage
print(hybrid_recommendations('Toy Story (1995)', 1))
```

Evaluation

Evaluated the collaborative filtering model with RMSE and MAE on the held-out test set, and the overall recommendation lists with precision and recall.
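For reference, these metrics have the standard definitions below:

- T is the set of held-out (user, movie) pairs.
- r_ui and \hat{r}_ui are the true and predicted ratings.
- R is the list of recommended movies, and G is the ground-truth set of relevant movies.

These are textbook definitions; the README does not state how G is constructed.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{|T|}\sum_{(u,i)\in T}\left(r_{ui}-\hat{r}_{ui}\right)^2},
\qquad
\mathrm{MAE} = \frac{1}{|T|}\sum_{(u,i)\in T}\left|r_{ui}-\hat{r}_{ui}\right|

\mathrm{Precision} = \frac{|R \cap G|}{|R|},
\qquad
\mathrm{Recall} = \frac{|R \cap G|}{|G|}
```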

How It Works

Collaborative Filtering Evaluation: Calculate RMSE and MAE using the predictions from the collaborative filtering model.
Precision and Recall: Define a function to calculate precision and recall based on ground truth and predicted recommendations.
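The README calls `evaluate_recommendations` without showing it. Only the function name and call come from the README. The body below is an assumed set-based implementation, treating both arguments as lists of movie titles for a single user.

```python
def evaluate_recommendations(ground_truth, predicted):
    """Set-based precision and recall for one recommendation list (assumed definition)."""
    relevant = set(ground_truth)
    recommended = set(predicted)
    hits = len(relevant & recommended)
    # Guard against empty lists instead of dividing by zero
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

For instance, 4 relevant titles and 8 recommendations with 2 hits give precision 2/8 = 0.25 and recall 2/4 = 0.5. When both lists contain the same number of distinct titles, precision and recall are equal.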

```python
# Evaluation metrics
precision, recall = evaluate_recommendations(ground_truth_recommendations, predicted_recommendations)
print(f"Precision: {precision}")
print(f"Recall: {recall}")
```

Usage

Run Content-Based Filtering: open the notebook and run the cells for Content-Based Filtering.
Run Collaborative Filtering: open the notebook and run the cells for Collaborative Filtering.
Run Hybrid Recommendation System: open the notebook and run the cells for Hybrid Recommendation System.

Results

The hybrid recommendation system combines genre similarity from content-based filtering with rating patterns learned by collaborative filtering. The aim is recommendations that are both accurate and diverse. Sample output and evaluation metrics are listed below.

Sample Recommendations for "Toy Story (1995)":

Antz (1998)
Toy Story 2 (1999)
Adventures of Rocky and Bullwinkle, The (2000)
Emperor's New Groove, The (2000)
Monsters, Inc. (2001)
DuckTales: The Movie - Treasure of the Lost Lamp (1990)
Wild, The (2006)
Shrek the Third (2007)
Tale of Despereaux, The (2008)
Asterix and the Vikings (Astérix et les Vikings) (2006)

Collaborative Filtering Evaluation:

RMSE: 0.7779
MAE: 0.5868

Evaluation Metrics:

Precision: 0.375
Recall: 0.375

How It Works

Content-Based Filtering

TF-IDF Vectorizer: Converts the text data (genres) into numerical features.
Cosine Similarity: Measures the similarity between movies based on their genre features.
Recommendation: Finds movies with the highest similarity scores to a given movie.
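This assumes `TfidfVectorizer`'s defaults (`smooth_idf=True`, `norm='l2'`). Each genre t in movie d is then weighted as shown below, with these symbols:

- n is the number of movies.
- df(t) is the number of movies tagged with genre t.
- tf(t, d) is 1 if movie d has genre t and 0 otherwise.

Each movie's vector is then scaled to unit length, and similarity is the cosine between two vectors.

```latex
\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d)\cdot\left(\ln\frac{1 + n}{1 + \mathrm{df}(t)} + 1\right),
\qquad
\cos(\mathbf{a}, \mathbf{b}) = \frac{\mathbf{a}\cdot\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert}
```

As a worked example, take n = 6. A genre shared by 3 movies gets an idf of ln(7/4) + 1 ≈ 1.56. A genre unique to one movie gets ln(7/2) + 1 ≈ 2.25. Rarer genres therefore count more toward similarity.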

Collaborative Filtering

SVD Algorithm: Decomposes the user-item interaction matrix into latent factors.
Prediction: Predicts user ratings for unseen movies based on learned latent factors.
Recommendation: Recommends movies with the highest predicted ratings for a given user.
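Surprise documents its `SVD` (with the default `biased=True`) using the prediction rule below. In it, μ is the global mean rating, b_u and b_i are the user and item biases, and p_u and q_i are the latent factor vectors. The parameters are fit by stochastic gradient descent on the observed training ratings, minimizing the regularized squared error in the second line. The library defaults are `n_factors=100`, `n_epochs=20`, `lr_all=0.005`, and `reg_all=0.02`. The README does not say whether the project changed them.

```latex
\hat{r}_{ui} = \mu + b_u + b_i + \mathbf{q}_i^{\top}\mathbf{p}_u

\min_{b,\,p,\,q} \sum_{r_{ui} \in R_{\text{train}}} \left(r_{ui} - \hat{r}_{ui}\right)^2
  + \lambda\left(b_u^2 + b_i^2 + \lVert\mathbf{q}_i\rVert^2 + \lVert\mathbf{p}_u\rVert^2\right)
```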

Hybrid System

Combination: Merges recommendations from both content-based and collaborative filtering systems.
Deduplication: Ensures no duplicates in the final recommendation list.
Final Output: Provides a diverse set of recommendations leveraging both systems.
