Uses KNN on the MovieLens dataset and hybridization methods to suggest new movies.

Primary language: Python. License: MIT.

Hybrid Recommender System

Built as part of my final-year project during graduation.

Uses the MovieLens 100K dataset (2016 version).

Features/Methods

Collaborative filtering

  • User-based Collaborative Filtering
  • Item-based Collaborative Filtering
  • CF using Singular Value Decomposition (SVD)
  • Popularity based (implemented as the sum of all ratings received by a particular movie)
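
The popularity score and the neighbourhood-based approach listed above can be sketched in a few lines. This is a minimal illustration in pure Python, not the project's actual code: the toy `ratings` data, the function names, and the choice of `k=2` are all assumptions made for the example.

```python
from math import sqrt

# Toy ratings: user -> {movie: rating}. Illustrative data, not from MovieLens.
ratings = {
    "u1": {"A": 5, "B": 4, "C": 1},
    "u2": {"A": 4, "B": 5},
    "u3": {"A": 1, "C": 5},
    "u4": {"B": 4, "C": 2},
}


def popularity(ratings):
    """Popularity score per movie: the sum of all ratings it received."""
    totals = {}
    for user_ratings in ratings.values():
        for movie, rating in user_ratings.items():
            totals[movie] = totals.get(movie, 0) + rating
    return totals


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_user_based(ratings, user, movie, k=2):
    """Similarity-weighted mean rating over the k most similar users
    who have rated the movie. Returns None if no neighbour is usable."""
    neighbours = [
        (cosine(ratings[user], other_ratings), other_ratings[movie])
        for other, other_ratings in ratings.items()
        if other != user and movie in other_ratings
    ]
    neighbours.sort(reverse=True)  # most similar first
    top = neighbours[:k]
    weight = sum(sim for sim, _ in top)
    if weight == 0:
        return None
    return sum(sim * rating for sim, rating in top) / weight


print(popularity(ratings))
print(predict_user_based(ratings, "u2", "C"))
```

Item-based filtering follows the same pattern with the roles swapped: build one vector per movie (keyed by user) and compare movies instead of users.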

Content Based Filtering

  • Simple Approach
  • Normalising the category vector (this reduced the similarity matrix from 9000x9000 to 800x800)
  • Using Bag of Words (for movie titles)
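
As a rough sketch of the content-based ideas above, the example below builds unit-length genre vectors and bag-of-words counts for titles, then compares movies by cosine similarity. The toy catalogue and all function names are assumptions for illustration. It does not reproduce the 9000x9000 to 800x800 reduction, whose exact method is not described here.

```python
from collections import Counter
from math import sqrt

# Toy catalogue: title -> genres. Illustrative only.
movies = {
    "Toy Story": ["Animation", "Children", "Comedy"],
    "Toy Story 2": ["Animation", "Children", "Comedy"],
    "Heat": ["Action", "Crime", "Thriller"],
}

# Fixed genre order so every vector has the same layout.
GENRES = sorted({g for genres in movies.values() for g in genres})


def genre_vector(genres):
    """Binary genre vector scaled to unit length (L2 normalisation)."""
    raw = [1.0 if g in genres else 0.0 for g in GENRES]
    norm = sqrt(sum(x * x for x in raw))
    return [x / norm for x in raw] if norm else raw


def dot(u, v):
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def bag_of_words(title):
    """Lower-cased word counts for a movie title."""
    return Counter(title.lower().split())


def cosine_counts(a, b):
    """Cosine similarity between two word-count Counters."""
    shared = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return shared / norm if norm else 0.0


genre_sim = dot(genre_vector(movies["Toy Story"]), genre_vector(movies["Heat"]))
title_sim = cosine_counts(bag_of_words("Toy Story"), bag_of_words("Toy Story 2"))
print(genre_sim, title_sim)
```

Normalising the vectors up front means similarity reduces to a plain dot product, which is cheaper when many pairs are compared.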

Hybridization techniques

  • Mixed Hybridization
  • Switching
  • Feature Combining: Collaborative Via Content Based
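
The first two techniques above can be illustrated with a short sketch. The function names, the interleaving strategy for the mixed hybrid, and the `min_ratings=20` threshold for switching are assumptions chosen for the example, not the project's actual settings. Feature combining is not covered here.

```python
from itertools import zip_longest


def mixed(*ranked_lists, n=5):
    """Mixed hybridization: interleave several ranked recommendation
    lists, keeping the first occurrence of each item."""
    merged = []
    for tier in zip_longest(*ranked_lists):
        for item in tier:
            if item is not None and item not in merged:
                merged.append(item)
    return merged[:n]


def switching(user_rating_count, cf_recs, content_recs, min_ratings=20):
    """Switching hybridization: trust collaborative filtering only when
    the user has enough ratings; otherwise fall back to content-based."""
    return cf_recs if user_rating_count >= min_ratings else content_recs


cf = ["A", "B", "C"]        # hypothetical collaborative-filtering output
content = ["B", "D"]        # hypothetical content-based output

print(mixed(cf, content, n=4))
print(switching(3, cf, content))
```

Switching is a common way to handle new users, for whom collaborative filtering has too little rating history to find reliable neighbours.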

User Interface

The UI received little attention because the focus was on the algorithms.

More screenshots

Load virtual environment and dependencies

Anaconda is recommended.

Creation: conda env create -f conda_environment.yml

Load Environment: source activate recommender

For those using pip

pip install -r requirements.txt

Download and extract

MovieLens Dataset.

For building database

Use MySQL. Create an empty database and note its name.

Running

Make sure MySQL server is running.

Run sample_recommender.py to check everything works properly.

If you are setting up for the first time, you will be asked for the database details.

To reset, run generate_defaults.py or delete the defaults.json file.

You will also need to update the DATABASE variable in Hybrid_Recommender_System/setting.py, which Django uses.
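
For orientation, a stock Django project configures MySQL through a dictionary named `DATABASES` in its settings module, roughly as sketched below. Check the file in this repository for the exact variable name it uses. The database name, user, and password shown are placeholders, not values from this project.

```python
# Hypothetical values; replace them with the details of the database you created.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",  # Django's built-in MySQL backend
        "NAME": "recommender_db",              # assumed database name
        "USER": "db_user",                     # assumed MySQL user
        "PASSWORD": "db_password",             # assumed password
        "HOST": "localhost",
        "PORT": "3306",                        # MySQL's default port
    }
}
```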

Release Versions

v0.1-alpha - Command Line Interface

v0.2-alpha - Django Support

Things not implemented

  1. Thoughts on working with larger datasets
  2. Thoughts on working with multi-criteria datasets