Movie Recommender System
Description
This project builds a movie recommender system from cleaned Netflix Prize data. Each line of the cleaned data has the format "userId,movieId,rating".
Guide
step 1. choose an algorithm - itemCF
We use item-based collaborative filtering (itemCF) because there are far more users than movies, so an item-to-item matrix is much smaller than a user-to-user one. Movies also change infrequently, which lowers the computation cost. Last but not least, recommendations based on a user's own rating history are more convincing.
step 2. describe the relationship between movies - co-occurrence matrix
We use rating history to define the relationship between movies. If a user has rated two movies, we consider those two movies related. We then build a co-occurrence matrix to represent the relationships between movies, in the format "movieA:movieB relationship".
Finally, we normalize the co-occurrence matrix to make the result more accurate, and transpose it for computation with MapReduce, giving the format "movieB movieA=relationship".
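The step above can be sketched in plain Python rather than MapReduce. This is only an illustration under stated assumptions: the function names are hypothetical, a movie is counted as co-occurring with itself, and "normalize" is taken to mean dividing each movieA row by its row total. The project itself may normalize differently.

```python
from collections import defaultdict
from itertools import product


def build_cooccurrence(ratings):
    """Count, for every movie pair, how many users rated both movies.

    ratings: iterable of (user_id, movie_id, rating) tuples.
    """
    movies_by_user = defaultdict(set)
    for user_id, movie_id, _rating in ratings:
        movies_by_user[user_id].add(movie_id)

    cooccurrence = defaultdict(lambda: defaultdict(int))
    for movies in movies_by_user.values():
        # Assumption: self-pairs (movieA, movieA) are counted too.
        for movie_a, movie_b in product(movies, repeat=2):
            cooccurrence[movie_a][movie_b] += 1
    return cooccurrence


def normalize_and_transpose(cooccurrence):
    """Divide each movieA row by its total, then key the result by movieB."""
    transposed = defaultdict(dict)
    for movie_a, row in cooccurrence.items():
        row_total = sum(row.values())
        for movie_b, count in row.items():
            transposed[movie_b][movie_a] = count / row_total
    return transposed


# Made-up sample data, not taken from the Netflix Prize set.
sample = [(1, 10001, 5.0), (1, 10002, 3.0), (2, 10001, 4.0)]
matrix = normalize_and_transpose(build_cooccurrence(sample))
for movie_b, column in matrix.items():
    pairs = ",".join(f"{movie_a}={rel:.3f}" for movie_a, rel in column.items())
    print(f"{movie_b}\t{pairs}")
```

Keying the output by movieB means the later multiplication step can look up, for each movie a user has rated, every movieA it contributes to.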
step 3. build a rating matrix grouped by user
With the format "userId movieA=rating,movieB=rating,movieC=rating,..."
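As a rough sketch of this grouping, assuming the input lines look like "userId,movieId,rating" and that the key and value in the output are separated by a tab (the separator and both function names are assumptions made for illustration):

```python
from collections import defaultdict


def group_ratings_by_user(lines):
    """Collect each user's ratings into one dict: {user_id: {movie_id: rating}}."""
    ratings_by_user = defaultdict(dict)
    for line in lines:
        user_id, movie_id, rating = line.strip().split(",")
        ratings_by_user[user_id][movie_id] = float(rating)
    return ratings_by_user


def format_user_row(user_id, movie_ratings):
    """Render one user's row as 'userId<TAB>movieA=rating,movieB=rating,...'."""
    pairs = ",".join(f"{movie}={rating}" for movie, rating in movie_ratings.items())
    return f"{user_id}\t{pairs}"


# Made-up sample lines for illustration only.
sample_lines = ["1,10001,5.0", "1,10002,3.0", "2,10001,4.0"]
for user, row in group_ratings_by_user(sample_lines).items():
    print(format_user_row(user, row))
```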
step 4. multiply co-occurrence matrix and rating matrix
With the format "userId:movieId multiplyUnitResult"
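One way to picture a multiplication unit: for every movieB a user has rated, each "movieA=relationship" entry stored under movieB is multiplied by that rating and emitted under the key "userId:movieA". The sketch below does this in memory. The function name, the dict layouts, and the sample numbers are all assumptions for illustration, not the project's MapReduce code.

```python
def multiply_units(transposed, ratings_by_user):
    """Emit (userId:movieA, relationship * rating) for every matching movieB.

    transposed:      {movie_b: {movie_a: relationship}}
    ratings_by_user: {user_id: {movie_b: rating}}
    """
    units = []
    for user_id, movie_ratings in ratings_by_user.items():
        for movie_b, rating in movie_ratings.items():
            # Movies missing from the co-occurrence matrix contribute nothing.
            for movie_a, relationship in transposed.get(movie_b, {}).items():
                units.append((f"{user_id}:{movie_a}", relationship * rating))
    return units


# Made-up values for illustration only.
sample_matrix = {"10001": {"10001": 0.5, "10002": 0.5}}
sample_ratings = {"1": {"10001": 4.0}}
for key, value in multiply_units(sample_matrix, sample_ratings):
    print(f"{key}\t{value}")
```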
step 5. sum up and compare
Then we sum the multiplication results grouped by user and movie to get a predicted rating for each movie by each user, in the format "userId:movieId predicted_rating".
We compare the predicted ratings to the historical ratings and find a problem. Take user_1's ratings as an example: the difference between the ratings of movie_10001 and movie_10002 is not the same in the predicted data as in the historical data. Why, and how do we deal with it?
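The summation is a plain group-by-key reduction. A minimal sketch, assuming the multiplication results arrive as (key, value) pairs with keys of the form "userId:movieId" (the function name and sample values are hypothetical):

```python
from collections import defaultdict


def sum_units(units):
    """Add up all multiplication units that share the same userId:movieId key."""
    predicted = defaultdict(float)
    for key, value in units:
        predicted[key] += value
    return dict(predicted)


# Made-up units for illustration only.
sample_units = [("1:10001", 2.0), ("1:10001", 1.5), ("1:10002", 0.5)]
for key, rating in sum_units(sample_units).items():
    print(f"{key}\t{rating}")
```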
To be continued...