User-based Collaborative Filtering in Python (Adapted from University of Minnesota CSci 1901H Class project)
Implement a simple user-based collaborative filtering recommender system for predicting the ratings of an item using the data given. This prediction should be done using k nearest neighbors and Pearson correlation. Finally using the similarity of the k nearest neighbors, it is required to predict the ratings of the new item for the given user.
- The input file consists of one rating event per line. Each rating event is of the form: user_id\trating\tmovie_title
- The user_id is a string that contains only alphanumeric characters and hyphens and spaces (no tabs).
- The rating is one of the float values 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, and 5.0.
- The movie_title is a string that may contain space characters (to separate the words).
- The three fields -- user_id, rating, and the movie_title -- are separated by a single tab character (\t).
- pearson_correlation(user1, user2)
- This function calculates the pearson correlation between 2 users.
- Return value is a float between 1 and -1.
- For calculating the average for each user, include all the user’s ratings and not just the intersection of the 2 user’s ratings.
- However when computing summation, use only items that both users have rated.
- K_nearest_neighbors(user1, k) :
- This function calculates the k nearest neighbors of user1 based on pearson similarity.
- Returns a list of k nearest neighbors and their similarity.
- For calculating the average for each user, include all the user’s ratings and not just the intersection of the 2 user’s ratings.
- However when computing summation, use only items that both users have rated.
- When sorting similarities, if 2 users have the same similarity sort them by user id.
- Predict(user1, item, k_nearest_neighbors) :
- This function calculates the final prediction for item for user1 using k nearest neighbors.
- You will compute a simple weighted average of the ratings provided by the k nearest neighbors.
- Use only the neighbors who have rated the input item.
- Prediction = ∑(Wi,1)*(rating i,item) / ∑(Wi,1) where Wi,1 is the similarity of user i with user1 from the k nearest neighbors.
Python collabFilter.py ratings-dataset.tsv Kluver ‘The Fugitive’ 10 ratings-dataset.tsv: input file Kluver: User id Movie: The Fugitive K: 10
The program will output:
- K nearest neighbors with their user ids and similarity values separated by space as per the output file
- Rating prediction for item.