RecommendationSystem

Background:

The data file is ratings.csv. Every record in the file is of the form user, item, rating, timestamp.

user – The user’s unique identifier

item – The item’s unique identifier

rating – The rating that was given to the item by the user, it is in the range [0.5,5]

timestamp – The timestamp in which the rating was given.

Baseline model

The first baseline model for recommender systems is

𝑟𝑢𝑖 ̂ = 𝑅̂ + 𝑏𝑢 + 𝑏𝑖 where

𝑅̂ - average of all the ratings in the user-item ratings matrix 𝑅,

𝑏𝑢 - average rating for user 𝑢 and

𝑏𝑖 - average rating for item 𝑖.

Collaborative filtering

NeighborhoodRecommender makes use of the similarities of users and the similarities of items to make predictions.

We will be using only the user similarities.

The prediction is done with the 3 nearest neighbors.

Regression model

The rating estimate for user 𝑢, item 𝑖 and timestamp 𝑡 is:

𝑟𝑢𝑖𝑡 = 𝑅̂ + 𝑏𝑢 + 𝑏𝑖 + 𝑏𝑑 + 𝑏𝑛 + 𝑏i.

where: 𝑏𝑑 – A parameter for ratings that were given in daytime (between 6am and 6pm).

𝑏𝑛 – A parameter for ratings that were given in the night (between 6pm and 6am).

𝑏𝑤 – A parameter for ratings that were given in the weekend (Friday or Saturday).

This is a least squares problem: min 𝑏𝑢,𝑏𝑖,𝑏𝑑,𝑏𝑛,𝑏𝑤 ‖𝑋𝛽 − 𝑦‖2

To solve the least squares problem, we use np.linalg.lstsq.

CompetitionRecommender

We tried to find the lowest RMSE score on the ratings_comp data (800,000 ratings), by combining ls parameters and baseline prediction.