/matrix-completion

matrix completion for collaborative filtering

Primary LanguagePython

collaborative-filtering

Netflix rating prediction using collaborative filtering based on matrix completion, using scipy.

Matrix Completion Overview:

The collaborative filtering problem can often be modeled as a matrix completion problem, whose goal is to fill out the unobserved values in a matrix. The goal of predicting unknown users’ movie ratings aligns well with matrix completion, we can use matrix completion methods on a user-movie matrix which holds previously given ratings to obtain unknown ratings. To realize this goal, we use matrix factorization, in particular, the SVD of a user-movie ratings matrix. The fundamental idea of the Singular Value Decomposition (SVD) is to decompose the ratings matrix into a user feature matrix, a singular value matrix, and a movie feature matrix of low-rank, essentially, it is performing biclustering of users and movies. We typically perform this decomposition after normalizing the ratings matrix and then by filling out the missing elements with preliminary simple predictions, and finally perform a transform on the matrix to bring it back to the original feature space. The below flowchart gives a detailed walkthrough of the matrix completion process.

Approach and Walkthrough

We begin by constructing a user-by-movie matrix that holds the ratings obtained from the training dataset and normalizing this ratings matrix with respect to movies, i.e. along the columns of the matrix by mean centering to construct , we save the means of the columns to be used at a later stage. We then perform the SVD operation on such that where is a diagonal matrix with descending sorted singular values deposited in its diagonal entries, and the and are matrices whose columns contain the corresponding left and right singular vectors. We then perform a low rank reconstruction of using the first columns of and truncated to top-k rank to give . Now we add back to give , our completed matrix which is used to make predictions on the unknown user-movie pairs in the validation/test datasets. Overall, we achieve a mean MSE score of 1.025 for k=25, cross validated on 10 folds. Other design choices and hyper parameter related tuning details will be listed below.

Flowchart 1: Training + Validation process

Normalizing ratings matrices: The user-by-movie ratings matrix is normalized in such a way that the mean is centered around zero. In order to achieve that, we calculate the mean for each column and subtract the non-zero elements in the column with the mean, the zero elements remain unchanged. We also performed experiments without centering mean to zero but rather to the mean of the non zero elements in the vector. This is essentially achieved by replacing zero values in a vector with the mean of the non zero values in that vector. This seems to perform slightly better (MSE 1.012) for a few preliminary experiments on various validation datasets but cross validation has not been performed for this approach.

Choice of k: We prefer a lower rank reconstruction of the matrix because it prevents overfitting and gives much generalized reconstruction, which helps in assigning some value (prediction) to the previously unknown elements in the matrix. We performed experiments while varying the value of k and found k=25 to be optimal.

We also did experiments trying to use recommender system packages like scikit-surprise for python and noticed that the lowest achieved MSE, with a probabilistic formulation of SVD which includes gradient descent, stands around 0.8486, this means that there is a huge room for improvement for my methods but due to time constraints, could not explore further.