Recommender System - A Survey

Xinsong Li, Dec 2020

[Public Repo. Unfinished article]


This survey summarizes the state of the art in recommender systems.

Recommender systems are usually classified into two types:

  • Content-Based Filtering
  • Collaborative Filtering

Depending on whether a model is learned from the underlying data, they can also be divided into two types:

  • Model Based
  • Memory Based

Use Cases

  • Movie Recommendation, e.g., Netflix

  • Music Recommendation, e.g., Pandora Radio

  • Product Recommendation, e.g., Amazon

  • News Recommendation, e.g., Google News, Toutiao

  • People Recommendation, e.g., LinkedIn

System Architecture


Cold Start Problem

Collaborative Filtering

Collaborative filtering is best suited to problems where data on users is available, but data on items is lacking or features are hard to extract for the items of interest. [6]

Collaborative filtering approaches build a model from a user's past behavior (items previously purchased, selected, or rated) as well as similar decisions made by other users. [10]
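The idea of combining a user's own history with the decisions of similar users can be illustrated with a minimal memory-based sketch. It is not taken from [6] or [10]; the function name predict_user_based, the toy rating matrix, the use of cosine similarity, and the convention that 0 means "not rated" are all assumptions made for illustration.

```python
import numpy as np

def predict_user_based(ratings, user, item, k=2):
    """Predict ratings[user, item] from the k most similar users who rated the item.

    ratings: 2-D array, rows are users, columns are items, 0 means "not rated"
    (a simplifying assumption; real systems usually mask missing entries).
    """
    # Candidate neighbours: every other user who has rated the item.
    rated = ratings[:, item] > 0
    rated[user] = False
    candidates = np.flatnonzero(rated)
    if candidates.size == 0:
        return 0.0

    # Cosine similarity between the target user and each candidate.
    target = ratings[user]
    norms = np.linalg.norm(ratings[candidates], axis=1) * np.linalg.norm(target)
    sims = ratings[candidates] @ target / np.where(norms == 0, 1.0, norms)

    # Similarity-weighted average over the k nearest neighbours.
    top = np.argsort(sims)[::-1][:k]
    weights = sims[top]
    if weights.sum() == 0:
        return 0.0
    return float(weights @ ratings[candidates[top], item] / weights.sum())

# Toy data: user 0 has not rated item 2; user 1 has similar tastes, user 2 does not.
R = np.array([
    [5, 1, 0],
    [5, 1, 4],
    [1, 5, 2],
], dtype=float)

nearest_only = predict_user_based(R, user=0, item=2, k=1)
both_neighbours = predict_user_based(R, user=0, item=2, k=2)
```

With k=1 the prediction copies the rating of the single most similar user; with k=2 the less similar user pulls the estimate down slightly.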

Content-Based Filtering

Knowledge-Based System










Feedback System

Hybrid Approach

Deep Learning Approach

Neural Collaborative Filtering

[30] [31] He et al. proposed Neural Collaborative Filtering (NCF), which replaces the inner product with a neural architecture that can learn an arbitrary function from data. NCF is generic and can express and generalize matrix factorization under its framework. To add non-linearities to NCF modeling, they propose using a multi-layer perceptron to learn the user-item interaction function.
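A forward-pass sketch may make the relationship to matrix factorization concrete. It is an illustration only, not the authors' implementation: the embedding size, layer widths, random (untrained) weights, and the function names mf_score, gmf_score, and mlp_score are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, dim = 4, 6, 8  # illustrative sizes

# Embedding tables (randomly initialised here; in practice they are learned).
P = rng.normal(scale=0.1, size=(n_users, dim))  # user embeddings
Q = rng.normal(scale=0.1, size=(n_items, dim))  # item embeddings

def mf_score(u, i):
    """Plain matrix factorization: inner product of the two embeddings."""
    return float(P[u] @ Q[i])

def gmf_score(u, i, h, activation=lambda x: x):
    """Generalised form: weighted element-wise product passed through an activation.

    With h = all ones and the identity activation this reduces to mf_score.
    """
    return float(activation(h @ (P[u] * Q[i])))

# Untrained MLP weights for the non-linear interaction function.
W1, b1 = rng.normal(scale=0.1, size=(2 * dim, 16)), np.zeros(16)
W2, b2 = rng.normal(scale=0.1, size=(16, 8)), np.zeros(8)
w_out = rng.normal(scale=0.1, size=8)

def mlp_score(u, i):
    """Concatenate the embeddings, apply two ReLU layers, then a sigmoid output."""
    x = np.concatenate([P[u], Q[i]])
    x = np.maximum(0.0, x @ W1 + b1)
    x = np.maximum(0.0, x @ W2 + b2)
    return float(1.0 / (1.0 + np.exp(-(x @ w_out))))
```

The check gmf_score(u, i, np.ones(dim)) == mf_score(u, i) (up to floating-point error) shows how the inner product is a special case, while mlp_score replaces it with a learned non-linear function.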

eXtreme Deep Factorization Machine (xDeepFM)

Reinforcement Learning for Recommender Systems

Evaluation Metrics

Rating Metrics

  • Root Mean Square Error (RMSE): the square root of the mean squared error in predicted ratings

  • R Squared (R^2): the share of the total variation that is explained by the model

  • Mean Absolute Error (MAE): the mean absolute difference between predicted and actual ratings

  • Explained Variance: how much of the variance in the data is explained by the model
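The four rating metrics above can be written directly from their usual textbook definitions. The sketch below uses NumPy; the function names and the sample ratings are illustrative assumptions.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))

def r_squared(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def explained_variance(y_true, y_pred):
    """1 - Var(residuals) / Var(y_true); unlike R^2 it ignores a constant bias."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(1.0 - np.var(y_true - y_pred) / np.var(y_true))

# Hypothetical actual and predicted ratings for four user-item pairs.
actual = [3, 4, 5, 2]
predicted = [2.5, 4, 4, 3]
```

R^2 and explained variance coincide when the mean prediction error is zero; otherwise explained variance is the larger of the two.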

Ranking Metrics

  • Precision: the proportion of recommended items that are relevant

  • Recall: the proportion of relevant items that are recommended

  • Normalized Discounted Cumulative Gain (NDCG): evaluates how well the items recommended to a user are ranked by relevance

  • Mean Average Precision (MAP): the average precision computed for each user, averaged over all users
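The ranking metrics above are usually computed on the top k recommendations. The sketch below assumes binary relevance (an item is either relevant or not) and normalizes average precision by min(number of relevant items, k), which is one common convention among several; the function names and sample lists are illustrative.

```python
import math

def precision_at_k(recommended, relevant, k):
    """Share of the top-k recommended items that are relevant."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k

def recall_at_k(recommended, relevant, k):
    """Share of the relevant items that appear in the top k."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant) if relevant else 0.0

def ndcg_at_k(recommended, relevant, k):
    """DCG of the ranking divided by the DCG of an ideal ranking (binary gains)."""
    dcg = sum(1.0 / math.log2(rank + 2)
              for rank, item in enumerate(recommended[:k]) if item in relevant)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(min(len(relevant), k)))
    return dcg / idcg if idcg > 0 else 0.0

def average_precision_at_k(recommended, relevant, k):
    """Mean of precision@rank taken at each rank where a relevant item appears."""
    hits, score = 0, 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank
    denom = min(len(relevant), k)
    return score / denom if denom else 0.0

def mean_average_precision(all_recommended, all_relevant, k):
    """Average of average_precision_at_k over all users."""
    scores = [average_precision_at_k(rec, rel, k)
              for rec, rel in zip(all_recommended, all_relevant)]
    return sum(scores) / len(scores)

# Hypothetical data for two users.
recs = [["a", "b", "c", "d"], ["x", "y"]]
truth = [{"a", "c", "e"}, {"y"}]
```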

Classification Metrics

  • Area Under Curve (AUC): the area under the receiver operating characteristic (ROC) curve

  • Logistic Loss (Logloss): the negative log-likelihood of the true labels given the predictions of a classifier
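Both classification metrics can be sketched in a few lines. Here AUC is computed through its pairwise interpretation (the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counted as one half), which equals the area under the ROC curve; the function names, the clipping constant eps, and the sample labels and scores are assumptions for illustration.

```python
import numpy as np

def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUC needs at least one positive and one negative label")
    diff = pos[:, None] - neg[None, :]  # every positive-negative score difference
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

def log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(probs, dtype=float), eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

# Hypothetical click labels (1 = clicked) and predicted click probabilities.
labels = [1, 0, 1, 0]
predicted = [0.9, 0.8, 0.4, 0.1]
```

The pairwise form is quadratic in the number of examples, so it suits small evaluation sets; a sort-based computation is the usual choice for large ones.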

Model Selection and Optimization


The number of items sold on major e-commerce sites is extremely large. Even the most active users will have rated only a small subset of the overall catalog, so even the most popular items have very few ratings relative to the number of users. [10]
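A quick calculation shows how sparse the user-item matrix becomes. The figures below are hypothetical, chosen only for illustration, and the function name density is an assumption.

```python
def density(n_ratings, n_users, n_items):
    """Fraction of user-item pairs that actually have a rating."""
    return n_ratings / (n_users * n_items)

# Hypothetical site: 1,000 users, 10,000 items, 20 ratings per user on average.
n_users, n_items, ratings_per_user = 1_000, 10_000, 20
d = density(n_users * ratings_per_user, n_users, n_items)
sparsity = 1.0 - d  # share of the matrix that is empty
```

Under these assumed numbers only 0.2% of the matrix is filled, so most user-item pairs carry no signal at all.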

Churn Prevention

Industry Practice

Pandora Radio





[3] Xavier Amatriain and Justin Basilico. System Architectures for Personalization and Recommendation. Netflix Technology Blog.



