Explore-Exploit in Top-N Recommender Systems via Gaussian Processes

A simple MATLAB implementation of the CGP-RANK algorithm [1]. CGP-RANK produces a ranked list of recommendations and is well suited to web-scale tasks, where the number of items available for recommendation is very large and the context space is complex compared to the relatively sparse click feedback.

CGP-RANK is an upper-confidence-style algorithm that exploits prior information specified in terms of a Gaussian process kernel function, which allows feedback to be shared in three ways: between positions in a list, between items, and between contexts [2]. The choice of kernels expresses our prior assumption about how smoothly rewards change over the item-context space, and allows us to predict the performance of yet-unexplored lists using Gaussian process models.

Dependencies

You need the following library.

GPML Matlab Code version 4.2

You can download it [here](http://www.gaussianprocess.org/gpml/code/matlab/doc/)! Don't forget to add the library, with all its subfolders, to your project path.
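
For example, a minimal sketch, assuming the library was extracted to a folder named gpml-matlab-v4.2 in your working directory:

addpath(genpath('gpml-matlab-v4.2')); % add GPML and all of its subfolders to the path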

How to Run

You will need to initialize a couple of things; let's walk through them one by one!

Create a Reward Function

We will pass this function's handle to CGP-RANK. The reward function should take a list of items and a list of contexts as arguments; these can be multidimensional, as long as your kernels support it.

Let's first test with a very simple reward function, reward = sin(item) + cos(context) + noise.

function reward = feedback_function(items, contexts)
    X = items;
    Y = contexts;
    reward = sin(X) + cos(Y);
    noise = randn(size(reward))*0.01;   % small Gaussian observation noise
    reward = (reward + noise)';         % return rewards as a column vector
end
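
A quick sanity check on a few hand-picked item-context pairs (values chosen only for illustration):

items    = [-1.0, 0.0, 1.0];
contexts = [ 0.5, 0.5, 0.5];
rewards  = feedback_function(items, contexts) % approx. sin(items) + cos(0.5), as a 3-by-1 column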

Determine an Item List and Context for each Round

You must also create a function that returns which items are available in any given round t. If all items are always available, you can simply return the whole list of items at every round. For this test, we choose our item domain from -3 to 3 in 0.05 intervals, and at every round t only a randomly selected quarter of all items will be available for recommendation.

function item_list = available_item_list_creator(t)
    items = -3:0.05:3;
    % a randomly selected quarter of the items, sampled without replacement
    item_list = datasample(items, round(length(items)/4), 'Replace', false);
end

Similarly, a random context will be selected at each round.

function context = context_creator(t)
    contexts = -3:0.05:3;
    % pick one context uniformly at random
    context = datasample(contexts, 1);
end
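
CGP-RANK will call these handles itself, but for testing your own functions, one simulated round t looks like this (an illustrative sketch):

t = 1;
context = context_creator(t);               % scalar context for this round
item_list = available_item_list_creator(t); % a random quarter of the items
rewards = feedback_function(item_list, repmat(context, size(item_list)));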

Create Item & Context Kernels, Combine them and Set their Hyperparameters

Define a kernel using the GPML kernels; there is probably one for your needs. If not, you can always create a custom one using covDiscrete. It is entirely possible to create diffusion kernels from graphs, as used in the CGP-RANK paper, with GPML kernel functions; [check the documentation here!](http://www.gaussianprocess.org/gpml/code/matlab/doc/manual.pdf)

For this simple reward function, we will use isotropic squared exponential kernels without scaling.

item_kernel = {@covSEisoU}; item_hyp = 0; 
mask = [1,0]; % binary mask so that this kernel applies only to items
item_kernel_masked = {@covMask,{mask,item_kernel}};

context_kernel = {@covSEisoU}; context_hyp = 0; 
mask = [0,1]; % binary mask so that this kernel applies only to contexts
context_kernel_masked = {@covMask,{mask,context_kernel}}; 

% product of two separate kernels to create one composite kernel
kernel = {@covProd,{item_kernel_masked, context_kernel_masked}};
GP_hyperparameters.cov = [item_hyp, context_hyp];
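
As a quick sanity check, the composite kernel can be evaluated on a few [item, context] rows using GPML's standard feval convention (the points below are illustrative):

X_check = [-1.0  0.5;
            0.0  0.5;
            1.0 -0.5];                                  % each row is [item, context]
K = feval(kernel{:}, GP_hyperparameters.cov, X_check)   % 3-by-3 covariance matrix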

Initialize and Run

% Total number of rounds
T = 40;
% number of items to be recommended in each round
batch_size = 5;
% function handles
context_receiver_handle = @context_creator;
available_item_list_receiver_handle = @available_item_list_creator;
feedback_handle = @feedback_function;
% calculate the optimal recommendation list for regret plots (optional argument)
calculate_optimal_rewards = true;

% Run!
[item_choices, observations, oracle] = CGPRank(kernel, GP_hyperparameters,...
    batch_size, T, context_receiver_handle, available_item_list_receiver_handle, ...
    feedback_handle, calculate_optimal_rewards);
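
If calculate_optimal_rewards is set, the oracle output can be used for regret plots. A hedged sketch, assuming observations and oracle are T-by-batch_size reward matrices (check CGPRank's help text for the actual shapes):

regret = cumsum(sum(oracle - observations, 2));          % cumulative regret, assumed shapes
plot(1:T, regret); xlabel('round t'); ylabel('cumulative regret');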

Some Tests and Plots

Reward Function, Predicted Reward Function and Regret Plots

The sublinear regret shows that CGP-RANK converges to the optimal recommendation list across the context space.

Choices over Time and Predictive Mean of Reward Space by GP Inference

Note that, since the algorithm aims to maximize rewards over time, it only needs to accurately map the highly rewarding parts of the environment. It achieves this through a balanced trade-off between exploration and exploitation using an upper-confidence-style scheme: it chooses the item-context pairs that maximize its UCB selection rule (predictive mean + a time-varying factor × predictive standard deviation).
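
To make the selection rule concrete, here is a self-contained, illustrative sketch of GP-UCB-style scoring; the candidate values and the beta_t formula are assumptions for illustration, not this implementation's internals:

t  = 10;                                  % current round
mu = [0.3; 0.8; 0.5];                     % GP predictive means of 3 candidates
s2 = [0.20; 0.01; 0.10];                  % GP predictive variances
beta_t = 2*log(t^2 * pi^2 / (6*0.05));    % a typical time-varying exploration factor
ucb = mu + sqrt(beta_t * s2);             % mean + factor * standard deviation
[~, chosen] = max(ucb);                   % index of the candidate with the highest UCB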

Speeding it UP 🐆

Elapsed time (tic/toc) for the same test with T = 200:

Delayed Feedback

Continuously performing GP inference updates can be prohibitively expensive for very little immediate return, especially once a good amount of feedback has already been acquired. CGP-RANK can be accelerated by reusing the same recommendation multiple times, accumulating feedback, and performing delayed updates.

Two optional arguments, "t_feedback_delay" and "feedback_delay_growth_ratio", control the feedback delaying. After t > t_feedback_delay, feedback delaying becomes active and updates are performed every time the dataset has grown by a fraction feedback_delay_growth_ratio since the last update, i.e. for t_feedback_delay = 100 and feedback_delay_growth_ratio = 0.1, updates after t > 100 happen at rounds 110, 121, 133, 146, 160, ...
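
The resulting update schedule is easy to reproduce; a small sketch of the rule as described above (not the repo's internal code):

t_update = 100;                          % t_feedback_delay
ratio    = 0.1;                          % feedback_delay_growth_ratio
schedule = zeros(1, 5);
for k = 1:5
    t_update = floor(t_update * (1 + ratio));  % next round at which an update happens
    schedule(k) = t_update;
end
schedule                                 % -> 110 121 133 146 160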

With the above values of feedback delay, elapsed time is:

Scaling GP Inference

GPML Matlab supports a couple of ways to do this; however, only starter code is included here, for sparse covariance approximations that use inducing inputs to reduce GP inference's complexity from the regular O(n^3) to O(m^2 * n), where m is the number of inducing inputs. The last argument of CGPRank is the optional argument t_approximation; if it is set, then after t > t_approximation, GPSparseCovRegression is used instead of GPInference.
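
For reference, here is a minimal self-contained sketch of inducing-point GP regression using GPML's apxSparse wrapper, the mechanism behind such sparse covariance approximations; the data, kernel, and noise level below are illustrative, not the repo's GPSparseCovRegression code:

n = 500; m = 50;
X = rand(n, 2)*6 - 3;                               % visited [item, context] pairs
y = sin(X(:,1)) + cos(X(:,2)) + 0.01*randn(n, 1);   % noisy observed rewards
xu = X(randperm(n, m), :);                          % m random inducing inputs
cov_sparse = {'apxSparse', {@covSEiso}, xu};        % sparse wrapper around a base kernel
hyp.cov = [0; 0]; hyp.lik = log(0.01); hyp.mean = [];
[mu_s, s2_s] = gp(hyp, @infGaussLik, [], cov_sparse, @likGauss, X, y, [0.5 0.5]);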

To test the speed-up, inducing inputs are chosen randomly from previously chosen points, and their number is increased very slowly over time after t_approximation. This incurred extra regret but accelerated the algorithm, as seen below.

A Harder Test 🌋

We can now test on a harder function, namely the Schaffer function N. 2. Matérn kernels are used for this test.
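
As a hedged sketch (the exact scaling and noise used for the plots may differ), the reward can be the negated Schaffer function N. 2, so that its global minimum at the origin becomes the maximum reward:

function reward = schaffer_feedback(items, contexts)
    X = items; Y = contexts;
    % Schaffer function N. 2, a standard multimodal benchmark
    f = 0.5 + (sin(X.^2 - Y.^2).^2 - 0.5) ./ (1 + 0.001*(X.^2 + Y.^2)).^2;
    reward = (-f + randn(size(f))*0.01)';  % negate so that higher reward is better
end

The Matérn kernels are set up exactly like the squared exponential ones above, e.g. with GPML's covMaterniso (here nu = 5/2, with hyperparameters [log ell; log sf]):

item_kernel = {@covMaterniso, 5}; item_hyp = [0; 0];
context_kernel = {@covMaterniso, 5}; context_hyp = [0; 0];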

Reward Function, Predicted Reward Function and Regret Plots

Left: reward function. Middle: predictive mean of the reward function at T = 200. Right: regret plot.

Choices over Time and Predictive Mean of Reward Space by GP Inference

Reference

[1, 2] Vanchinathan, H. P., Nikolic, I., De Bona, F., & Krause, A. (2014). Explore-exploit in top-N recommender systems via Gaussian processes. In Proceedings of the 8th ACM Conference on Recommender Systems (RecSys '14). doi:10.1145/2645710.2645733