Educational framework for Probabilistic Matrix Factorization (PMF) with JAGS in R. The original idea comes from: Mnih, Andriy, and Ruslan Salakhutdinov. "Probabilistic matrix factorization." In Advances in Neural Information Processing Systems, pp. 1257-1264. 2007.
The current version contains several variations of the original model, several encapsulated wrappers for JAGS, and a few evaluation methods.
- `models` - various BUGS/JAGS models
- `wrappers` - JAGS wrappers for PMF in R
- `params` - scripts for parameter preparation
- `evaluation` - evaluation methods
- `run.R` - the main script that loads data, prepares parameters, runs samplers and evaluates results
- `data` - sample data (including a part of the ml-100k movielens subset)
- `data_generator.R` - a script that generates sample input data
The data contains N users, M items and dataSize ratings in the range 1-5. Input (train & test) files should follow the format of the movielens ml-100k dataset.
Because the input data is sparse, two representations can be used internally (in the models). Different models work with one or the other representation:
- dense train and test matrices with NAs for missing elements
- a sparse index matrix ixs (with users in the first column and items in the second), along with the two vectors userIxs = ixs[,1] and itemIxs = ixs[,2]; the number of rows is passed as the dataSize parameter
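As a self-contained sketch (not part of the repository), both representations could be built from ml-100k-formatted ratings like this; the inline data frame stands in for a `read.table` call on a train file:

```r
# tiny stand-in for read.table("u1.base") on an ml-100k-formatted file
# (columns: user, item, rating, timestamp)
ratings <- data.frame(user = c(1, 1, 2, 3), item = c(1, 2, 2, 3),
                      rating = c(5, 3, 4, 1))
N <- max(ratings$user)
M <- max(ratings$item)

# dense representation: N x M matrix with NAs for missing ratings
r.dense <- matrix(NA_real_, N, M)
r.dense[cbind(ratings$user, ratings$item)] <- ratings$rating

# sparse representation: index matrix plus the two index vectors
ixs <- cbind(ratings$user, ratings$item)
userIxs <- ixs[, 1]
itemIxs <- ixs[, 2]
dataSize <- nrow(ixs)   # number of observed ratings
```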
Users and items are represented by D latent factors each, stored in the matrices u (N x D) and v (M x D). Ratings and latent representations are assumed to be drawn from distributions parametrized by the deviations s, su and sv. However, the models differ in several ways.
A few models work with the dense representation, where missing ratings are represented as non-observed nodes (NAs), and some with the sparse representation, where missing ratings are not represented as nodes at all. Although it is not obvious, the two representations seem to be equivalent.
Some models apply the squashing function g(x) = 1/(1+exp(-x)), whereas others do not. If g is applied, noise (measured by the standard deviation of ratings, s) can be added either before or after squashing.
Furthermore, the deviations s (ratings), su (users) and sv (items) can be sampled from various distributions or passed as fixed parameters.
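As a concrete illustration of the generative story above (a sketch only, not the contents of data_generator.R), a variant with squashing and noise added after g would produce ratings roughly as follows; the sizes and deviations are illustrative:

```r
set.seed(1)
N <- 50; M <- 40; D <- 5            # sizes chosen for illustration
s <- 0.28; su <- 1.25; sv <- 1.25   # fixed deviations, as in the experiments below
g <- function(x) 1 / (1 + exp(-x))  # squashing function

u <- matrix(rnorm(N * D, 0, su), N, D)  # user latent factors (N x D)
v <- matrix(rnorm(M * D, 0, sv), M, D)  # item latent factors (M x D)

# noise added after squashing; ratings land on the (0,1) scale, so
# ml-100k ratings 1-5 would have to be rescaled accordingly
r <- g(u %*% t(v)) + matrix(rnorm(N * M, 0, s), N, M)
```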
The variations of the original model differ as follows (their relative performance is compared at the end of this document):
- `pmf_fixed.bug` - dense representation, no squashing, deviations s, su, sv as fixed parameters
- `pmf_hypnorm3.bug` - dense representation, squashing, noise added after squashing, deviations s, su, sv sampled from *~ uniform(0,100)*
- `pmf_hypnorm2.bug` - dense representation, no squashing, deviations s, su, sv sampled from *~ uniform(0,100)*
- `pmf_hypnorm.bug` - dense representation, squashing, noise added before squashing, deviations s, su, sv sampled from *~ uniform(0,100)*
- `pmf_hypnorm_sparse2.bug` - sparse representation, squashing, noise added before squashing, deviations s, su, sv sampled from *~ uniform(0,100)*
- `pmf_hypnorm_sparse.bug` - sparse representation, squashing, noise added after squashing, deviations s, su, sv sampled from *~ uniform(0,100)*
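To make the variations concrete, a minimal JAGS sketch of a sparse model with squashing, noise after squashing and sampled deviations (in the spirit of pmf_hypnorm_sparse.bug, not its exact contents) could look like:

```
model {
  s  ~ dunif(0, 100)   # rating noise sd
  su ~ dunif(0, 100)   # user factor sd
  sv ~ dunif(0, 100)   # item factor sd
  for (i in 1:N) { for (d in 1:D) { u[i, d] ~ dnorm(0, 1 / (su * su)) } }
  for (j in 1:M) { for (d in 1:D) { v[j, d] ~ dnorm(0, 1 / (sv * sv)) } }
  for (n in 1:dataSize) {
    # ilogit is JAGS's built-in g(x) = 1/(1+exp(-x)); dnorm takes a precision
    r[n] ~ dnorm(ilogit(inprod(u[userIxs[n], 1:D], v[itemIxs[n], 1:D])), 1 / (s * s))
  }
}
```

In the dense variants the likelihood loop would instead run over all (i, j) pairs of an N x M rating matrix, with missing entries left as unobserved nodes.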
Different JAGS wrappers can be found online (it is not always clear what their authors' motivations were), and their output formats and parametrizations differ. To use some of them in a unified way, the following scripts (in the `wrappers` directory) are provided:
- `R2jags.R`
- `rjags.R`
- `runjags.R`
- `coda.R`
Each of the scripts uses a similar set of parameters (see below) and provides a standardized output (stored in the R workspace):
- retrieved.u - N x D-dimensional matrix of users latent factors
- retrieved.v - M x D-dimensional matrix of items latent factors
- iterations - the number of generated samples
- user_sample - a function (i, k) that returns the D-dimensional k-th sample of the i-th user
- item_sample - a function (j, k) that returns the D-dimensional k-th sample of the j-th item
Additionally, R2jags.R supports parallel computing: if n.cluster > 1, the parallel version is used. Note, however, that the condition n.chains > n.cluster must hold.
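A hypothetical sketch of how such a wrapper might dispatch between the serial and parallel entry points of the R2jags package (the variable names follow the parameter conventions below; the actual contents of R2jags.R may differ):

```r
library(R2jags)

# assumed to be in scope: jags.data, jags.inits, jags.params, model.file,
# n.chains, n.iter, n.burnin, n.thin, n.cluster (see the params scripts)
if (n.cluster > 1) {
  # parallel version: one worker per chain, requires n.chains > n.cluster
  fit <- jags.parallel(data = jags.data, inits = jags.inits,
                       parameters.to.save = jags.params,
                       model.file = model.file,
                       n.chains = n.chains, n.iter = n.iter,
                       n.burnin = n.burnin, n.thin = n.thin,
                       n.cluster = n.cluster)
} else {
  fit <- jags(data = jags.data, inits = jags.inits,
              parameters.to.save = jags.params,
              model.file = model.file, n.chains = n.chains,
              n.iter = n.iter, n.burnin = n.burnin, n.thin = n.thin)
}
```

The wrapper would then reshape `fit$BUGSoutput` into the standardized objects listed above (retrieved.u, retrieved.v, user_sample, item_sample).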
Scripts from the `params` directory prepare the objects jags.data, jags.inits and jags.params, which are required by the samplers and used in the models.
Currently, the following scenarios are modelled:
- `fixed.R` - dense representation, fixed deviations s, su, sv
- `hypnorm.R` - dense representation, deviations s, su, sv sampled
- `sparse.R` - sparse representation, deviations s, su, sv sampled
The objects are prepared according to the following parameters (typically only these are modified):
- model.file - model path (can be overwritten with a parameter of the main script)
- n.chains - number of chains
- n.iter - number of iterations per chain, excluding n.burnin (after adaptation phase), before thinning
- n.burnin - adaptation phase length (thinning does not apply here)
- n.thin - thinning
- D - latent factors dimensionality
- lu - the ratio between su and s: su = sqrt(s^2/lu) (see λ_U in the paper)
- lv - the ratio between sv and s: sv = sqrt(s^2/lv) (see λ_V in the paper)
- n.cluster - number of processes (for parallel computing)
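A hypothetical sketch in the spirit of params/hypnorm.R (dense scenario); the real script derives N, M and r from the loaded train data, so tiny dummy values are used here to keep the sketch self-contained:

```r
# dummy data standing in for the loaded train set
N <- 3; M <- 4
r <- matrix(NA_real_, N, M); r[1, 1] <- 5; r[2, 2] <- 3

model.file <- "models/pmf_hypnorm.bug"
n.chains <- 1; n.iter <- 5000; n.burnin <- 5000; n.thin <- 1
D <- 5; lu <- 0.05; lv <- 0.05; n.cluster <- 1

jags.data   <- list(N = N, M = M, D = D, r = r)   # passed to the sampler
jags.params <- c("u", "v", "s", "su", "sv")        # nodes to monitor
jags.inits  <- function() {
  start.s  <- 0.28                  # e.g. derived from the training ratings
  start.su <- sqrt(start.s^2 / lu)  # = 1.25 for lu = 0.05
  start.sv <- sqrt(start.s^2 / lv)
  list(u = matrix(rnorm(N * D, 0, start.su), N, D),
       v = matrix(rnorm(M * D, 0, start.sv), M, D))
}
```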
Evaluation is performed on the output of the wrappers (see above) with the help of the supplied test data. Currently, the following strategies are implemented:
- `baselines.R` - PMF-blind baselines: random prediction and the mean of the training data
- `mle_g.R` - predictions based on the expected values of the latent matrices retrieved.u and retrieved.v (with squashing g)
- `mle.R` - predictions based on the expected values of the latent matrices retrieved.u and retrieved.v (without squashing g)
- `bayesian_g.R` - fully Bayesian predictions based on averaging over posterior samples with the help of the user_sample and item_sample functions (with squashing g)
- `bayesian.R` - fully Bayesian predictions based on averaging over posterior samples with the help of the user_sample and item_sample functions (without squashing g)
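For instance, the fully Bayesian prediction with squashing averages g(u_i · v_j) over the posterior samples. The sketch below illustrates the idea (it is not the exact code of bayesian_g.R); the random arrays stand in for the user_sample/item_sample functions that the wrappers provide:

```r
g <- function(x) 1 / (1 + exp(-x))   # squashing function

# stand-in posterior samples for 2 users and 3 items, D = 5 factors
D <- 5; iterations <- 100
set.seed(1)
samples.u <- array(rnorm(2 * iterations * D), c(2, iterations, D))
samples.v <- array(rnorm(3 * iterations * D), c(3, iterations, D))
user_sample <- function(i, k) samples.u[i, k, ]
item_sample <- function(j, k) samples.v[j, k, ]

# posterior-mean prediction for user i, item j (on the squashed (0,1) scale)
predict_bayes_g <- function(i, j) {
  mean(sapply(seq_len(iterations),
              function(k) g(sum(user_sample(i, k) * item_sample(j, k)))))
}
p <- predict_bayes_g(1, 2)
```

The mle_g.R strategy instead applies g once to the dot product of the posterior means retrieved.u[i, ] and retrieved.v[j, ].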
The main script run.R selects the input data, parameters, wrappers and evaluation methods according to its command-line options:
```
Usage: Rscript run.R [options]

Options:
  -f CHARACTER, --train=CHARACTER
      train data file [default = data/artificial/u1.base]
  -t CHARACTER, --test=CHARACTER
      test data file [default = data/artificial/u1.test]
  -p CHARACTER, --params=CHARACTER
      R script (from the params/ directory) that prepares model & learning parameters [default = params/hypnorm.R]
  -m CHARACTER, --model=CHARACTER
      selects the model file (overrides the one selected in the params file)
  -w CHARACTER, --wrapper=CHARACTER
      selects the JAGS wrapper (from the wrappers/ directory) [default = wrappers/R2jags.R]
  -e CHARACTER, --eval=CHARACTER
      a comma-separated list of evaluation methods (from the evaluation/ directory) [default = evaluation/baselines.R,evaluation/mle.R,evaluation/mle_g.R]
  -s, --save
      saves the workspace when quitting (default off)
  -h, --help
      shows the help message and exits
```
Sample run:
```
Rscript run.R -f data/artificial/u1.base -t data/artificial/u1.test -p params/hypnorm.R -m models/pmf_hypnorm.bug -w wrappers/runjags.R -e evaluation/baselines.R,evaluation/mle_g.R,evaluation/bayesian_g.R
```
Some of the above models were experimentally compared on movielens data. Results can be found below.
- data: movielens 100k (u1.base + u1.test from the data/ml-100k/ directory)
- R2jags.R wrapper
- 1 chain, 5k burn-in iterations, 5k samples, thin = 1
- train data mean baseline: RMSE = 1.154
- random baseline: RMSE = 1.713
- start.s = mean over the training sample = 0.28
- start.su = sqrt(start.s^2/0.05) = 1.25
- start.sv = sqrt(start.s^2/0.05) = 1.25
- u ~ N(0, start.su)
- v ~ N(0, start.sv)
model | RMSE | RMSEb | E(s) | E(su) | E(sv) |
---|---|---|---|---|---|
pmf_hypnorm3.bug | 0.947 | 0.943 | 0.187 | 0.482 | 0.789 |
pmf_hypnorm_sparse.bug | 0.951 | 0.941 | 0.187 | 0.597 | 0.634 |
pmf_hypnorm2.bug | 1.072 | 1.055 | 0.192 | 0.325 | 0.387 |
pmf_hypnorm_sparse2.bug | 1.006 | 1.038 | 2.515 | 0.971 | 0.941 |
pmf_hypnorm.bug | 1.022 | ? | 2.515 | 0.815 | 1.124 |
pmf_hypnorm.bug x1 | 1.011 | ? | 2.514 | 0.381 | 2.418 |
pmf_fixed.bug x2 | 1.062 | 1.077 | ? | ? | ? |
- RMSEb - RMSE of the fully Bayesian predictions
- x1 - rjags.R wrapper used instead of R2jags.R
- x2 - fixed deviations s = 0.28, su = 1.25, sv = 1.25
TODO:
- a new model with Wishart priors for s, su, sv
- a coda wrapper