Social Poisson Factorization (SPF)

This software is distributed under the MIT license. See LICENSE.txt for details.

Repository Contents

conf contains a base configure file for running LibRec to do model comparisons
scripts bash and python scripts for data processing and running experiments
src C++ source code
dat example data
README.md this file

Data

The input format for data is tab-separted files with integer values:

user id    item id    rating

The ratings should be separated into training, testing, and validation data; scripts/process_data.py helps divide data into these different sets. This script also culls the user network such that only connections that have at least one item in common are included.

python process_data.py [ratings-file] [network-file] [output-dir]

Alternatively, data with time information (like shown below) can be processed with process_time_data.py which takes the same arguments as process_data.py. This will split the data according to time; ratings are implicit and therefore binary.

user id    item id    unix time

Running SPF

Clone the repo: git clone https://github.com/ajbc/spf.git
Navigate to the spf/src directory
Compile with make
Run the executable, e.g.: ./spf --data ~/my-data/ --out my-fit

SPF Options

Option	Arguments	Help	Default
help		print help information
verbose		print extra information while running	off
out	dir	save directory, required
data	dir	data directory, required
svi		use stochastic VI (instead of batch VI)	off for < 10M ratings in training
batch		use batch VI (instead of SVI)	on for < 10M ratings in training
a_theta	a	shape hyperparamter to theta (user preferences)	0.3
b_theta	b	rate hyperparamter to theta (user preferences)	0.3
a_beta	a	shape hyperparamter to beta (item attributes)	0.3
b_beta	b	rate hyperparamter to beta (item attributes)	0.3
a_tau	a	shape hyperparamter to tau (user influence)	2
b_tau	b	rate hyperparamter to tau (user influence)	5
a_delta	a	shape hyperparamter to delta (item bias)	0.3
b_delta	b	rate hyperparamter to delta (item bias)	0.3
social-only		only consider social aspect of factorization (SF)	include factors
factor-only		only consider general factors (no social; PF)	include social
bias		include a bias term for each item	no bias
binary		assume ratings are binary	integer
directed		assume network is directed	undirected
seed	seed	the random seed	time
save_freq	f	the saving frequency. Negative value means no savings for intermediate results.	20
eval_freq	f	the intermediate evaluating frequency. Negative means no evaluation for intermediate results.	-1
conv_freq	f	the convergence check frequency	10
max_iter	max	the max number of iterations	300
min_iter	min	the min number of iterations	30
converge	c	the change in rating log likelihood required for convergence	1e-6
final_pass		do a final pass on all users and items	no final pass
sample	sample_size	the stochastic sample size	1000
svi_delay	tau	SVI delay >= 0 to down-weight early samples	1024
svi_forget	kappa	SVI forgetting rate (0.5,1]	default 0.75
K	K	the number of general factors	100

Running an Experiment

Download and compile code for comparison models: cd scripts/; ./setup.sh; cd ..
Kick off fits for multiple models with the script (from scripts directory):