cu2rec: CUDA Meets Recommender Systems

cu2rec is a matrix factorization library that accelerates the training of recommender system models on GPUs using CUDA. It trains the matrix factorization model with Parallel Stochastic Gradient Descent.
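The bin/predict command takes an item bias, a global bias and a Q matrix, which suggests a biased matrix factorization model. Assuming the common formulation where a rating is predicted as global mean + user bias + item bias + the dot product of the user vector p_u and the item vector q_i, the sketch below shows the per-rating SGD update serially in Python. The function names, the fixed global mean and the lr/reg hyperparameters are illustrative assumptions, not cu2rec's CUDA kernels or config keys. A parallel SGD implementation applies this same kind of update to many ratings concurrently on the GPU.

```python
import math
import random


def predict(mu, b_u, b_i, p_u, q_i):
    """Biased MF prediction: global mean + user bias + item bias + dot(p_u, q_i)."""
    return mu + b_u + b_i + sum(a * b for a, b in zip(p_u, q_i))


def sgd_epoch(ratings, mu, bu, bi, P, Q, lr=0.01, reg=0.02, rng=random):
    """One serial SGD pass over (user, item, rating) triples; updates bu, bi, P, Q in place.

    Assumption of this sketch: mu (the global mean) is kept fixed.
    """
    order = list(ratings)
    rng.shuffle(order)  # visit ratings in a random order each epoch
    for u, i, r in order:
        err = r - predict(mu, bu[u], bi[i], P[u], Q[i])
        bu[u] += lr * (err - reg * bu[u])
        bi[i] += lr * (err - reg * bi[i])
        for k in range(len(P[u])):
            p, q = P[u][k], Q[i][k]  # read both factors before updating either
            P[u][k] += lr * (err * q - reg * p)
            Q[i][k] += lr * (err * p - reg * q)


def rmse(ratings, mu, bu, bi, P, Q):
    """Root mean squared error of the current model on (user, item, rating) triples."""
    sq = sum((r - predict(mu, bu[u], bi[i], P[u], Q[i])) ** 2 for u, i, r in ratings)
    return math.sqrt(sq / len(ratings))
```

Users and items are 0-based indices here; with the 1-based ids produced by the mapping script, subtract 1 first.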

Data

The input data should be a CSV file with a header row, in the form userId,itemId,rating. If the user ids and item ids are not sequential, run python preprocessing/map_items.py <ratings_file> to convert them into sequential integers starting at 1.
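To illustrate what the mapping step does, here is a hypothetical Python sketch (not the actual preprocessing/map_items.py) that gives each distinct user id and item id the next integer, starting at 1, in order of first appearance. It keeps only the first three columns of each row; that is an assumption aimed at inputs such as MovieLens that carry an extra timestamp column.

```python
import csv


def map_ids(lines):
    """Map arbitrary user/item ids in userId,itemId,rating CSV lines to 1-based sequential ints.

    Hypothetical sketch, not the real preprocessing/map_items.py.
    Returns (rows, user_map, item_map); rows[0] is the header.
    """
    reader = csv.reader(lines)
    header = next(reader)[:3]  # keep only userId,itemId,rating
    user_map, item_map = {}, {}
    rows = [header]
    for row in reader:
        user, item, rating = row[0], row[1], row[2]
        # setdefault hands out the next id the first time an original id is seen
        u = user_map.setdefault(user, len(user_map) + 1)
        i = item_map.setdefault(item, len(item_map) + 1)
        rows.append([u, i, rating])
    return rows, user_map, item_map
```

An open file object can be passed directly as lines. Keeping the returned dictionaries lets you translate mapped ids back to the original ones later, for example when presenting recommendations.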

Once you have a mapped CSV, run python preprocessing/split_to_test_train.py <mapped_file> <test_ratio> to split the data into training and test sets for use with mf.cu.
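A minimal sketch of a random train/test split, assuming the rows are already mapped and the header has been removed. The function name, the seed parameter and the rounding rule are hypothetical, and the real split_to_test_train.py may split differently (for example, per user).

```python
import random


def split_train_test(rows, test_ratio, seed=0):
    """Randomly hold out about test_ratio of the rating rows for testing.

    Hypothetical sketch; the real split_to_test_train.py may behave differently.
    Returns (train_rows, test_rows).
    """
    if not 0.0 < test_ratio < 1.0:
        raise ValueError("test_ratio must be strictly between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]
```

A purely random split can leave some users or items only in the test set. The model cannot learn factors for those from the training data.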

Alternatively, you can use one of the datasets below:

Movielens

Download the MovieLens data here and save it in the data folder.
Run python preprocessing/map_items.py <ratings_file> to create a user-item mapped ratings file.
Run python preprocessing/split_to_test_train.py <mapped_file> <test_ratio> to split it into training and test files.

Netflix

Download the Netflix dataset here and place it under data/datasets/netflix.
Run python preprocessing/map_netflix.py to create the mapped training and test files.
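The Netflix Prize data is distributed as blocks in which a line such as 1: names a movie and each following line holds customerId,rating,date. Assuming that layout, a hypothetical parser that turns it into userId,itemId,rating triples (before any id mapping or splitting) could look like the sketch below. The actual preprocessing/map_netflix.py may read the files differently.

```python
def parse_netflix(lines):
    """Convert Netflix Prize-style blocks into (userId, itemId, rating) triples.

    Assumes each movie block starts with a "<movieId>:" line followed by
    "customerId,rating,date" lines. Hypothetical sketch, not map_netflix.py.
    """
    triples = []
    movie = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.endswith(":"):
            movie = int(line[:-1])  # start of a new movie block
            continue
        if movie is None:
            raise ValueError("rating line found before any movie header")
        customer, rating, _date = line.split(",")
        triples.append((int(customer), movie, float(rating)))
    return triples
```

Customer ids in this data are not sequential, so the triples would still need mapping to 1-based ids, as in the map_items step.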

Compiling Code

SSH into Prince or cuda2 using NYU credentials
srun -t5:00:00 --mem=30000 --gres=gpu:1 --pty /bin/bash
module load cuda/9.2.88
cd matrix_factorization && make

The makefile compiles for compute capability 5.2. If your GPU does not support it, change the makefile to compile for your device's compute capability. The code has been tested on compute capabilities as low as 3.5.
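nvcc selects a target architecture with flags of the form -gencode arch=compute_XY,code=sm_XY, where X.Y is the compute capability (5.2 becomes 52). The makefile's exact variable names are not shown here, so this small hypothetical helper only builds the flag string you would substitute. The deviceQuery CUDA sample reports your GPU's compute capability.

```python
def gencode_flag(compute_capability):
    """Build an nvcc -gencode flag from a compute capability string such as "5.2"."""
    major, minor = compute_capability.split(".")
    cc = f"{int(major)}{int(minor)}"  # "5.2" -> "52"
    return f"-gencode arch=compute_{cc},code=sm_{cc}"
```

For example, gencode_flag("3.5") gives the flag for the oldest compute capability the code has been tested on.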

Training

make mf
bin/mf -c <config_file> <ratings_file_train> <ratings_file_test>

Running all possible configurations

To run all of the experiments mentioned in the report, cd experiments and run the included bash scripts. cu2rec.sh reports the total runtimes and error metrics for all configurations, while cu2rec_prof.sh collects all of the nvprof results. Make sure you have prepared all the data as described in the Data section.
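The exact error metrics cu2rec.sh reports are not listed here. RMSE and MAE are the usual choices for rating prediction; assuming those, they can be computed from (predicted, actual) pairs on the test set as in this sketch (the function name is hypothetical).

```python
import math


def rmse_mae(pairs):
    """Return (RMSE, MAE) over a list of (predicted, actual) rating pairs."""
    if not pairs:
        raise ValueError("need at least one (predicted, actual) pair")
    n = len(pairs)
    squared = sum((p - a) ** 2 for p, a in pairs)
    absolute = sum(abs(p - a) for p, a in pairs)
    return math.sqrt(squared / n), absolute / n
```

RMSE penalizes large errors more heavily than MAE, so reporting both shows whether the error comes from a few bad predictions or many small ones.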

Getting recommendations for a user

Make sure the user's data is in the same ratings format as MovieLens.
make predict
bin/predict -c <config_file> -i <trained_item_bias_file> -g <trained_global_bias_file> -q <trained_Q_file> <ratings_file>
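bin/predict receives item biases, a global bias and Q but no user factors, so a new user's factors presumably have to be estimated from their ratings. One common approach, sketched here as an assumption rather than cu2rec's actual method, keeps Q and the item biases fixed, fits only the user's bias and factor vector with SGD, and then ranks the items the user has not rated. All names and hyperparameters are hypothetical.

```python
import random


def recommend(user_ratings, mu, item_bias, Q, n=10, epochs=50, lr=0.05, reg=0.02, seed=0):
    """Fold a new user into a trained biased MF model and return the top-n unrated items.

    user_ratings: dict {item_index: rating}, with 0-based item indices (an assumption).
    Q and item_bias stay fixed; only the user bias b_u and vector p are fitted.
    """
    rng = random.Random(seed)
    p = [rng.gauss(0.0, 0.1) for _ in range(len(Q[0]))]
    b_u = 0.0
    items = list(user_ratings.items())
    for _ in range(epochs):
        rng.shuffle(items)
        for i, r in items:
            err = r - (mu + b_u + item_bias[i] + sum(a * b for a, b in zip(p, Q[i])))
            b_u += lr * (err - reg * b_u)
            # the comprehension reads the old p values before the list is replaced
            p = [pk + lr * (err * qk - reg * pk) for pk, qk in zip(p, Q[i])]
    # score every item the user has not rated yet
    scores = {
        i: mu + b_u + item_bias[i] + sum(a * b for a, b in zip(p, Q[i]))
        for i in range(len(Q))
        if i not in user_ratings
    }
    return sorted(scores, key=scores.get, reverse=True)[:n]
```

Fixing Q keeps the fold-in cheap: it solves a small problem for one user instead of retraining the whole model.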

Running Tests

cd tests
make
To run all tests: make run_all
To run a single test: bin/test_{}

Authors

Nick Greenquist
Doruk Kilitcioglu

nickgreenquist/cu2rec