RecSys2018: A Java repository from kevin622

2018 ACM RecSys Challenge 1'st Place Solution For Team vl6 [paper]

Team members: Maksims Volkovs, Himanshu Rai, Zhaoyue Cheng, Yichao Lu (University of Toronto), Ga Wu (University of Toronto, Vector Institute), Scott Sanner (University of Toronto, Vector Institute)

Contact: maks@layer6.ai

Introduction

This repository contains the Java implementation of our entries for both main and creative tracks. Our approach consists of a two-stage model where in the first stage a blend of collaborative filtering methods is used to quickly retrieve a set of candidate songs for each playlist with high recall. Then in the second stage a pairwise playlist-song gradient boosting model is used to re-rank the retrieved candidates and maximize precision at the top of the recommended list.

Environment

The model is implemented in Java and tested on the following environment:

Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
256GB RAM
Nvidia Titan V
Java Oracle 1.8.0_171
Python, Numpy 1.14.3, Sklearn 0.19.1, Scipy 1.1.0
Apache Maven 3.3.9
CUDA 8.0 and CUDNN 8.0
Intel MKL 2018.1.038
XGBoost and XGBoost4j 0.7

Executing

All models are executed from src/main/java/main/Executor.java, the main function has examples on how to do main and creative track model training, evaluation and submission. To run the model:

Set all paths:

//OAuth token for spotify creative api, if doing creative track submission
String authToken = "";

// path to song audio feature file, if doing creative track submission
String creativeTrackFile = "/home/recsys2018/data/song_audio_features.txt";

// path to MPD directory with the JSON files
String trainPath = "/home/recsys2018/data/train/";

// path to challenge set JSON file
String testFile = "/home/recsys2018/data/test/challenge_set.json";

// path to python SVD script included in the repo, default location: script/svd_py.py
String pythonScriptPath = "/home/recsys2018/script/svd_py.py";

//path to cache folder for temp storage, at least 20GB should be available in this folder
String cachePath = "/home/recsys2018/cache/";

Compile and execute with maven:

export MAVEN_OPTS="-Xms150g -Xmx150g"
mvn clean compile
mvn exec:java -Dexec.mainClass="main.Executor"

Note that by default the code is executing model for the main track, to run the creative track model set xgbParams.doCreative = true. For the creative track we extracted extra song features from the Spotify Audio API. We were able to match most songs from the challenge Million Playlist Dataset, and used the following fields for further feature extraction: [acousticness, danceability, energy, instrumentalness, key, liveness, loudness, mode, speechiness, tempo, time_signature, valence]. In order to download the data for this track, you need to get the OAuth Token from Spotify API page and assign it to the authToken variable in the Executor.main function.

We prioritized speed over memory for this project so you'll need at least 100GB of RAM to run model training and inference. The full end-to-end runtime takes approximately 1.5 days.

kevin622/RecSys2018

2018 ACM RecSys Challenge 1'st Place Solution For Team vl6 [paper]

Introduction

Environment

Executing