RecSys2018

MIPT_MSU team RecSys Challenge 2018 solution

Requirements

We used Python 3.5.

Install requirements from requirements.txt

You will also need Catboost, Starspace, Vowpal Wabbit and Python Transformer

Creating solution

All scripts are started from RecSys2018/recsys

For all scripts except recsys_script.sh, activate your virtualenv manually in the bash session before running them.

In the RecSys2018 folder, create a splitted_data folder and put the Million Playlist Dataset into its raw subfolder (RecSys2018/splitted_data/raw).
Put the challenge set into the splitted_data folder (RecSys2018/splitted_data/challenge_set.json).
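
The two steps above amount to the following layout. This is a minimal sketch, assuming the dataset and the challenge set were downloaded to the hypothetical locations `MPD_DIR` and `CHALLENGE_FILE`. None of these variable names come from this repository, and the copy steps are skipped when the variables are unset.

```shell
# Hypothetical variables: point them at your clone and your downloads.
REPO_ROOT="${REPO_ROOT:-$PWD/RecSys2018}"
MPD_DIR="${MPD_DIR:-}"                 # directory holding the Million Playlist Dataset files
CHALLENGE_FILE="${CHALLENGE_FILE:-}"   # downloaded challenge set JSON

# Create RecSys2018/splitted_data and its raw subfolder.
mkdir -p "$REPO_ROOT/splitted_data/raw"

# Copy the dataset files into raw/ if a source directory was given.
if [ -n "$MPD_DIR" ] && [ -d "$MPD_DIR" ]; then
    cp -r "$MPD_DIR"/. "$REPO_ROOT/splitted_data/raw/"
fi

# Place the challenge set under the file name given above.
if [ -n "$CHALLENGE_FILE" ] && [ -f "$CHALLENGE_FILE" ]; then
    cp "$CHALLENGE_FILE" "$REPO_ROOT/splitted_data/challenge_set.json"
fi

# Show the resulting layout.
ls "$REPO_ROOT/splitted_data"
```
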
Encode the Million Playlist Dataset and the challenge set
```
bash recsys_script.sh --encoding 
```
Train iALS and Starspace
```
bash recsys_script.sh --update_models 
```
Train name iALS
```
bash train_nals.sh 
```
Train SVD++
```
bash train_svd_pp.sh
```
Train Vowpal Wabbit
```
bash train_vw.sh
```
Create example files for Catboost
```
bash create_examples.sh
```

Create pools for Catboost from examples

```
bash test_to_vw.sh
bash vw_predict_train2.sh
bash add_vw_t2.sh
bash feature.sh
```

Train Catboost
```
bash cb.sh ~/catboost
```

Create candidates for challenge set

```
bash recsys_script.sh --update_candidates --train ../splitted_data/encoded_train.json --test ../splitted_data --test_dir
```

Predict with Vowpal Wabbit model

```
bash wv_on_unk_test.sh ../splitted_data
bash vw_predict.sh ../splitted_data
bash add_vw.sh ../splitted_data
```

Apply trained models

```
bash recsys_script.sh --apply --train ../splitted_data/encoded_train.json --test ../splitted_data --test_dir
```

Decode created solution

```
python utils/create_solution.py ../splitted_data/test_c_predictions \
                                ../splitted_data/tracks.json \
                                ../MIPT_MSU_solution.csv
```
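
Before submitting, it can be worth checking that the decoded file was actually written. The sketch below is one way to do that. `check_solution` is a hypothetical helper, not part of this repository. It only checks that the file exists and is non-empty and reports its line count. It makes no assumption about the CSV layout.

```shell
# check_solution is a hypothetical helper, not part of the repository.
check_solution() {
    local path="$1"
    # -s is true only if the file exists and has a size greater than zero.
    if [ ! -s "$path" ]; then
        echo "missing or empty: $path" >&2
        return 1
    fi
    # Reading from stdin makes wc print only the count, without the file name.
    echo "$path: $(wc -l < "$path") lines"
}

# Example call with the output path used above.
check_solution ../MIPT_MSU_solution.csv || echo "solution file not ready yet"
```
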

Usage recommendations

With recsys_script.sh, you can set the path to the Starspace binary with the --starspace_path option and the path to your Python virtualenv with the --env option.
The path to the Catboost binary can be passed as an argument to cb.sh.
You will need about 100 GB of RAM.
Most of our programs create 32 threads.
We recommend training Catboost on a GPU, because it then takes several hours instead of days.
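
The recommendations above can be combined into a small preflight sketch. It rests on several assumptions: all three paths are hypothetical placeholders, the resource check works on Linux only, `--starspace_path` and `--env` are assumed to combine with a stage flag such as `--update_models`, and `--env` is assumed to take the virtualenv directory. The commands are printed rather than executed, so they can be reviewed first.

```shell
# Hypothetical paths: replace them with your own locations.
STARSPACE_BIN="${STARSPACE_BIN:-$HOME/Starspace/starspace}"
VENV_DIR="${VENV_DIR:-$HOME/venvs/recsys}"
CATBOOST_BIN="${CATBOOST_BIN:-$HOME/catboost}"

# Linux-only check against the figures quoted above (about 100 GB RAM, 32 threads).
if [ -r /proc/meminfo ]; then
    # MemTotal is reported in kB; convert to whole GB.
    mem_gb=$(awk '/^MemTotal:/ {printf "%d", $2 / 1024 / 1024}' /proc/meminfo)
    cores=$(nproc)
    if [ "$mem_gb" -lt 100 ]; then
        echo "warning: only ${mem_gb} GB RAM detected" >&2
    fi
    if [ "$cores" -lt 32 ]; then
        echo "warning: only ${cores} CPU threads detected" >&2
    fi
fi

# Print the invocations instead of running them.
# Assumption: the option values below are accepted in this form.
echo "bash recsys_script.sh --update_models --starspace_path $STARSPACE_BIN --env $VENV_DIR"
echo "bash cb.sh $CATBOOST_BIN"
```
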
