MIPT_MSU team solution for the RecSys Challenge 2018
We used Python 3.5.
Install the requirements from requirements.txt.
You will also need Catboost, Starspace, Vowpal Wabbit, and Python Transformer.
Run all scripts from RecSys2018/recsys.
For all scripts except recsys_script.sh, activate the virtualenv yourself in your bash session before running them; these scripts do not activate it.
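Before running one of these scripts by hand, it can help to confirm that the virtualenv is actually active. A minimal sketch, relying on standard interpreter behaviour: the `venv` module (and newer virtualenv releases) make `sys.prefix` differ from `sys.base_prefix`, while older virtualenv releases set `sys.real_prefix`. The function name `in_virtualenv` is hypothetical and not part of this repository:

```python
import sys


def in_virtualenv():
    """Return True if the running interpreter belongs to a virtualenv or venv."""
    # Older virtualenv releases set sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and newer virtualenv releases make sys.prefix differ from sys.base_prefix.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)
```

For example, a helper script could call it first and exit with a message asking you to run `source <your_env>/bin/activate` (placeholder path) when it returns False.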
-
In the RecSys2018 folder, create a splitted_data folder and put the Million Playlist Dataset there (RecSys2018/splitted_data/raw)
-
Put the challenge set into the splitted_data folder (RecSys2018/splitted_data/challenge_set.json)
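Before encoding, it may be worth checking that the data sits where the steps above expect it. A minimal sketch that checks only the two paths named above; `check_layout` is a hypothetical helper, not a script from this repository, and it verifies only that the paths exist (and that raw is not empty), not that the JSON files are valid:

```python
import os


def check_layout(root):
    """Return a list of problems with the data layout under the RecSys2018 folder."""
    data_dir = os.path.join(root, "splitted_data")
    raw_dir = os.path.join(data_dir, "raw")
    challenge = os.path.join(data_dir, "challenge_set.json")
    problems = []
    # The Million Playlist Dataset files are expected inside splitted_data/raw.
    if not os.path.isdir(raw_dir):
        problems.append("missing directory: " + raw_dir)
    elif not os.listdir(raw_dir):
        problems.append("empty directory: " + raw_dir)
    # The challenge set is expected directly inside splitted_data.
    if not os.path.isfile(challenge):
        problems.append("missing file: " + challenge)
    return problems
```

For example, `check_layout(os.path.expanduser("~/RecSys2018"))` returns an empty list when both paths are in place (the `~/RecSys2018` location is only an illustration).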
-
Encode the Million Playlist Dataset and the challenge set
bash recsys_script.sh --encoding
-
Train iALS and Starspace
bash recsys_script.sh --update_models
-
Train the name iALS model
bash train_nals.sh
-
Train SVD++
bash train_svd_pp.sh
-
Train Vowpal Wabbit
bash train_vw.sh
-
Create example files for Catboost
bash create_examples.sh
-
Create pools for Catboost from examples
bash test_to_vw.sh
bash vw_predict_train2.sh
bash add_vw_t2.sh
bash feature.sh
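The four pool-building scripts have to run in the listed order, so a failure in one should stop the rest. A minimal sketch of a driver that runs them sequentially and stops at the first non-zero exit code; `run_steps` and `POOL_STEPS` are hypothetical names, and the `runner` parameter exists only so the sequencing can be tested without the real scripts:

```python
import subprocess

# Pool-building commands, in the order this README lists them.
POOL_STEPS = [
    ["bash", "test_to_vw.sh"],
    ["bash", "vw_predict_train2.sh"],
    ["bash", "add_vw_t2.sh"],
    ["bash", "feature.sh"],
]


def run_steps(steps, runner=subprocess.call):
    """Run commands in order and stop at the first non-zero exit code.

    Returns the command that failed, or None if every step succeeded.
    """
    for cmd in steps:
        # subprocess.call waits for the command and returns its exit code.
        if runner(cmd) != 0:
            return cmd
    return None
```

Call `run_steps(POOL_STEPS)` from RecSys2018/recsys with the virtualenv already active, since these scripts do not activate it.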
-
Train Catboost
bash cb.sh ~/catboost
-
Create candidates for the challenge set
bash recsys_script.sh --update_candidates --train ../splitted_data/encoded_train.json --test ../splitted_data --test_dir
-
Predict with the Vowpal Wabbit model
bash wv_on_unk_test.sh ../splitted_data
bash vw_predict.sh ../splitted_data
bash add_vw.sh ../splitted_data
-
Apply the trained models
bash recsys_script.sh --apply --train ../splitted_data/encoded_train.json --test ../splitted_data --test_dir
-
Decode the created solution
python utils/create_solution.py ../splitted_data/test_c_predictions \
    ../splitted_data/tracks.json \
    ../MIPT_MSU_solution.csv
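The decoding step maps the integer track ids produced by the models back to Spotify track URIs and writes the submission file. The formats of test_c_predictions and tracks.json are not described in this README, so the sketch below assumes, hypothetically, a dict from playlist id to a ranked list of integer track ids, and a list in which position i holds the URI of track i. The output follows the challenge's CSV submission layout as we understand it: a team_info line, then one line per playlist with the pid followed by the ranked track URIs. `decode_predictions` and `write_solution` are illustrative names, not the functions in utils/create_solution.py:

```python
import csv


def decode_predictions(predictions, id_to_uri):
    """Map each playlist's ranked integer track ids back to track URIs.

    predictions: dict {pid: [track_id, ...]}  (hypothetical format)
    id_to_uri:   list where id_to_uri[track_id] is a track URI  (hypothetical)
    """
    rows = []
    for pid in sorted(predictions):
        uris = [id_to_uri[track_id] for track_id in predictions[pid]]
        rows.append([str(pid)] + uris)
    return rows


def write_solution(path, rows, team_info):
    """Write a team_info header line followed by one line per playlist."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["team_info"] + list(team_info))
        writer.writerows(rows)
```

Sorting the playlist ids only makes the output deterministic; the ranking inside each row is kept exactly as predicted.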
-
With recsys_script.sh, you can set the path to the Starspace binary with the --starspace_path option and the path to your Python virtualenv with the --env option.
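These options can also be combined when recsys_script.sh is driven from Python. A minimal sketch that only builds the argument list; `recsys_command` is a hypothetical helper, and it assumes, as in the commands above, that the mode switch comes first and that the script accepts --starspace_path and --env after it:

```python
def recsys_command(mode, starspace_path=None, env=None, extra_args=()):
    """Build an argv list for recsys_script.sh.

    mode is one of the switches used in this README, e.g. "--encoding".
    """
    cmd = ["bash", "recsys_script.sh", mode]
    # Optional path to the Starspace binary.
    if starspace_path is not None:
        cmd += ["--starspace_path", starspace_path]
    # Optional path to the Python virtualenv the script should use.
    if env is not None:
        cmd += ["--env", env]
    cmd += list(extra_args)
    return cmd
```

For example, `subprocess.check_call(recsys_command("--update_models", starspace_path=os.path.expanduser("~/Starspace/starspace"), env=os.path.expanduser("~/envs/recsys")))`, run from RecSys2018/recsys, where both paths are placeholders. The `os.path.expanduser` calls matter because `subprocess` does not expand `~` when no shell is involved.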
-
You can pass the path to the Catboost binary as an argument to cb.sh.
-
You will need about 100 GB of RAM.
-
Most of our programs create 32 threads.
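A quick pre-flight check against these figures can save a long run that ends in an out-of-memory error. A minimal sketch, assuming a Linux machine (`os.sysconf` with `SC_PAGE_SIZE` and `SC_PHYS_PAGES` is not available on Windows); the 100 GB and 32-thread defaults come from the two lines above, and the function names are hypothetical:

```python
import os


def total_ram_gb():
    """Physical memory in GB, computed as page size times number of pages."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024 ** 3


def resource_warnings(ram_gb, cpus, need_ram_gb=100, threads=32):
    """Compare a machine against the rough requirements stated in this README."""
    warnings = []
    if ram_gb < need_ram_gb:
        warnings.append("only %.0f GB of RAM; about %d GB is recommended" % (ram_gb, need_ram_gb))
    if cpus < threads:
        # More threads than cores still works, but the threads share cores.
        warnings.append("%d CPUs for %d threads; threads will share cores" % (cpus, threads))
    return warnings
```

Call it as `resource_warnings(total_ram_gb(), os.cpu_count() or 1)`; `os.cpu_count()` can return None, hence the fallback.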
-
We recommend training Catboost on a GPU, because it then takes several hours instead of several days.