kaggle_hm: A Python repository from SDriven

Archive contents

kaggle_model_5th.zip  : original Kaggle model upload; contains the original source code, solution files, etc.
H&M_5th_solution.pdf  : team resumes and solution summary
models                : model binaries used in generating solution
src                   : code to rebuild models and generate the submission from scratch

Hardware (the following specs were used to create the original solution):

Ubuntu 16.04 LTS (2 TB boot disk)
56 vCPUs, 377 GiB memory
1 x NVIDIA Tesla P100 (16 GB)

Software (python packages are detailed separately in `requirements.txt`):

Python 3.7.7
CUDA 11.4
cuDNN 8005 (8.0.5)
NVIDIA driver 470.82.01
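The cuDNN entry above appears to be the integer form of the version (the format reported by tools such as PyTorch's `torch.backends.cudnn.version()`). A minimal sketch for decoding such codes and checking the interpreter version, assuming the cuDNN 8.x encoding `major*1000 + minor*100 + patch`; the helper names are hypothetical and not part of this repository:

```python
import sys

# Assumed target from the software list above (Python 3.7.x).
EXPECTED_PYTHON = (3, 7)


def decode_cudnn_version(code: int) -> str:
    """Decode a cuDNN 8.x integer version (major*1000 + minor*100 + patch)."""
    major, rest = divmod(code, 1000)
    minor, patch = divmod(rest, 100)
    return f"{major}.{minor}.{patch}"


def python_matches(expected=EXPECTED_PYTHON) -> bool:
    """True if the running interpreter has the expected major.minor version."""
    return sys.version_info[:2] == tuple(expected)


if __name__ == "__main__":
    print("cuDNN 8005 ->", decode_cudnn_version(8005))
    print("Python", sys.version.split()[0], "matches 3.7:", python_matches())
```

Only major.minor is compared, since patch-level differences in Python rarely affect reproducibility.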

Data setup (assumes the Kaggle API is installed)

Below are the shell commands used in each step, run from the top-level directory:

```shell
mkdir data
cd data
kaggle competitions download -c h-and-m-personalized-fashion-recommendations
unzip h-and-m-personalized-fashion-recommendations.zip
```
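After unzipping, a quick check that the main competition files landed in `data/` can save a failed run later. A minimal sketch; the expected file names are an assumption based on the competition's published data listing, and `missing_files` is a hypothetical helper:

```python
from pathlib import Path

# Assumed file names from the competition's data page; adjust if they differ.
EXPECTED_FILES = [
    "articles.csv",
    "customers.csv",
    "transactions_train.csv",
    "sample_submission.csv",
]


def missing_files(data_dir="data", expected=EXPECTED_FILES):
    """Return the expected file names that are not present in data_dir."""
    root = Path(data_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing after unzip:", ", ".join(missing))
    else:
        print("All expected competition files are present.")
```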

Procedure to reproduce the result (running the whole pipeline takes about 30 hours)

The most time-consuming steps are generating candidates and computing features for the candidate customer-article pairs. The candidate and feature files used for prediction are very large (more than 400 GiB in total), so rather than uploading them together with the model files to be downloaded and loaded for prediction, it is more practical to run the pipeline step by step. Only the model files are uploaded.
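A back-of-the-envelope estimate shows why these files grow so large: a dense feature table has one row per candidate customer-article pair and one column per feature. A minimal sketch, assuming 4-byte (float32) values; the counts in the example are hypothetical placeholders, not the solution's actual numbers:

```python
def feature_table_gib(n_pairs: int, n_features: int, bytes_per_value: int = 4) -> float:
    """Rough size in GiB of a dense (pairs x features) numeric table."""
    return n_pairs * n_features * bytes_per_value / 2**30


if __name__ == "__main__":
    # Hypothetical inputs for illustration only; not the solution's real counts.
    customers = 1_000_000
    candidates_per_customer = 100
    n_features = 300
    pairs = customers * candidates_per_customer
    print(f"{pairs:,} pairs -> {feature_table_gib(pairs, n_features):.1f} GiB")
```

The size scales linearly in both the number of candidates per customer and the number of features, so adding either quickly pushes the total into the hundreds of GiB.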

```shell
cd ./code
sh run.sh
```
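Since the intermediate candidate and feature files exceed 400 GiB, it may be worth checking free disk space before starting `run.sh`. A minimal pre-flight sketch using the standard library; the threshold mirrors the figure stated earlier in this README, and `has_enough_space` is a hypothetical helper:

```python
import shutil

# Threshold based on the README's note that candidate and feature files
# exceed 400 GiB in total; adjust if your setup differs.
REQUIRED_GIB = 400


def has_enough_space(path=".", required_gib=REQUIRED_GIB) -> bool:
    """Return True if the filesystem holding `path` has required_gib free."""
    free_gib = shutil.disk_usage(path).free / 2**30
    return free_gib >= required_gib


if __name__ == "__main__":
    if has_enough_space("."):
        print("Enough free disk space to start run.sh")
    else:
        print("Warning: less than 400 GiB free; the pipeline may run out of disk")
```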

When the pipeline finishes, the submission file `submission.csv` will be in the `./subs` folder.
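Before uploading, the file can be sanity-checked against the competition format. A minimal sketch, assuming the format of the competition's `sample_submission.csv` (a `customer_id,prediction` header, with up to 12 space-separated article IDs per row, scored by MAP@12); `validate_submission` is a hypothetical helper, not part of this repository:

```python
import csv

MAX_PREDICTIONS = 12  # assumed: the competition scores the top 12 articles (MAP@12)


def validate_submission(path="subs/submission.csv"):
    """Return a list of problems found in the submission file (empty if none)."""
    problems = []
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header != ["customer_id", "prediction"]:
            problems.append(f"unexpected header: {header}")
        # Data rows start on line 2 of the file, right after the header.
        for line_no, row in enumerate(reader, start=2):
            if len(row) != 2:
                problems.append(f"line {line_no}: expected 2 columns, got {len(row)}")
                continue
            articles = row[1].split()
            if len(articles) > MAX_PREDICTIONS:
                problems.append(f"line {line_no}: {len(articles)} predictions")
    return problems
```

Calling `validate_submission()` from the top-level directory returns an empty list when no problems are found.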
