KDD-Cup-2019

This repository contains my solution for the KDD Cup 2019 Regular ML track (Context-Aware Multi-Modal Transportation Recommendation). See Competition Website for the details. In this competition, I placed 57th in phase1 and 52nd in phase2 (and could not advance to phase3).

Phase1

Result

57th place of 1702 teams.

LB score: 0.69917984
Local cv score: 0.678330

Model Pipeline

See phase1 final version for the details.

Key Findings

Features
See features I used. The most important feature was plan_0_transport_mode. In phase1, people clicked plan_0_transport_mode in about 60% of sessions (which suggests that people tend to click the plan displayed at the top). I also used count-encoded and target-encoded features for the categorical variables. As a result, my best single model scored 0.6925 on the LB.
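Count and target encoding can be sketched as follows. This is a minimal illustration, not the repository's actual code: the function names, the fold scheme, and the use of a single numeric target are assumptions. The target encoding is computed out of fold so that a row never sees its own label.

```python
import numpy as np

def count_encode(values):
    """Replace each category with the number of times it occurs."""
    values = np.asarray(values)
    _, inverse, counts = np.unique(values, return_inverse=True, return_counts=True)
    return counts[inverse]

def target_encode_oof(values, target, n_folds=5, seed=0):
    """Out-of-fold mean target encoding (hypothetical helper).

    Each row is encoded with the mean target of its category, computed
    only on the other folds. Categories unseen in the training folds
    fall back to the training-fold mean.
    """
    values = np.asarray(values)
    target = np.asarray(target, dtype=float)
    n = len(values)
    fold_ids = np.random.default_rng(seed).permutation(n) % n_folds
    encoded = np.zeros(n)
    for fold in range(n_folds):
        train = fold_ids != fold
        valid = ~train
        prior = target[train].mean()
        # Mean target per category, using the training folds only.
        means = {
            cat: target[train][values[train] == cat].mean()
            for cat in np.unique(values[train])
        }
        encoded[valid] = [means.get(v, prior) for v in values[valid]]
    return encoded
```

For a multi-class target such as the clicked transport mode, one option is to call target_encode_oof once per class with a 0/1 indicator as the target.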
Sub Models
I prepared two sub models, one trained on queries and the other on queries & profiles. Adding their outputs to the main model's features improved the LB score from 0.6925 to 0.6945.
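Feeding sub model outputs into the main model is a form of stacking. The sketch below assumes the sub model outputs are out-of-fold class probabilities, which this README does not state explicitly. class_prior_model is a toy stand-in for a real sub model, and all names are hypothetical.

```python
import numpy as np

def oof_predictions(fit_predict, X, y, n_folds=5, seed=0):
    """Collect out-of-fold predictions from a sub model.

    fit_predict(X_train, y_train, X_valid) must return one row of
    predictions per validation row.
    """
    X, y = np.asarray(X), np.asarray(y)
    n = len(X)
    fold_ids = np.random.default_rng(seed).permutation(n) % n_folds
    oof = None
    for fold in range(n_folds):
        train, valid = fold_ids != fold, fold_ids == fold
        pred = np.asarray(fit_predict(X[train], y[train], X[valid]), dtype=float)
        pred = pred.reshape(len(pred), -1)
        if oof is None:
            oof = np.zeros((n, pred.shape[1]))
        oof[valid] = pred
    return oof

def class_prior_model(X_train, y_train, X_valid, n_classes=3):
    """Toy sub model: predicts the training class frequencies for every row."""
    prior = np.bincount(y_train, minlength=n_classes) / len(y_train)
    return np.tile(prior, (len(X_valid), 1))

# Toy data: 12 rows, 2 main features, 3 classes.
X_main = np.arange(24, dtype=float).reshape(12, 2)
y = np.array([0, 1, 2] * 4)

sub_outputs = oof_predictions(class_prior_model, X_main, y, n_folds=4)
# The main model is then trained on the original features plus the sub model outputs.
X_stacked = np.hstack([X_main, sub_outputs])
```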
Post Processing
Post processing improved the LB score from 0.6945 to 0.6991. Some classes (0, 3, 4, 6) made up a smaller share of the out-of-fold predictions than of the train data, so I adjusted the predictions for these classes by constant multiples. The multiples were decided by maximizing the out-of-fold F1 score (see blending).
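The multiplier search can be sketched as below. This is an illustration under assumptions rather than the code behind the blending link: it uses weighted F1 as the objective, a simple one-class-at-a-time grid search, and a hypothetical grid from 1.0 to 3.0.

```python
import numpy as np

def weighted_f1(y_true, y_pred, n_classes):
    """F1 per class, averaged with weights equal to each class's share of y_true."""
    score = 0.0
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom > 0 else 0.0
        score += f1 * np.mean(y_true == c)
    return score

def search_multipliers(oof_proba, y_true, classes, grid=np.arange(1.0, 3.01, 0.1)):
    """Greedy search: for each listed class, keep the multiplier that
    gives the best out-of-fold weighted F1, then move to the next class."""
    n_classes = oof_proba.shape[1]
    weights = np.ones(n_classes)
    best = weighted_f1(y_true, oof_proba.argmax(axis=1), n_classes)
    for c in classes:
        best_w = weights[c]
        for w in grid:
            trial = weights.copy()
            trial[c] = w
            pred = (oof_proba * trial).argmax(axis=1)
            score = weighted_f1(y_true, pred, n_classes)
            if score > best:
                best, best_w = score, w
        weights[c] = best_w
    return weights, best
```

At prediction time the same weights are applied to the test probabilities before taking the argmax, for example (test_proba * weights).argmax(axis=1).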

Phase2

Result

52nd place of 100 teams.

LB score: 0.69362814
Local cv score: 0.657519

Model Pipeline

See phase2 final version for the details.

Key Findings

Splitting Model
In phase2, the dataset contained 3 cities. I split the main model by city since the distributions of transport modes were quite different. After splitting the model, the LB score reached 0.6900.
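Splitting a model by city amounts to fitting one model per city and routing each row to the model for its city. The sketch below is a hypothetical illustration: MajorityClassModel is a toy stand-in for the real classifier, and the city labels "A", "B", "C" are placeholders.

```python
import numpy as np

class MajorityClassModel:
    """Toy classifier: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = np.bincount(y).argmax()
        return self

    def predict(self, X):
        return np.full(len(X), self.label_)

def fit_per_city(X, y, city, make_model=MajorityClassModel):
    """Fit one independent model on the rows of each city."""
    models = {}
    for c in np.unique(city):
        mask = city == c
        models[c] = make_model().fit(X[mask], y[mask])
    return models

def predict_per_city(models, X, city):
    """Route each row to its city's model; rows from unseen cities stay at -1."""
    pred = np.full(len(X), -1)
    for c, model in models.items():
        mask = city == c
        if mask.any():
            pred[mask] = model.predict(X[mask])
    return pred

# Toy data: each city has a different dominant transport mode.
X = np.zeros((6, 1))
y = np.array([1, 1, 2, 2, 7, 7])
city = np.array(["A", "A", "B", "B", "C", "C"])

models = fit_per_city(X, y, city)
pred = predict_per_city(models, X, city)
```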
Features
Features were almost the same as in phase1, but I did target encoding separately for each of the 3 cities.
Post Processing
The same post processing as in phase1 was applied to classes 0, 3, and 4. The final best LB score was 0.6936.

MitsuruFujiwara/KDD-Cup-2019