Anil's Solution for OTTO – Multi-Objective Recommender System

The solution pipeline consists of:

Generating Covisitation Matrices

Generated Top-100 AIDs for three well-known covisitation schemes given in: link

Splitting the Data as Train/Val

Used the local validation scheme given by Radek. The local train data was also splitted into two by sessions in order to avoid possible leakage during model training and score calculation. The implementation can be seen in the corresponding notebook.

Generating Co-Occurrence Matrices

Generated all pair occurrences for all AIDs among all sessions for all action pairs (click-cart, cart-order, etc.). This is used for feature extraction later.

Candidate Generation

Used the public candidate generation script and generated 100 candidates for all action types.

Feature Extraction

Generated features for following data subsets:

Items
Sessions
Item-Session Combinations
Covisitation and Co-Occurrence Statistics

Item Features

Statistics generated from hour, weekday and weekend status
Count features (bool for >0 and >1, rank among all)
Unique count features (unique count and rank among all)
Distribution of action types in percentiles
Inclusion rate by all sessions
Occurrence rate in the last week of data
Average number of times seen in the same sessions at different times
All of the above with filtered separately for all action types

Session Features

Statistics generated from hour, weekday and weekend status
Count features (bool for >0 and >1, rank among all)
Unique count features (unique count and rank among all)
Distribution of action types in percentiles
Length of the session
Features generated by extracting mini-sessions according to the time differences between actions
Statistics generated from multiple purchases made in a single basket
Rates of taking products to the next action within the same session (click->cart, cart->order)
All of the above with filtered separately for all action types

Item-Session Combination Features

Statistics generated from hour, weekday and weekend status
Count features (bool for >0 and >1, rank among all)
Unique count features (unique count and rank among all)
Distribution of action types in percentiles
Reversed order of the item in the session
Time difference between the latest occurrence of the item and the start - end of the session

Covisitation and Co-Occurrence Statistics

Statistics generated from covisitation and co-occurrence scores between candidate items and items in the session's history

Training

Used the following config:

Model: XGBoost
Fold Scheme: 5-Fold (Grouped by "session")
Negative Sampling Fraction: 15%
Dropped sessions with no positive labels
Used the first half of splitted local training set

Inference

Used mean blending
Executed on the second half of splitted local training set when running local validation

Didn't Work & Improve

Weekday-Specific aggregations
Word2Vec features
Different models (CatBoost, LGBM)
Comprehensive pair scores (because of OOM errors)
Max-median blending
Early-stopping
Higher negative fractions
Different objective metrics
Different fold counts

nlztrk/OTTO-Multi-Objective-Recommender-System