TalkingData Kaggle competition

This repository contains the data preprocessing, feature engineering and machine learning code that produced a top-13% result on the Kaggle private leaderboard.

Preprocess

Feature Engineering: compute time to next/previous click, cumulative counts of features, lag features, ...
Mean Encoding: compute mean-encoded features and generate the train/validation/test feature sets.
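The click-timing features listed under Feature Engineering can be sketched in pandas as follows. This is a minimal sketch, not the repository's code. It assumes the competition's `ip`, `app`, `device`, `os`, `channel` and `click_time` columns. The grouping keys, the `prev_channel` lag column and the function name `add_click_time_features` are illustrative choices.

```python
import pandas as pd


def add_click_time_features(df, keys=("ip", "app", "device", "os")):
    """Add next/previous-click gaps, a cumulative count and a lag feature per group.

    `keys` is an illustrative grouping; other column combinations work the same way.
    """
    keys = list(keys)
    out = df.copy()
    out["click_time"] = pd.to_datetime(out["click_time"])
    # Stable sort so that clicks with identical timestamps keep their input order.
    out = out.sort_values(keys + ["click_time"], kind="mergesort")
    by_group = out.groupby(keys)

    t = out["click_time"]
    # Seconds until the same group's next click (NaN for the group's last click).
    out["next_click"] = (by_group["click_time"].shift(-1) - t).dt.total_seconds()
    # Seconds since the same group's previous click (NaN for the group's first click).
    out["prev_click"] = (t - by_group["click_time"].shift(1)).dt.total_seconds()
    # 0-based running count of earlier clicks in the same group.
    out["cum_count"] = by_group.cumcount()
    # Lag feature (hypothetical choice): the channel of the group's previous click.
    out["prev_channel"] = by_group["channel"].shift(1)
    # Return rows in index order (the input order for a default RangeIndex).
    return out.sort_index()
```

On the full training data, the same group-then-shift pattern is typically repeated for several key combinations, each producing its own set of columns.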
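Mean encoding replaces a category with the average target observed for it. One common leakage-safe variant is sketched below, assuming the target column is `is_attributed`. It computes out-of-fold means for training rows and full-training-set means for validation/test rows. The function `mean_encode`, the 5-fold random split and the global-mean fallback are hypothetical choices; the repository's notebook may encode differently (for example with smoothing).

```python
import numpy as np
import pandas as pd


def mean_encode(train, others, col, target="is_attributed", n_splits=5, seed=0):
    """Return an out-of-fold encoding for `train` and full-train encodings for `others`."""
    prior = train[target].mean()  # global mean, used for unseen categories
    fold = np.random.default_rng(seed).integers(0, n_splits, size=len(train))

    train_enc = pd.Series(np.nan, index=train.index, name=f"{col}_mean_enc")
    for k in range(n_splits):
        in_fold = fold == k
        # Category means from the other folds only, so no row sees its own target.
        fold_means = train.loc[~in_fold].groupby(col)[target].mean()
        train_enc.loc[in_fold] = train.loc[in_fold, col].map(fold_means).to_numpy()
    train_enc = train_enc.fillna(prior)

    # Validation/test rows are encoded with means from the whole training set.
    full_means = train.groupby(col)[target].mean()
    other_encs = [o[col].map(full_means).fillna(prior) for o in others]
    return train_enc, other_encs
```

Encoding training rows out-of-fold matters because an in-sample mean leaks each row's own label into its feature, which inflates validation scores for rare categories.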

Modeling

Random Forest: hyperparameter tuning, evaluation of feature importance, and a dendrogram to visualize redundant features.
XGBoost: hyperparameter tuning (tree depth is not tuned) and feature-importance visualization. This notebook produces the final submission file.
Deep Neural Network: a network with categorical embeddings, built with PyTorch and the fast.ai wrapper library. A cyclical learning rate is used to speed up training.
Blending: simple average blending, plus a short tutorial on using NumPy memory maps to save memory while processing data.
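For the Random Forest step, one common way to spot redundant features is to cluster them by rank correlation and inspect the resulting dendrogram. The sketch below assumes SciPy and uses `1 - |Spearman correlation|` as the distance with average linkage. The helper name `feature_dendrogram` and these choices are assumptions, not necessarily what the notebook does.

```python
import numpy as np
import pandas as pd
from scipy.cluster import hierarchy
from scipy.spatial.distance import squareform


def feature_dendrogram(X: pd.DataFrame, method="average"):
    """Cluster the columns of X by rank correlation; return the linkage and dendrogram data."""
    corr = X.corr(method="spearman").to_numpy()
    # Distance is near 0 when two features are strongly (positively or negatively)
    # rank-correlated; clip guards against tiny negative values from rounding.
    dist = np.clip(1.0 - np.abs(corr), 0.0, None)
    np.fill_diagonal(dist, 0.0)
    # linkage expects the condensed (upper-triangle) form of the distance matrix.
    condensed = squareform(dist, checks=False)
    Z = hierarchy.linkage(condensed, method=method)
    # no_plot=True only computes the layout; drop it (with matplotlib) to draw the tree.
    tree = hierarchy.dendrogram(Z, labels=list(X.columns), no_plot=True)
    return Z, tree
```

Features that merge near height 0 carry almost the same ranking information. Dropping one of each such pair and re-checking the validation score is a simple way to confirm the redundancy.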
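The XGBoost tuning step can be sketched as a random search that keeps `max_depth` fixed while sampling the other parameters. The search space, the fixed depth of 7 and the `random_search` helper below are hypothetical. The parameter names (`eta`, `subsample`, `colsample_bytree`, `min_child_weight`, `scale_pos_weight`, `max_depth`) are standard XGBoost parameters, but the values are placeholders only.

```python
import random

# Hypothetical search space: every parameter except tree depth is sampled.
SEARCH_SPACE = {
    "eta": [0.05, 0.1, 0.2],
    "subsample": [0.7, 0.8, 0.9, 1.0],
    "colsample_bytree": [0.5, 0.7, 0.9],
    "min_child_weight": [1, 5, 20, 100],
    "scale_pos_weight": [1, 50, 100],
}
# Held constant across trials; max_depth=7 is a placeholder, not the repository's value.
FIXED = {"objective": "binary:logistic", "eval_metric": "auc", "max_depth": 7}


def random_search(evaluate, space=SEARCH_SPACE, fixed=FIXED, n_trials=20, seed=0):
    """Sample parameter dicts, score each with `evaluate`, keep the highest score."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {name: rng.choice(values) for name, values in space.items()}
        params.update(fixed)  # fixed entries such as max_depth are never searched
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

With XGBoost installed, `evaluate` could wrap `xgboost.cv` and return the last value of its `test-auc-mean` column. The sketch itself only needs a callable that maps a parameter dict to a score where higher is better.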
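A cyclical learning rate oscillates between a lower and an upper bound instead of decaying monotonically. The function below implements the triangular policy from Leslie Smith's cyclical learning rate paper in plain Python. fast.ai's own schedulers (such as cosine annealing with restarts) use different curve shapes, so treat this as an illustration of the idea, with `base_lr`, `max_lr` and `step_size` as assumed placeholder values.

```python
def triangular_clr(iteration, base_lr=1e-4, max_lr=1e-2, step_size=1000):
    """Learning rate at `iteration` under the triangular cyclical policy.

    The rate rises linearly from base_lr to max_lr over `step_size` iterations,
    falls back to base_lr over the next `step_size`, and then repeats.
    """
    cycle = iteration // (2 * step_size)            # index of the current cycle
    x = abs(iteration / step_size - 2 * cycle - 1)  # 1 at cycle edges, 0 at the peak
    return base_lr + (max_lr - base_lr) * max(0.0, 1 - x)
```

In a PyTorch training loop, the returned value would be written into each optimizer `param_group["lr"]` before every batch; `torch.optim.lr_scheduler.CyclicLR` provides the same triangular policy built in.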
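Simple average blending with a memory map can be sketched as below. It assumes NumPy and assumes each model's test predictions can be produced one at a time (for example, loaded from separate submission files). The predictions are written into an on-disk `np.memmap`, so only one model's predictions and one chunk of the stack are in RAM at once. The helper name `blend_with_memmap`, the float32 dtype and the chunk size are assumptions.

```python
import numpy as np


def blend_with_memmap(pred_iter, n_models, n_rows, path, chunk=1_000_000):
    """Simple-average blend of `n_models` prediction vectors, staged in a disk-backed array."""
    # mode="w+" creates (or overwrites) the file and maps it as an (n_models, n_rows) array.
    stacked = np.memmap(path, dtype=np.float32, mode="w+", shape=(n_models, n_rows))
    n_seen = 0
    for i, preds in enumerate(pred_iter):
        # Only the current model's predictions need to be held in RAM.
        stacked[i, :] = preds
        n_seen += 1
    if n_seen != n_models:
        raise ValueError(f"expected {n_models} prediction vectors, got {n_seen}")
    stacked.flush()

    blended = np.empty(n_rows, dtype=np.float32)
    for start in range(0, n_rows, chunk):
        stop = min(start + chunk, n_rows)
        # Average one column chunk at a time so the full stack is never loaded at once.
        blended[start:stop] = stacked[:, start:stop].mean(axis=0)
    del stacked  # drop the mapping; the file stays on disk at `path`
    return blended
```

Storing float32 instead of float64 halves the file size, which is usually harmless for averaged probabilities.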

anhquan0412/talkingdata_clickfraud