TalkingData Kaggle competition (link)
This repository covers the data preprocessing, feature engineering, and machine learning techniques used to reach the top 13% of the Kaggle private leaderboard.
- Feature Engineering: calculate time to next/previous click, cumulative counts of features, lag features ...
- Mean Encoding: mean-encode features and generate the train/validation/test set features
- Random Forest: hyperparameter tuning, evaluation of Random Forest feature importance, and visualization of redundant features using a dendrogram
- XGBoost: hyperparameter tuning (without tree depth tuning) and feature importance visualization. This notebook produces the final submission file.
- Deep Neural Network with categorical embeddings: built with PyTorch and the fast.ai library wrapper. Uses a cyclical learning rate to speed up training.
- Blending: simple average blending and a short tutorial on using NumPy memory maps to save memory while processing data.
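
The click-timing and cumulative-count features from the feature engineering step can be sketched roughly as follows. This is a minimal illustration with pandas, not the notebook's actual code: the column names (`ip`, `app`, `click_time`), the grouping keys, and the toy data are all assumptions.

```python
import pandas as pd

# Toy click log. Column names and values are assumptions for illustration only.
clicks = pd.DataFrame({
    "ip": [1, 1, 2, 1, 2],
    "app": [10, 10, 20, 10, 20],
    "click_time": pd.to_datetime([
        "2017-11-07 00:00:00",
        "2017-11-07 00:00:05",
        "2017-11-07 00:00:07",
        "2017-11-07 00:00:20",
        "2017-11-07 00:01:07",
    ]),
}).sort_values("click_time").reset_index(drop=True)

keys = ["ip", "app"]  # hypothetical grouping; any feature combination works

# Seconds until the same group clicks again (NaN for the last click of a group).
next_time = clicks.groupby(keys)["click_time"].shift(-1)
clicks["next_click"] = (next_time - clicks["click_time"]).dt.total_seconds()

# Seconds since the same group's previous click (NaN for the first click).
prev_time = clicks.groupby(keys)["click_time"].shift(1)
clicks["prev_click"] = (clicks["click_time"] - prev_time).dt.total_seconds()

# Cumulative count: how many earlier clicks the group has made so far.
clicks["cum_count"] = clicks.groupby(keys).cumcount()

print(clicks)
```

Sorting by time before grouping matters here: `shift` and `cumcount` follow row order within each group, so an unsorted log would give meaningless gaps.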
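
Mean encoding replaces a categorical value with the average target observed for that value in the training data. The sketch below shows one simple way to do this with pandas. The column names (`app`, `is_attributed`) and the fallback to the global mean for unseen categories are assumptions, and the notebook may use a different scheme (for example, smoothing or out-of-fold encoding).

```python
import pandas as pd

# Toy training and validation frames. Column names are assumptions.
train = pd.DataFrame({
    "app": [1, 1, 2, 2, 2],
    "is_attributed": [1, 0, 0, 0, 1],
})
valid = pd.DataFrame({"app": [1, 2, 3]})

# Per-category target mean, computed on the training set only
# so that validation labels never leak into the feature.
app_means = train.groupby("app")["is_attributed"].mean()
global_mean = train["is_attributed"].mean()

# Map the learned means onto each split; categories unseen in training
# fall back to the global mean (an assumed choice).
train["app_mean"] = train["app"].map(app_means)
valid["app_mean"] = valid["app"].map(app_means).fillna(global_mean)

print(valid)
```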
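
Simple average blending with NumPy memory maps might look like the sketch below. It is an illustration under stated assumptions rather than the notebook's code: the file names, the `float32` dtype, the chunk size, and the randomly generated predictions are all hypothetical stand-ins for real model outputs.

```python
import os
import tempfile

import numpy as np

n_rows = 1000  # hypothetical number of test rows
tmp_dir = tempfile.mkdtemp()
rng = np.random.default_rng(0)

# Write stand-in predictions for two hypothetical models to disk.
paths = []
for name in ("model_a", "model_b"):
    path = os.path.join(tmp_dir, f"{name}.dat")
    out = np.memmap(path, dtype="float32", mode="w+", shape=(n_rows,))
    out[:] = rng.random(n_rows, dtype=np.float32)
    out.flush()
    del out  # release the writable map
    paths.append(path)

# Reopen the predictions read-only; data is paged in from disk on access.
preds = [np.memmap(p, dtype="float32", mode="r", shape=(n_rows,)) for p in paths]

# Average chunk by chunk so only a small slice is held in memory at a time.
blend_path = os.path.join(tmp_dir, "blend.dat")
blend = np.memmap(blend_path, dtype="float32", mode="w+", shape=(n_rows,))
chunk = 256
for start in range(0, n_rows, chunk):
    stop = min(start + chunk, n_rows)
    total = np.zeros(stop - start, dtype="float32")
    for p in preds:
        total += p[start:stop]
    blend[start:stop] = total / len(preds)
blend.flush()
```

Because a memory map is a raw binary file with no header, the same `dtype` and `shape` must be passed again when reopening it.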