/talkingdata_clickfraud

Kaggle competition


TalkingData Kaggle competition (link)

This repository includes the data preprocessing, feature engineering, and machine learning techniques that produced a top 13% result on the Kaggle private leaderboard.

Preprocess

Modeling

  • Random Forest Hyperparameter tuning, evaluation of Random Forest's feature importance, and visualization of redundant features using a dendrogram.
  • XGBoost Hyperparameter tuning (without tree depth tuning) and feature importance visualization. This notebook produces the final submission file.
  • Deep Neural Network with categorical embeddings Built with PyTorch and the fast.ai wrapper library. Uses a cyclical learning rate to speed up training.
  • Blending Simple average blending and a short tutorial on using a NumPy memory map to save memory while processing data.
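A minimal sketch of the Random Forest step: a small hyperparameter search, feature importances, and a dendrogram built from Spearman rank correlation to surface redundant features. Column names, data, and grid values here are illustrative assumptions, not the repo's actual features or settings.

```python
import numpy as np
import pandas as pd
from scipy.cluster import hierarchy
from scipy.stats import spearmanr
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "clicks_per_ip": rng.random(300),           # made-up feature names
    "hour": rng.integers(0, 24, 300),
})
X["clicks_per_ip_log"] = np.log1p(X["clicks_per_ip"])  # deliberately redundant
y = (X["clicks_per_ip"] > 0.5).astype(int)

# Hyperparameter tuning (the grid is an assumption for illustration)
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    {"n_estimators": [50, 100], "min_samples_leaf": [1, 5]},
    cv=3, scoring="roc_auc",
)
search.fit(X, y)
importances = dict(zip(X.columns, search.best_estimator_.feature_importances_))

# Redundant-feature dendrogram: Spearman correlation captures monotonic
# relations, so monotonically related columns merge at near-zero distance
# and one of each such pair can be dropped.
corr, _ = spearmanr(X)
dist = hierarchy.distance.squareform(1 - np.abs(corr), checks=False)
link = hierarchy.linkage(dist, method="average")
tree = hierarchy.dendrogram(link, labels=X.columns.tolist(), no_plot=True)
```

In a notebook, dropping `no_plot=True` renders the dendrogram; the redundant pair (`clicks_per_ip` and its log) merges first, at distance zero.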
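The XGBoost step can be sketched as a small parameter grid with early stopping on a validation split, leaving `max_depth` fixed to mirror the notebook's "without tree depth tuning" choice. The data, grid values, and depth here are assumptions for illustration.

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.random((200, 4))
y = (X[:, 0] + 0.1 * rng.random(200) > 0.55).astype(int)

dtrain = xgb.DMatrix(X[:150], label=y[:150])
dvalid = xgb.DMatrix(X[150:], label=y[150:])

best = None
for eta in (0.3, 0.1):              # learning-rate grid (illustrative)
    for subsample in (0.8, 1.0):
        params = {"objective": "binary:logistic", "eval_metric": "auc",
                  "max_depth": 6,   # depth left at a fixed value, not tuned
                  "eta": eta, "subsample": subsample}
        booster = xgb.train(params, dtrain, num_boost_round=50,
                            evals=[(dvalid, "valid")],
                            early_stopping_rounds=10, verbose_eval=False)
        if best is None or booster.best_score > best[0]:
            best = (booster.best_score, params, booster)

# Gain-based feature importance of the best model
importance = best[2].get_score(importance_type="gain")
```

The same loop extends naturally to more parameters; the final booster's `predict` on the test `DMatrix` would feed the submission file.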
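The repo builds its network with the fast.ai wrapper; a plain-PyTorch sketch of the same idea looks like the following, with an embedding per categorical feature and `OneCycleLR` as the cyclical learning-rate schedule. Feature names, cardinalities, and layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class EmbeddingNet(nn.Module):
    """One embedding per categorical feature, concatenated into an MLP."""
    def __init__(self, cardinalities, emb_dim=8, hidden=64):
        super().__init__()
        self.embs = nn.ModuleList(nn.Embedding(c, emb_dim) for c in cardinalities)
        self.mlp = nn.Sequential(
            nn.Linear(emb_dim * len(cardinalities), hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x_cat):
        # x_cat: LongTensor of shape (batch, n_categorical_features)
        embedded = torch.cat([emb(x_cat[:, i]) for i, emb in enumerate(self.embs)], dim=1)
        return self.mlp(embedded).squeeze(1)

cards = [100, 50, 20, 30, 24]   # per-feature category counts (made up)
model = EmbeddingNet(cards)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
steps = 20
# Cyclical schedule: LR ramps up to max_lr then anneals back down
sched = torch.optim.lr_scheduler.OneCycleLR(opt, max_lr=1e-2, total_steps=steps)
loss_fn = nn.BCEWithLogitsLoss()

x = torch.stack([torch.randint(0, c, (256,)) for c in cards], dim=1)
y = torch.rand(256).round()
for _ in range(steps):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
    sched.step()
```

Embedding layers let the network learn dense representations of high-cardinality categorical features (IPs, apps, devices) instead of one-hot encoding them.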
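The blending step is a plain mean of the submissions; the memmap trick is to read each prediction file lazily from disk instead of loading it whole. A sketch, with made-up file names and row count:

```python
import numpy as np

n = 1_000_000  # number of test rows (illustrative)

# Stand-ins for two models' saved predictions (file names are made up)
for name, seed in [("sub_xgb.npy", 0), ("sub_nn.npy", 1)]:
    np.save(name, np.random.default_rng(seed).random(n).astype(np.float32))

# mmap_mode="r" maps each file lazily: only the pages actually read are
# pulled into memory, so blending many large submissions stays cheap.
preds = [np.load(f, mmap_mode="r") for f in ("sub_xgb.npy", "sub_nn.npy")]

blend = np.zeros(n, dtype=np.float32)
chunk = 100_000
for start in range(0, n, chunk):   # process in chunks to bound memory
    sl = slice(start, start + chunk)
    blend[sl] = np.mean([p[sl] for p in preds], axis=0)
```

A weighted average is the obvious extension: replace `np.mean` with `np.average(..., weights=w)` once per-model leaderboard scores suggest weights.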