This repository contains a Jupyter notebook that outlines my approach to the Kaggle Facebook Recruiting IV contest.
The prediction .csv files can be generated as follows:
- Download and extract the data from https://www.kaggle.com/c/facebook-recruiting-iv-human-or-bot/data. Place the files `train.csv`, `test.csv` and `bids.csv` into the `data` directory of this repository, and place the file `sampleSubmission.csv` into the `submissions` directory.
- Run the `facebook_notebook.ipynb` notebook. This should generate the submission file `facebook_submission.csv` in the `submissions` directory.
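Before starting the notebook, it can help to confirm that the input files are where the steps above expect them. The sketch below is a hypothetical helper (`missing_files` is not part of this repository) and assumes it is run from the repository root.

```python
from pathlib import Path

# Input files the setup steps ask for, grouped by target directory.
REQUIRED_FILES = {
    "data": ["train.csv", "test.csv", "bids.csv"],
    "submissions": ["sampleSubmission.csv"],
}


def missing_files(repo_root="."):
    """Return the required input files (relative to repo_root) that are absent."""
    root = Path(repo_root)
    missing = []
    for directory, names in REQUIRED_FILES.items():
        for name in names:
            if not (root / directory / name).is_file():
                missing.append(f"{directory}/{name}")
    return missing


if __name__ == "__main__":
    absent = missing_files()
    if absent:
        print("Missing input files:", ", ".join(absent))
    else:
        print("All input files are in place.")
```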
Libraries:
The basic scientific Python libraries + XGBoost
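A quick way to see which dependencies are missing is to look them up without importing them. In this sketch the exact package list is an assumption: the README only says "the basic scientific Python libraries + XGBoost", so numpy, scipy, pandas and scikit-learn are a guess. `hyperopt` and `pymongo` are only needed for the `hyperopt_xgb.py` script. `missing_modules` is a hypothetical helper name.

```python
import importlib.util

# Assumed package lists (top-level import names, e.g. scikit-learn imports as "sklearn").
CORE_MODULES = ["numpy", "scipy", "pandas", "sklearn", "xgboost"]
HYPEROPT_MODULES = ["hyperopt", "pymongo"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Missing for the notebook:", missing_modules(CORE_MODULES) or "none")
    print("Missing for hyperopt_xgb.py:", missing_modules(HYPEROPT_MODULES) or "none")
```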
Running time/Hardware:
Runs in about 15 minutes on a fairly high-powered desktop (i7-4790) with 16 GB of RAM. On machines with less memory, it may run out of RAM.
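If memory is the bottleneck, one generic workaround is to aggregate `bids.csv` while streaming it rather than loading the whole table at once. This sketch is not what the notebook does; it assumes `bids.csv` has a header row containing a `bidder_id` column, and `bids_per_bidder` is a hypothetical name. Memory use grows with the number of distinct bidders instead of the number of bids.

```python
import csv
from collections import Counter


def bids_per_bidder(lines):
    """Count bids per bidder_id while reading the CSV one row at a time.

    `lines` can be an open file handle or any iterable of CSV lines;
    only the running counts are kept in memory, not the rows themselves.
    """
    counts = Counter()
    for row in csv.DictReader(lines):
        counts[row["bidder_id"]] += 1
    return counts
```

For example, `with open("data/bids.csv", newline="") as handle: counts = bids_per_bidder(handle)` would yield a `Counter` keyed by bidder.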
Update Feb 21st 2016
- Added a `Model interpretation` section to the notebook.
- Added a `hyperopt_xgb.py` script that shows how hyperparameters can be optimized using a grid search.
- The script generates a file `hyperopt_xgb.csv` in the root of the repository that lists a selection of hyperparameter settings and the corresponding cross-validated AUC score.
- Running the script requires two additional dependencies: the `hyperopt` and `pymongo` libraries.
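The idea of scoring hyperparameter settings by cross-validated AUC and writing them to a CSV can be sketched with the standard library alone. This is a plain exhaustive grid search over `itertools.product`, not the contents of `hyperopt_xgb.py`, and it does not use hyperopt. All names here (`roc_auc`, `mean_fold_auc`, `grid_search`, `cv_score`) are hypothetical.

```python
import csv
import itertools


def roc_auc(labels, scores):
    """AUC = probability that a random positive scores above a random negative (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_fold_auc(folds):
    """Average roc_auc over (labels, scores) pairs, one pair per held-out fold."""
    aucs = [roc_auc(labels, scores) for labels, scores in folds]
    return sum(aucs) / len(aucs)


def grid_search(param_grid, cv_score, out_path=None):
    """Score every combination in param_grid with cv_score(params), best first.

    If out_path is given, the results are also written to a CSV file with one
    column per hyperparameter plus an `auc` column.
    """
    names = sorted(param_grid)
    rows = []
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        rows.append({**params, "auc": cv_score(params)})
    rows.sort(key=lambda row: row["auc"], reverse=True)
    if out_path is not None:
        with open(out_path, "w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=names + ["auc"])
            writer.writeheader()
            writer.writerows(rows)
    return rows
```

In a real run, `cv_score` would train and evaluate the model on each fold and return the mean held-out AUC, for instance via `mean_fold_auc` or xgboost's `cv` function with `metrics="auc"`. The pairwise `roc_auc` is O(P·N) and only meant to show the definition; for large folds, `sklearn.metrics.roc_auc_score` is the practical choice.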