Gradient Boosting in TensorFlow vs XGBoost

TensorFlow 1.4 includes a Gradient Boosting implementation, aptly named TensorFlow Boosted Trees (TFBT). This repo contains the benchmarking code that I used to compare it to XGBoost.

For more background, have a look at the article.

Getting started

# Prepare the python environment
virtualenv env
source env/bin/activate
pip install -r requirements.txt

# Download the dataset
wget http://stat-computing.org/dataexpo/2009/{2006,2007}.csv.bz2
bunzip2 {2006,2007}.csv.bz2

# Prepare the dataset
python preprocess_data.py

Running the experiments

Train and run XGBoost:

python do_xgboost.py

Train and run TensorFlow:

python do_tensorflow.py

Draw nice plots:

python analyze_results.py

Timing results

./do_xgboost.py --num_trees=50  42.06s user 1.82s system 1727% cpu 2.540 total

./do_tensorflow.py --num_trees=50 --examples_per_layer=1000  124.12s user 27.50s system 374% cpu 40.456 total
./do_tensorflow.py --num_trees=50 --examples_per_layer=5000  659.74s user 188.80s system 356% cpu 3:58.30 total
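
To make the timing results above easier to compare, here is a minimal sketch that converts the wall-clock totals to seconds and expresses each run as a multiple of the XGBoost run. The helper `to_seconds` and the labels in `runs` are names made up for this illustration; they are not part of the benchmarking scripts. Only the three "total" values are taken from the measurements.

```python
def to_seconds(total: str) -> float:
    """Convert a shell `time` total such as '2.540' or '3:58.30' to seconds."""
    seconds = 0.0
    for part in total.split(":"):
        # Each colon-separated field is worth 60x the one after it.
        seconds = seconds * 60 + float(part)
    return seconds


# Wall-clock totals copied from the timing results above.
# The dictionary keys are illustrative labels, not script output.
runs = {
    "xgboost": "2.540",
    "tensorflow, examples_per_layer=1000": "40.456",
    "tensorflow, examples_per_layer=5000": "3:58.30",
}

baseline = to_seconds(runs["xgboost"])
slowdowns = {name: to_seconds(total) / baseline for name, total in runs.items()}

for name, factor in slowdowns.items():
    elapsed = to_seconds(runs[name])
    print(f"{name}: {elapsed:.2f}s wall clock, {factor:.1f}x the XGBoost run")
```

By this measure the TensorFlow runs take roughly 16 and 94 times as long as XGBoost in wall-clock time. Note that the CPU utilization also differs (1727% for XGBoost versus about 360% for TensorFlow), so part of the gap reflects how many cores each run kept busy.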