In some cases it is required to make online predictions, particularly, with trained XGBoost model. This project is designed to use trained XGBoost model for online predictions for Cython arrays many times faster than with usual XGBoost Scikit-Learn API.
Files description:
c_xgb/c_xgb.cpp
- classes for predictions for XGBoost models written in C++c_xgb/c_xgb.h
- header forc_xgb.cpp
c_xgb/json.cpp
,c_xgb/json.hpp
- JSON parser taken from https://github.com/nbsdx/SimpleJSON/ and modifiedc_xgb_test.pyx
- Cython file with tests for CythonXGBsetup.py
- setup file for tests compilationc_xgb_test.py
- Python file which runs tests
- Python: 2.7.12
- Cython: 0.25.2
- Numpy: 1.13.0
- Scikit-Learn: 0.18.1
- XGBoost: up-to-day version built from source
- Note: current version of XGBoost still has problems with precision of floats in dumped models (dmlc/xgboost#1204). If your task is sensitive to high floating point precision in features, you can avoid differences in predictions by XGBoost and CythonXGB which may occur because of not enough precision in XGBoost dumps, please, change XGBoost source according to dmlc/xgboost#1204 (comment) and rebuild it.
mkdir trees
- Compile Cython module:
python setup.py build_ext --inplace
- Run tests:
python c_xgb_test.py
- Extern class from CPP file:
cdef extern from "c_xgb/c_xgb.cpp":
cdef cppclass CXgboost:
CXgboost()
CXgboost(int depth, int n_features, int n_trees_ , int objective_, double base_score_)
double predict(double *features)
- Train your XGBoost model and dump it:
tree_data = booster.get_dump(dump_format='json')
cdef i
for i in xrange(len(tree_data)):
f = open("trees/tree_%d.json" % i, 'w')
f.write(tree_data[i])
f.close()
- After training you can create an instance of CXgboost model:
cdef CXgboost model_c = CXgboost(depth, n_features, n_estimators, 0, base_score)
- To make predictions for one sample:
preds_c_xgb = model_c.predict(x_cy)
x_cy
should be a float cython array with current sample features
For tests were generated classification and regression datasets with Scikit-Library, they contain 10000 samples with 20 features. Predicitons were made for each sample 10 times
Mean_time_XGBoost is mean time for prediction for one sample by Scikit-Learn wrapper for XGBoost Mean_time_CythonXGB is mean time for prediction for one sample by CythonXGB
Acceleration is calculated as Mean_time_XGBoost / Mean_time_CythonXGB
Tests are performed on laptop with Intel® Core™ i5-4210U CPU, Ubuntu 16.04
n_estimators | max_depth | objective | acceleration, times |
---|---|---|---|
10 | 5 | "reg:linear" | 461 |
10 | 10 | "reg:linear" | 188 |
100 | 5 | "reg:linear" | 49 |
100 | 10 | "reg:linear" | 14 |
10 | 5 | "binary:logistic" | 730 |
10 | 10 | "binary:logistic" | 343 |
100 | 5 | "binary:logistic" | 84 |
100 | 10 | "binary:logistic" | 24 |