/benchmark-xgboost-java

Benchmark existing XGBoost Java libraries

Benchmark XGBoost libraries for Java

Introduction

This repository contains the code behind the results I discussed in the article Fast Machine Learning Predictions. The goal is to share the code, keep adding libraries, and keep improving the benchmark so that it is as accurate as possible.

XGBoost is a mainstream machine learning method and many applications run on Java, so there are several XGBoost implementations for Java, as well as options that are not specific to Java, such as rJava with XGBoost running inside R. The ones compared at the moment are:

Benchmark

The metric evaluated is prediction latency. For each of the libraries above, we benchmark 9 use cases (UC):

Use cases

The benchmark uses JMH, and the results are in the charts folder, separated by booster type (tree or linear) and by percentile (.50, .90, .95, .99, .999, .9999). The benchmark was executed with 2 warm-up iterations and 5 measurement iterations.
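To make the percentile columns concrete, here is a minimal sketch of how a latency percentile can be derived from raw samples using the nearest-rank method. The class `LatencyPercentiles` and its method are hypothetical helpers written for illustration; they are not part of this repository, and JMH computes its own percentiles internally, possibly with a different interpolation rule.

```java
import java.util.Arrays;

/** Illustrative helper (not part of the repository): nearest-rank percentiles over latency samples. */
final class LatencyPercentiles {

    /** Returns the sample at quantile q (0 < q <= 1) using the nearest-rank method. */
    static double percentile(double[] samples, double q) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (q <= 0.0 || q > 1.0) {
            throw new IllegalArgumentException("q must be in (0, 1]");
        }
        double[] sorted = samples.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(q * sorted.length); // 1-based rank
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // Synthetic latencies in milliseconds, only to exercise the helper.
        double[] latenciesMs = {0.8, 1.1, 0.9, 1.0, 7.5, 1.2, 0.7, 1.3, 0.95, 1.05};
        for (double q : new double[] {0.50, 0.90, 0.95, 0.99, 0.999, 0.9999}) {
            System.out.println("q=" + q + " -> " + percentile(latenciesMs, q) + " ms");
        }
    }
}
```

With only a handful of samples, the high percentiles all collapse onto the maximum, which is why tail percentiles such as .999 and .9999 only become meaningful when each measurement iteration collects many samples.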

The charts.R file is the R script used to generate the PDF charts. The benchmark results were produced on my personal computer, which has an Intel(R) Core(TM) i7-4600U CPU @ 2.10GHz.

The models used for predictions are not in this repository because their files are too large to be checked in to GitHub, which limits files to 100 MB each. You can download the compressed models here and extract them into the resources folder.

Running the benchmark

You can either run ./gradlew jmh or run the benchmark inside your IDE. Because the project relies on rJava for one of the benchmarks, you will have to set up environment variables pointing to your JRI and R installations. I suggest cloning the project, opening it in your favorite IDE, and setting the following environment and JVM variables:

  • JVM option -Djava.library.path=.:/usr/lib/R/site-library/rJava/jri/
  • environment variable R_HOME=/usr/lib/R
  • environment variable CLASSPATH=.:/usr/lib/R/site-library/rJava/jri/
  • environment variable LD_LIBRARY_PATH=/usr/lib/R/site-library/rJava/jri/
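Since a missing or misspelled setting usually only shows up as a native library loading error at run time, a small pre-flight check can help. The sketch below is a hypothetical helper (`JriEnvCheck` is not part of this repository) that reports which of the settings listed above are absent. It assumes the JRI directory shown above; adjust the path to your own R installation.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical pre-flight check (not part of the repository) for the rJava/JRI settings. */
final class JriEnvCheck {

    /** Returns a human-readable list of missing settings; an empty list means all checks passed. */
    static List<String> findProblems(Map<String, String> env, String javaLibraryPath, String jriDir) {
        List<String> problems = new ArrayList<>();
        String rHome = env.get("R_HOME");
        if (rHome == null || rHome.isEmpty()) {
            problems.add("R_HOME is not set");
        }
        if (!pathContains(javaLibraryPath, jriDir)) {
            problems.add("java.library.path does not include " + jriDir);
        }
        if (!pathContains(env.get("CLASSPATH"), jriDir)) {
            problems.add("CLASSPATH does not include " + jriDir);
        }
        if (!pathContains(env.get("LD_LIBRARY_PATH"), jriDir)) {
            problems.add("LD_LIBRARY_PATH does not include " + jriDir);
        }
        return problems;
    }

    /** True if the separator-delimited path list has an entry equal to dir, ignoring a trailing slash. */
    static boolean pathContains(String pathList, String dir) {
        if (pathList == null) {
            return false;
        }
        String wanted = stripTrailingSlash(dir);
        for (String entry : pathList.split(File.pathSeparator)) {
            if (stripTrailingSlash(entry).equals(wanted)) {
                return true;
            }
        }
        return false;
    }

    private static String stripTrailingSlash(String s) {
        return (s.length() > 1 && s.endsWith("/")) ? s.substring(0, s.length() - 1) : s;
    }

    public static void main(String[] args) {
        // The JRI location below is the one used in this README; it is an assumption about your system.
        String jriDir = "/usr/lib/R/site-library/rJava/jri/";
        List<String> problems =
                findProblems(System.getenv(), System.getProperty("java.library.path"), jriDir);
        if (problems.isEmpty()) {
            System.out.println("rJava/JRI settings look complete");
        } else {
            problems.forEach(System.out::println);
        }
    }
}
```

The check only confirms that the settings are present; it does not verify that R or the JRI native library is actually installed at those locations.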

In IntelliJ IDEA, my run configuration looks like this:

Runner configuration

Some results

For the full set of results, please check the charts folder. The charts below cover both the linear and tree booster types and show the 99th percentile of prediction latency in milliseconds.

Tree

Charts: prediction latency faceted by predictor; prediction latency faceted by use case.

Linear

Charts: prediction latency faceted by predictor; prediction latency faceted by use case.

Future work

  • Add Treelite predictor
  • Add JEP (or other Java to Python library) predictor
  • Add Rserve and Caret predictor
  • Add a ThreadPoolExecutor to parallelize the execution of predictors that do not offer parallel evaluation themselves, such as JPMML
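As a rough idea of what the last item could look like, the sketch below fans single-row predictions out over a fixed-size thread pool and collects them in input order. It is not the planned implementation: `ParallelPredictor` is a hypothetical name, the predictor is modeled as a plain `ToDoubleFunction<double[]>` rather than any real library API, and the code assumes the wrapped predictor is safe to call from several threads at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.ToDoubleFunction;

/** Hypothetical sketch (not part of the repository): run a single-row predictor on a thread pool. */
final class ParallelPredictor {

    /**
     * Scores every row with the given predictor using up to `threads` worker threads.
     * Results are returned in the same order as the input rows.
     * Assumption: the predictor is thread-safe.
     */
    static double[] predictAll(List<double[]> rows, ToDoubleFunction<double[]> predictor, int threads)
            throws InterruptedException, ExecutionException {
        // Executors.newFixedThreadPool returns an ExecutorService backed by a ThreadPoolExecutor.
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Double>> futures = new ArrayList<>();
            for (double[] row : rows) {
                Callable<Double> task = () -> predictor.applyAsDouble(row);
                futures.add(pool.submit(task));
            }
            double[] predictions = new double[rows.size()];
            for (int i = 0; i < predictions.length; i++) {
                predictions[i] = futures.get(i).get(); // blocks until row i has been scored
            }
            return predictions;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<double[]> rows = new ArrayList<>();
        rows.add(new double[] {1.0, 2.0});
        rows.add(new double[] {3.0, 4.0});
        // Stand-in predictor that sums the features; a real one would call the model.
        double[] out = predictAll(rows, r -> r[0] + r[1], 2);
        System.out.println(out[0] + ", " + out[1]);
    }
}
```

In a benchmark, the pool would be created once and reused across invocations, since creating and shutting down threads on every call would dominate the measured latency.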