This repo aims to compare the scoring speed of several open-source machine learning libraries. The focus is on scoring served through a REST API (that is, via web requests).
Trained LR, RF (100 trees, depth 10), GBM (100 trees, depth 10) and NN (2 hidden layers, 200 neurons each) using h2o and exported Java scoring code. Built a prediction service using Steam.
Scoring sequentially using Python via REST API. TODO: Parallelize the client (server is multithreaded AFAIK).
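A minimal sketch of a sequential timing loop of this kind, not the repo's actual client. The JSON payload format in `post_json` is an assumption (the request format the prediction service expects is not shown here), and the names `post_json`, `time_sequential` and `summarize` are hypothetical. The transport is passed in as a function so the loop can be run without a live server.

```python
import json
import statistics
import time
import urllib.request


def post_json(url, row):
    """Send one row as JSON and return the decoded response.

    The payload shape is an assumption; adapt it to the real service.
    """
    req = urllib.request.Request(
        url,
        data=json.dumps(row).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def time_sequential(score_fn, rows):
    """Call score_fn once per row, one after another.

    Returns the per-request round-trip latencies in milliseconds.
    """
    latencies_ms = []
    for row in rows:
        start = time.perf_counter()
        score_fn(row)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


def summarize(latencies_ms):
    """Report the median, which is less sensitive to slow outliers than the mean."""
    return {
        "n": len(latencies_ms),
        "median_ms": statistics.median(latencies_ms),
        "max_ms": max(latencies_ms),
    }
```

To time a real endpoint, pass something like `lambda row: post_json(url, row)` as `score_fn`, where `url` points at the prediction service. The measured time covers the full round trip (client, network, server prep and scoring), not scoring alone.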
Round-trip time is about 2ms for all algos. This includes client request, network trip, server prep and scoring itself. TODO: Get the breakdown.
Detailed code and results here.
All algos appear to be very fast; scoring itself is probably <1ms for each of them, so the ~2ms round trip is likely dominated by request and network overhead (LR should still be orders of magnitude faster than RF/GBM). TODO: Measure scoring time from Java.
TODO: Concurrency/throughput, some attempts here.
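One possible way to approach the throughput measurement is a thread pool on the client side. This is a sketch under assumptions, not the attempt linked above: `measure_throughput` is a hypothetical helper, `score_fn` stands for whatever function sends one request, and the default of 8 workers is arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure_throughput(score_fn, rows, workers=8):
    """Score all rows concurrently; return (results, requests_per_second).

    Threads are adequate here because each call mostly waits on the network.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input rows
        results = list(pool.map(score_fn, rows))
    elapsed = time.perf_counter() - start
    rate = len(rows) / elapsed if elapsed > 0 else float("inf")
    return results, rate
```

Throughput measured this way depends on the number of client workers as well as on the server, so it is worth repeating the run with several values of `workers`.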
The plumber R package provides a REST API for an R function. There is a ~5ms overhead for using the framework (5.8ms measured via the Python client).
Trained LR on the same data with glmnet (actually using glmnetUtils, which handles categorical variables by one-hot encoding them under the hood and then calling glmnet).
Code here.
Latency is ~25ms, of which ~5ms should be plumber (see above) and 20ms should be scoring with glmnetUtils (but see below).
Further inspection shows about 14ms doing the one-hot encoding in glmnetUtils and 1ms scoring with glmnet, a total of 15ms. TODO: What's the remaining 5ms?
Trained gbm with 100 trees, depth 10. gbm deals directly with categorical values (no need for one-hot encoding).
Code here.
Total round-trip is only 8ms (6ms for plumber plus the 2ms I measured for gbm).
TODO: 1-hot encoding at scoring.
It seems scoring itself is ~1ms. One-hot encoding will probably take a lot more.
TODO: Integer encoding (current code crashes with segfault, dunno why).
Scoring itself is 1ms.
Unless the tool is dealing with categorical variables internally (h2o, the gbm R package), one-hot encoding might take 90% of the scoring time and also 90% of the implementation work. In Python this might be even worse.
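To illustrate why the encoding step is most of the implementation work, here is a minimal sketch of one-hot encoding a single row at scoring time in plain Python. It is not code from this repo: the function names and the column names in the usage note are hypothetical, and it assumes the category levels seen at training time were saved alongside the model.

```python
def build_one_hot_index(categorical_levels, numeric_columns):
    """Assign a fixed vector position to every (column, level) pair.

    categorical_levels maps column name -> levels seen at training time.
    Numeric columns get one position each, after all the categorical ones.
    """
    index = {}
    for col in sorted(categorical_levels):
        for level in sorted(categorical_levels[col]):
            index[(col, level)] = len(index)
    for col in numeric_columns:
        index[(col, None)] = len(index)
    return index


def encode_row(row, index, categorical_columns):
    """Turn one input row (a dict) into a dense feature vector."""
    vec = [0.0] * len(index)
    for col, value in row.items():
        if col in categorical_columns:
            pos = index.get((col, value))
            if pos is not None:  # levels unseen in training stay all zeros
                vec[pos] = 1.0
        else:
            vec[index[(col, None)]] = float(value)
    return vec
```

For example, with levels `{"carrier": ["UA", "AA"], "origin": ["SFO", "JFK"]}` and one numeric column `distance`, the row `{"carrier": "UA", "origin": "SFO", "distance": 337}` encodes to `[0.0, 1.0, 0.0, 1.0, 337.0]`. The index must be built once from the training data and reused for every request; rebuilding it per request, or ordering the levels differently from training, silently produces wrong predictions.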