Support LightGBM boosting_type="rf"?
OnlyFor opened this issue
Here is my code:
```python
import numpy as np
import pandas as pd
import lightgbm as lgb  # version 2.3.1
from sklearn2pmml import sklearn2pmml, make_pmml_pipeline  # 0.52.0

# Synthetic data: 21 continuous features and 70 integer-valued features
df_ = pd.DataFrame({"aaaaaaaaaaaaaaaaaa": np.random.rand(10000)})
for i in range(20):
    df_["var_" + str(i)] = np.random.rand(10000)
for i in range(30, 100):
    df_["var_" + str(i)] = np.random.randint(0, 20, 10000)
# Set every feature of the last 2000 rows to missing
df_.iloc[-2000:] = np.nan
df_["target"] = np.random.randint(0, 2, 10000)
y = df_["target"]
X = df_.drop("target", axis=1)

# Regular gradient boosting
model1 = lgb.sklearn.LGBMClassifier(
    **{
        "boosting_type": "gbdt",
        "max_depth": 3,
        "learning_rate": 0.05,
        "n_estimators": 10,
        # "bagging_fraction": 0.8,
        # "bagging_freq": 1,
        # "subsample": 0.8,
        # "subsample_freq": 1,
    }
)
# Random forest mode: LightGBM requires bagging (or feature subsampling) for "rf".
# subsample/subsample_freq are aliases of bagging_fraction/bagging_freq.
model2 = lgb.sklearn.LGBMClassifier(
    **{
        "boosting_type": "rf",
        "max_depth": 3,
        "learning_rate": 0.05,
        "n_estimators": 10,
        "bagging_fraction": 0.8,
        "bagging_freq": 1,
        "subsample": 0.8,
        "subsample_freq": 1,
    }
)
model1.fit(X, y)
model2.fit(X, y)

# Store LightGBM's own predictions next to the inputs for later comparison
df_["model1_p1"] = model1.predict_proba(X)[:, 1]
df_["model2_p1"] = model2.predict_proba(X)[:, 1]
df_.to_csv("input.csv", index=False, encoding="utf-8")

sklearn2pmml(make_pmml_pipeline(
    model1, active_fields=X.columns.tolist(), target_fields=["target"]), "model1.pmml")
sklearn2pmml(make_pmml_pipeline(
    model2, active_fields=X.columns.tolist(), target_fields=["target"]), "model2.pmml")
```
Then I evaluated both PMML files with the JPMML-Evaluator command-line example:

```shell
java -cp pmml-evaluator-example-executable-1.4.12.jar org.jpmml.evaluator.EvaluationExample --model model1.pmml --input input.csv --output output1.csv --missing-values "" --separator ","
```

Result: probability(1) == model1_p1, as expected.

```shell
java -cp pmml-evaluator-example-executable-1.4.12.jar org.jpmml.evaluator.EvaluationExample --model model2.pmml --input input.csv --output output2.csv --missing-values "" --separator ","
```

Result: probability(1) != model2_p1 :( Why?
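To check agreement numerically rather than by eye, one option is a small pandas sketch like the following. It assumes that the evaluator writes one output row per input row, in the same order, and names the probability column probability(1), as in the results above; the function name compare_predictions is hypothetical.

```python
import numpy as np
import pandas as pd


def compare_predictions(input_csv, output_csv, expected_col,
                        actual_col="probability(1)", atol=1e-6):
    """Hypothetical helper: compare LightGBM probabilities with PMML ones.

    Returns (all_close, max_abs_diff). Assumes both files have the same
    number of rows in the same order.
    """
    expected = pd.read_csv(input_csv)[expected_col].to_numpy(dtype=float)
    actual = pd.read_csv(output_csv)[actual_col].to_numpy(dtype=float)
    if expected.shape != actual.shape:
        raise ValueError("row counts differ")
    diff = np.abs(expected - actual)
    return bool(np.all(diff <= atol)), float(diff.max())
```

For example, compare_predictions("input.csv", "output2.csv", "model2_p1") would report whether the "rf" model's PMML predictions agree within the tolerance, and by how much they differ at most.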
IIRC, the JPMML-LightGBM library does not check the value of the boosting_type attribute.
Therefore, it encodes the "gbdt" and "rf" boosting types identically, following the "gbdt" procedure. Based on the above evidence, there is a need to detect the "rf" boosting type and handle it differently.
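To illustrate why the "gbdt" encoding gives wrong "rf" predictions, here is a minimal sketch of how per-tree raw outputs could be combined for a binary classifier. It assumes, consistent with the sum-to-average fix suggested later in this thread, that "gbdt" sums the tree outputs while "rf" averages them, and that the probability is the logistic sigmoid of the combined raw score. The function names are hypothetical, not part of LightGBM or JPMML-LightGBM, and other boosting types are deliberately left out.

```python
import math
from typing import Sequence


def combine_raw_score(tree_outputs: Sequence[float], boosting_type: str) -> float:
    """Hypothetical helper: combine per-tree raw outputs into one raw score."""
    if not tree_outputs:
        raise ValueError("need at least one tree")
    if boosting_type == "gbdt":
        # Boosting: trees are summed (PMML multipleModelMethod="sum")
        return sum(tree_outputs)
    if boosting_type == "rf":
        # Random forest: trees are averaged (PMML multipleModelMethod="average")
        return sum(tree_outputs) / len(tree_outputs)
    raise ValueError("boosting type not covered by this sketch: " + boosting_type)


def probability_of_1(tree_outputs: Sequence[float], boosting_type: str) -> float:
    """Apply the logistic sigmoid to the combined raw score."""
    return 1.0 / (1.0 + math.exp(-combine_raw_score(tree_outputs, boosting_type)))


# Toy "rf" model: ten trees that all output 0.3 for some row
trees = [0.3] * 10
p_rf = probability_of_1(trees, "rf")          # sigmoid(0.3), about 0.574
p_as_gbdt = probability_of_1(trees, "gbdt")   # sigmoid(3.0), about 0.953
```

Because the sum of N trees is N times their average, encoding an "rf" model with "sum" inflates the raw score by a factor of N (10 with n_estimators=10), so the probabilities cannot match.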
Thanks! The different boosting_type values are listed in
https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMClassifier.html
boosting_type (string, optional (default='gbdt')) – ‘gbdt’, traditional Gradient Boosting Decision Tree. ‘dart’, Dropouts meet Multiple Additive Regression Trees. ‘goss’, Gradient-based One-Side Sampling. ‘rf’, Random Forest.
@OnlyFor Open model2.pmml in a text editor, and on line 143 change the value of the Segmentation@multipleModelMethod attribute from "sum" (gbdt) to "average" (rf). Then you will get correct RF predictions.
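The manual edit above could also be scripted. The following is a minimal sketch using Python's standard xml.etree.ElementTree; it is not part of sklearn2pmml or JPMML, and the function name sum_to_average is hypothetical. Instead of relying on line 143, it assumes that every Segmentation element with multipleModelMethod="sum" in an "rf" model's PMML belongs to the tree ensemble (for a binary classifier there should be exactly one), which is worth checking against the actual file.

```python
import xml.etree.ElementTree as ET


def sum_to_average(pmml_in: str, pmml_out: str) -> int:
    """Hypothetical helper: switch "sum" Segmentations to "average".

    Only meant for PMML converted from boosting_type="rf" models.
    Returns the number of Segmentation elements changed.
    """
    tree = ET.parse(pmml_in)
    root = tree.getroot()
    # PMML uses a default namespace, e.g. "{http://www.dmg.org/PMML-4_3}PMML";
    # register it as the default so the output is not written with ns0: prefixes
    if root.tag.startswith("{"):
        ET.register_namespace("", root.tag[1:].split("}", 1)[0])
    changed = 0
    for elem in root.iter():
        # Compare the local name, ignoring any namespace
        local_name = elem.tag.split("}")[-1]
        if local_name == "Segmentation" and elem.get("multipleModelMethod") == "sum":
            elem.set("multipleModelMethod", "average")
            changed += 1
    tree.write(pmml_out, encoding="UTF-8", xml_declaration=True)
    return changed
```

For example, sum_to_average("model2.pmml", "model2_rf.pmml") would write a patched copy that can then be passed to EvaluationExample via --model. Writing to a new file keeps the original PMML for comparison.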
@vruusmann It works! Thanks!
> it works!

I made this comment just to show that the fix for the "rf" boosting type is really simple. I will probably implement it in code later this week.