Support for `cross_entropy` objective function in regression context?
phaidara opened this issue · 1 comments
Hello,
I am currently working on a project where I want to fit a model on probabilities and save it to PMML for later use in a Java program.
I am training an LGBMRegressor with the cross_entropy objective function.
The training part is working well. I am able to fit a PMMLPipeline on my data and use it to predict probabilities as expected.
But the saving to PMML part is failing with the following exception:
SEVERE: Failed to convert PKL to PMML
java.lang.IllegalArgumentException: Expected a regression-type objective function, got 'cross_entropy'
at lightgbm.sklearn.LGBMRegressor.checkLabel(LGBMRegressor.java:47)
at sklearn.Estimator.encode(Estimator.java:100)
at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:233)
at org.jpmml.sklearn.Main.run(Main.java:217)
at org.jpmml.sklearn.Main.main(Main.java:143)
It seems that the cross_entropy objective function is not compatible with LGBMRegressor in the JPMML-LightGBM Java API. I tested cross_entropy with an LGBMClassifier and binary targets (0, 1) instead of probabilities, and that works fine.
Would it be possible to fix this behavior? Thanks!
Reproducible example:
# Library imports
import pandas as pd
import lightgbm as lgb
import sklearn2pmml
from sklearn.datasets import make_classification
from numpy.random import default_rng
# Random classification data
seed = 1234
x, y_cls = make_classification(random_state=seed)
# Fitting classifier on binary target
classifier = lgb.LGBMClassifier(objective="cross_entropy")
clf_pipeline = sklearn2pmml.PMMLPipeline([("classifier", classifier)])
clf_pipeline.fit(x, y_cls)
# Saving classifier is working fine
sklearn2pmml.sklearn2pmml(clf_pipeline, "working_cross_entropy_classifier.pmml")
# Generating random probability target.
rng = default_rng(seed)
y_reg = rng.uniform(low=0, high=1, size=y_cls.shape)
# Fitting regressor on probability target
regressor = lgb.LGBMRegressor(objective="cross_entropy")
reg_pipeline = sklearn2pmml.PMMLPipeline([("regressor", regressor)])
reg_pipeline.fit(x, y_reg)
# Predictions are probability scores
reg_pipeline.predict(x)
# But saving the pipeline fails with the above exception:
sklearn2pmml.sklearn2pmml(reg_pipeline, "non_working_cross_entropy_regressor.pmml")
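For context on why the regressor's predictions land in (0, 1): with the cross_entropy objective, the final prediction is the sigmoid of the raw boosted score. A NumPy-only sketch of that link function (illustrative, not actual LightGBM code):

```python
import numpy as np

def sigmoid(raw_score):
    # Logistic link used by cross_entropy-style objectives:
    # maps any real-valued boosted score into the open interval (0, 1).
    return 1.0 / (1.0 + np.exp(-raw_score))

raw = np.array([-3.0, 0.0, 3.0])   # example raw boosted scores
probs = sigmoid(raw)               # all values fall strictly inside (0, 1)
```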
The JPMML-LightGBM library treats cross_entropy as a classification-type objective function:
https://github.com/jpmml/jpmml-lightgbm/blob/1.4.2/pmml-lightgbm/src/main/java/org/jpmml/lightgbm/GBDT.java#L549-L553
It's currently unclear to me if it can and should be usable in regression contexts. Will explore.
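One argument for supporting it in regression contexts: the binary cross-entropy loss is mathematically well-defined for any target in [0, 1], not just hard 0/1 labels, which is presumably why LightGBM accepts probabilities as regression targets for this objective. A minimal NumPy sketch (the function name is illustrative, not part of either library):

```python
import numpy as np

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    # Binary cross-entropy, valid for any y_true in [0, 1].
    p = np.clip(p_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))

# Hard labels (classification-style targets) ...
hard = binary_cross_entropy(np.array([0.0, 1.0]), np.array([0.1, 0.9]))
# ... and fractional probability targets (regression-style) both yield
# a finite, meaningful loss.
soft = binary_cross_entropy(np.array([0.25, 0.75]), np.array([0.3, 0.7]))
```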