Is it possible to add some settings?
leemoor opened this issue · 3 comments
1\ Is it possible to set a default value for missing or invalid values of a categorical feature? The system may raise an error when it encounters a value such as test1=4 (not in (0,1,2,3)).
<DataField name="test1" optype="categorical" dataType="integer">
<Value value="0"/>
<Value value="1"/>
<Value value="2"/>
<Value value="3"/>
</DataField>
2\ As shown below, is it possible to disable the margin setting? The system may raise an error when it encounters a value such as 200 (> 100).
<DataField name="test2" optype="continuous" dataType="double">
<Interval closure="closedClosed" leftMargin="0.0" rightMargin="100.0" />
</DataField>
How do you train and export LightGBM models - in standalone mode, or using some abstraction layer (Scikit-Learn, Apache Spark ML)?
Your requirements would be easy to meet when using an abstraction layer. For example, in Scikit-Learn, it's possible to customize the definition of a feature column using the sklearn2pmml.decoration.ContinuousDomain and sklearn2pmml.decoration.CategoricalDomain pseudo-transformation classes.
A very close topic was discussed earlier today on the JPMML mailing list:
https://groups.google.com/d/msg/jpmml/10uOILNhXY8/Kro0aW4lEwAJ
I train in standalone mode, like this:
import lightgbm as lgb

# Configure the classifier through the Scikit-Learn wrapper
clf = lgb.LGBMClassifier(
    num_leaves=60,
    max_depth=6,
    learning_rate=0.1,
    min_data_in_leaf=100,
    n_estimators=500,
    n_jobs=20,
    bagging_fraction=0.9
)

# Fit with a held-out evaluation set and AUC as the evaluation metric
clf.fit(
    train[features], train[target],
    eval_set=[(test[features], test[target])],
    eval_metric='auc',
    feature_name=features,
    categorical_feature=cata_feature,
    early_stopping_rounds=500
)

# Save the underlying booster as a LightGBM text file
model_path = '/Users/model.txt'
clf.booster_.save_model(model_path)
Then I run the converter JAR as follows:
java -jar target/jpmml-lightgbm-executable-1.2-SNAPSHOT.jar --lgbm-input model.txt --pmml-output model.pmml
> training as in standalone mode
By "standalone mode" I meant that perhaps you're using command-line lgbm.exe or something.
However, you appear to be using Scikit-Learn as an abstraction layer. Simply wrap your LGBMClassifier into a sklearn2pmml.pipeline.PMMLPipeline, and apply sklearn2pmml.decoration.ContinuousDomain transformers to the problematic columns.
See my latest JPMML mailing list post for a complete example.