jpmml/jpmml-lightgbm

possible to add some settings?

leemoor opened this issue · 3 comments

1. Is it possible to set a default value for missing or invalid values in a categorical feature? The system may raise an error when it meets a value such as test1=4 (not in (0, 1, 2, 3)).

<DataField name="test1" optype="categorical" dataType="integer">
	<Value value="0"/>
	<Value value="1"/>
	<Value value="2"/>
	<Value value="3"/>
</DataField>

2. As shown below, is it possible to turn off the margin setting? The system may raise an error when it meets a value such as 200 (> 100).

<DataField name="test2" optype="continuous" dataType="double">
	<Interval closure="closedClosed" leftMargin="0.0" rightMargin="100.0" />
</DataField>

How do you train and export LightGBM models - in standalone mode, or using some abstraction layer (Scikit-Learn, Apache Spark ML)?

Your requirements would be easy to meet when using an abstraction layer. For example, in Scikit-Learn, it's possible to customize the definition of a feature column using the sklearn2pmml.decoration.ContinuousDomain and sklearn2pmml.decoration.CategoricalDomain pseudo-transformation classes.

A closely related topic was discussed earlier today on the JPMML mailing list:
https://groups.google.com/d/msg/jpmml/10uOILNhXY8/Kro0aW4lEwAJ

training as in standalone mode :

import lightgbm as lgb

# train, test, features, target and cata_feature are defined elsewhere.
clf = lgb.LGBMClassifier(
    num_leaves=60,
    max_depth=6,
    learning_rate=0.1,
    min_data_in_leaf=100,
    n_estimators=500,
    n_jobs=20,
    bagging_fraction=0.9,
)
clf.fit(
    train[features], train[target],
    eval_set=[(test[features], test[target])],
    eval_metric='auc',
    feature_name=features,
    categorical_feature=cata_feature,
    early_stopping_rounds=500,
)
# Save the underlying booster as a text file for the JPMML-LightGBM converter.
model_path = '/Users/model.txt'
clf.booster_.save_model(model_path)

Then I run the converter JAR as below:
java -jar target/jpmml-lightgbm-executable-1.2-SNAPSHOT.jar --lgbm-input model.txt --pmml-output model.pmml

> training as in standalone mode

By "standalone mode" I meant that perhaps you're using command-line lgbm.exe or something.

However, you appear to be using Scikit-Learn as an abstraction layer. Simply wrap your LGBMClassifier into a sklearn2pmml.pipeline.PMMLPipeline, and apply sklearn2pmml.decoration.ContinuousDomain (or CategoricalDomain) transformers to the problematic columns.

See my latest JPMML mailing list post for a complete example.
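
For a model that has already been converted with the standalone JAR, as in the command earlier in this thread, a different route (not the one recommended above) would be to post-process the generated PMML file. The sketch below uses only the Python standard library and relies on the PMML `invalidValueTreatment` attribute of `MiningField`: `asIs` passes an out-of-domain value through unchanged, and `asMissing` treats it as a missing value. The helper names and the trimmed sample document are hypothetical, made up for illustration; a real converter output contains much more than this fragment.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical fragment in the shape of the fields from this issue.
SAMPLE_PMML = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <DataDictionary>
    <DataField name="test1" optype="categorical" dataType="integer">
      <Value value="0"/>
      <Value value="1"/>
    </DataField>
    <DataField name="test2" optype="continuous" dataType="double">
      <Interval closure="closedClosed" leftMargin="0.0" rightMargin="100.0"/>
    </DataField>
  </DataDictionary>
  <MiningModel functionName="classification">
    <MiningSchema>
      <MiningField name="test1"/>
      <MiningField name="test2"/>
    </MiningSchema>
  </MiningModel>
</PMML>"""


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def set_invalid_value_treatment(root, treatments):
    """Set invalidValueTreatment on every MiningField named in `treatments`.

    `treatments` maps a field name to a PMML treatment ("asIs", "asMissing").
    Returns the number of MiningField elements that were updated.
    """
    updated = 0
    for elem in root.iter():
        name = elem.get("name")
        if local_name(elem.tag) == "MiningField" and name in treatments:
            elem.set("invalidValueTreatment", treatments[name])
            updated += 1
    return updated


def drop_intervals(root, field_names):
    """Remove Interval children from the named DataField elements.

    Returns the number of Interval elements that were removed.
    """
    removed = 0
    # Materialize the iterator first, because the tree is modified below.
    for elem in list(root.iter()):
        if local_name(elem.tag) == "DataField" and elem.get("name") in field_names:
            for child in list(elem):
                if local_name(child.tag) == "Interval":
                    elem.remove(child)
                    removed += 1
    return removed


root = ET.fromstring(SAMPLE_PMML)
# Unknown categories are treated as missing; out-of-range numbers pass through.
updated = set_invalid_value_treatment(root, {"test1": "asMissing", "test2": "asIs"})
# Alternatively (or additionally), drop the range restriction on test2.
removed = drop_intervals(root, {"test2"})
print(updated, removed)
```

To apply this to a real file, one would load it with `ET.parse(...)`, run the same helpers on `tree.getroot()`, and save it with `tree.write(...)`. Calling `ET.register_namespace("", uri)` beforehand, with the PMML namespace URI that the file actually declares, keeps the default namespace in the output instead of `ns0:` prefixes. Which treatment suits a given field depends on how the scoring system should react, so the values chosen here are only an example.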