SoftwareAG/nyoka

Categorical Feature LGBM supported?

Sathrovarr opened this issue · 2 comments

I'm training a LightGBM classifier (lightgbm.LGBMClassifier) and passing the categorical features at fit time: model.fit(X, y, categorical_feature = ccols).

When I try to export the resulting model with nyoka's lgb_to_pmml, I get the following error:

python3.9/site-packages/nyoka/lgbm/lgb_to_pmml.py", line 338, in create_left_node
operator=SIMPLE_PREDICATE_OPERATOR.LESS_OR_EQUAL, value="{:.16f}".format(obj['threshold'])))

I've double-checked that the categoric_values passed around between the library functions are set up correctly. However, I could not see anywhere that they are actually taken into account. It appears that the library always tries to create a <= and > node pair around the threshold, which it formats with {:.16f}. The categoricals we provide to the model are cast to int on our side, so this generally works, except that the LGBM model in question apparently produces 1395||1401||1427||1496||1504||1510||1521 as the threshold value, on which nyoka's float formatting fails.
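To illustrate the failure outside of nyoka, here is a minimal sketch. It assumes the node dictionaries follow the layout I see in LightGBM's dumped model, where a categorical split carries decision_type "==" and a "||"-joined string as its threshold. The helper name parse_threshold and the field name "ccol" are hypothetical and not part of nyoka or LightGBM.

```python
def parse_threshold(node):
    """Classify a split node (hypothetical helper, not nyoka API).

    Assumes categorical splits are marked with decision_type "==" and
    store their categories as a "||"-joined string.
    """
    threshold = node["threshold"]
    if node.get("decision_type") == "==":
        return "categorical", [int(v) for v in str(threshold).split("||")]
    return "numeric", float(threshold)


# A node shaped like the one that triggers the error.
node = {
    "split_feature": "ccol",  # hypothetical field name
    "decision_type": "==",
    "threshold": "1395||1401||1427||1496||1504||1510||1521",
}

# The float format spec cannot be applied to a string threshold.
try:
    "{:.16f}".format(node["threshold"])
    format_failed = False
except ValueError:
    format_failed = True

kind, values = parse_threshold(node)
```

Branching on decision_type rather than on the presence of "||" also covers a split on a single category, where the threshold string contains no separator.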

As far as I can tell, this is an expected threshold value for an LGBMClassifier with categorical splits, and I would expect it to be exported as a SimpleSetPredicate in the PMML. While I did find implementations of these primitives in nyoka's PMML44.py and PMML44Super.py, I could not find any path by which they could be called from lgb_to_pmml.
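For reference, this is roughly the PMML I would expect for such a split. The sketch below builds the elements with the standard library's xml.etree.ElementTree rather than nyoka's own classes, so it says nothing about how nyoka would construct them. The function name simple_set_predicate and the field name "ccol" are hypothetical, and it assumes the left child takes rows whose category is in the set while the right child takes the rest.

```python
import xml.etree.ElementTree as ET


def simple_set_predicate(field, categories, is_in=True):
    """Build a PMML SimpleSetPredicate element (illustrative only)."""
    predicate = ET.Element(
        "SimpleSetPredicate",
        field=field,
        booleanOperator="isIn" if is_in else "isNotIn",
    )
    # PMML stores the set as a whitespace-separated Array child.
    array = ET.SubElement(
        predicate, "Array", n=str(len(categories)), type="int"
    )
    array.text = " ".join(str(c) for c in categories)
    return predicate


categories = [1395, 1401, 1427, 1496, 1504, 1510, 1521]

# Assumed convention: left child matches the set, right child the rest.
left = simple_set_predicate("ccol", categories, is_in=True)
right = simple_set_predicate("ccol", categories, is_in=False)

print(ET.tostring(left, encoding="unicode"))
print(ET.tostring(right, encoding="unicode"))
```

The two predicates would replace the lessOrEqual / greaterThan SimplePredicate pair that the exporter currently emits for every split.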

None of the examples given for lgbm seem to include categorical features either (https://github.com/SoftwareAG/nyoka/tree/master/examples/lgbm).

So I'm at a loss as to what I may be missing, or whether categorical columns are simply not supported.

I'm using nyoka '5.0.1', and lightgbm '3.2.1'.

Hi @Sathrovarr, support for categorical features has not been added to Nyoka yet. We will try to add it, along with other items in the pipeline, in the near future. Thanks!

Hello @Sathrovarr,
Implementing this is on our future roadmap for Nyoka. Closing the ticket for now.
Thanks