jpmml/jpmml-lightgbm

Error converting model output txt to PMML

TGalaxy opened this issue · 6 comments

Got the following error when converting txt to PMML

Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 122, Size: 1
	at java.util.ArrayList.rangeCheck(Unknown Source)
	at java.util.ArrayList.get(Unknown Source)
	at org.jpmml.lightgbm.GBDT.encodeSchema(GBDT.java:233)
	at org.jpmml.lightgbm.GBDT.encodePMML(GBDT.java:384)
	at org.jpmml.lightgbm.Main.run(Main.java:132)
	at org.jpmml.lightgbm.Main.main(Main.java:118)

For security reasons I couldn't attach the model txt file. But could you explain what the error means? I'm trying to see if I can give you a toy example.

But could you explain what the error means?

It means that your LightGBM model text file is internally inconsistent - there is a hint that some attribute should contain at least 123 elements, but the parser only finds a single element.

As the exception happens during schema parsing, I believe there's something wrong with the specification of categorical columns.

For security reasons I couldn't attach the model txt file

Then you need to debug this issue locally.
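As a starting point for local debugging, here is a minimal sketch that checks whether the header of a model text file agrees with itself. It assumes the header contains the `max_feature_idx`, `feature_names` and `feature_infos` lines with space-separated values; the `SAMPLE` text and the helper names `parse_header` and `check_header` are made up for illustration, and this is not the check that JPMML-LightGBM itself performs.

```python
def parse_header(model_text):
    """Collect key=value lines that precede the first tree in a model text dump."""
    header = {}
    for line in model_text.splitlines():
        line = line.strip()
        if line.startswith("Tree="):
            break  # the header ends where the first tree begins
        if "=" in line:
            key, value = line.split("=", 1)
            header[key] = value
    return header


def check_header(model_text):
    """Return the feature count implied by each header attribute."""
    header = parse_header(model_text)
    return {
        "expected": int(header["max_feature_idx"]) + 1,
        # Assumption: values are space-separated, so a feature name that
        # contains a space is split into several tokens here.
        "feature_names": len(header["feature_names"].split()),
        "feature_infos": len(header["feature_infos"].split()),
    }


# Hypothetical header, not taken from a real model
SAMPLE = """tree
version=v2
max_feature_idx=2
feature_names=age income city
feature_infos=[18:90] [0:250000] 0:1:2

Tree=0
num_leaves=2
"""

# Same header, but with one feature name containing a space
BROKEN = SAMPLE.replace("income", "annual income")

print(check_header(SAMPLE))
print(check_header(BROKEN))
```

To run the same check on a real file, read its contents into a string and pass that string to `check_header`. If the three counts differ, the file is inconsistent in the sense described above.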

Trying to see if I can give you a toy example

Keeping this issue open for a couple of days. If I don't see a reproducible example during that timeframe, then I'll close it as "invalid".

As the exception happens during schema parsing, I believe there's something wrong with the specification of categorical columns.

One shouldn't be working with LightGBM model text files directly.

I believe this exception would be avoided if you interacted with LightGBM using some high-level framework such as Scikit-Learn, which takes care of feature engineering and specification needs.

See https://openscoring.io/blog/2019/04/07/converting_sklearn_lightgbm_pipeline_pmml/

Thanks for your reply. I tried to create a toy example by selecting a few features from the original data, including the categorical feature. The conversion works smoothly on it. However, it still does not work with all the features.

Here is my code:

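import lightgbm as lgb

# Build the training and validation datasets, declaring the categorical columns explicitly
d_train = lgb.Dataset(train[feature_list], label=train.tag, categorical_feature=categorical_feature)
d_validation = lgb.Dataset(validation[feature_list], label=validation.tag, categorical_feature=categorical_feature)

# Train with early stopping, then save only the best iteration as a text model
model = lgb.train(params, d_train, valid_sets=d_validation, early_stopping_rounds=50, verbose_eval=100)
model.save_model('lgbm.txt', num_iteration=model.best_iteration)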

I will force all the other features (other than categorical) to be float and run it again.

I forced the categorical features to be of type category and the others to be float64. However, I still got the same error:

Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 119, Size: 69
	at java.util.ArrayList.rangeCheck(Unknown Source)
	at java.util.ArrayList.get(Unknown Source)
	at org.jpmml.lightgbm.GBDT.encodeSchema(GBDT.java:233)
	at org.jpmml.lightgbm.GBDT.encodePMML(GBDT.java:384)
	at org.jpmml.lightgbm.Main.run(Main.java:132)
	at org.jpmml.lightgbm.Main.main(Main.java:118)

Your feature specification code is wrong. However, it's impossible for me to be any more specific, because the posted exception stack traces do not contain enough actionable information.

Closing as invalid/not reproducible.

@vruusmann Hello, I have also run into this problem. I counted the entries in pandas_categorical and the number is right, but the conversion still fails with an index out of bounds error. Where could the extra number come from?
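One way to make that count reproducible is to read it from the model file rather than from the training code. The sketch below assumes that the file has a `feature_infos=` line in which numerical features look like `[min:max]`, unused features are `none`, and anything else is a colon-separated list of category codes, and that a `pandas_categorical:` line holds a JSON list with one entry per pandas category column. The `SAMPLE` text and the helper name `categorical_summary` are hypothetical, and a mismatch found this way is only a hint, not a diagnosis of the exception.

```python
import json


def categorical_summary(model_text):
    """Compare categorical entries in feature_infos with pandas_categorical."""
    feature_infos = []
    pandas_categorical = None
    for line in model_text.splitlines():
        line = line.strip()
        if line.startswith("feature_infos="):
            feature_infos = line.split("=", 1)[1].split()
        elif line.startswith("pandas_categorical:"):
            pandas_categorical = json.loads(line.split(":", 1)[1])

    # Assumption: an entry that is neither "[min:max]" nor "none" is categorical.
    categorical_indices = [
        index
        for index, info in enumerate(feature_infos)
        if info != "none" and not info.startswith("[")
    ]
    # pandas_categorical may be null when the model was not trained from pandas
    sizes = None
    if pandas_categorical is not None:
        sizes = [len(categories) for categories in pandas_categorical]
    return {
        "categorical_indices": categorical_indices,
        "pandas_categorical_sizes": sizes,
    }


# Hypothetical fragment of a model text file
SAMPLE = """feature_infos=[18:90] 0:1:2 none
pandas_categorical:[["a", "b", "c"]]
"""

print(categorical_summary(SAMPLE))
```

If the number of categorical indices differs from the number of `pandas_categorical` entries, the feature specification used for training is a good place to look next.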