Error converting model output txt to PMML
TGalaxy opened this issue · 6 comments
I got the following error when converting the model txt file to PMML:
Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 122, Size: 1
at java.util.ArrayList.rangeCheck(Unknown Source)
at java.util.ArrayList.get(Unknown Source)
at org.jpmml.lightgbm.GBDT.encodeSchema(
at org.jpmml.lightgbm.GBDT.encodePMML(
at org.jpmml.lightgbm.Main.main(
For security reasons I couldn't attach the model txt file. But could you explain what the error means? Trying to see if I can give you a toy example.
But could you explain what the error means?
It means that your LightGBM model text file is internally inconsistent - there is a hint that some attribute should contain at least 123 elements, but the parser only finds a single element.
As the exception happens during schema parsing, I believe there's something wrong with the specification of categorical columns.
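One way to debug this locally is to check whether the header of the model text file agrees with itself. The following is a minimal sketch, not the converter's own logic: it assumes the header contains `max_feature_idx=`, `feature_names=` and `feature_infos=` lines with space-separated entries, and that the header ends at the first `Tree=` line. The helper names `read_header` and `check_header` and the sample text are hypothetical.

```python
def read_header(text):
    """Collect key=value header lines up to the first tree block."""
    header = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Tree="):
            break  # assumption: the header ends where the first tree starts
        if "=" in line:
            key, value = line.split("=", 1)
            header[key] = value
    return header


def check_header(text):
    """Return a list of inconsistencies between the declared feature counts."""
    header = read_header(text)
    # max_feature_idx is zero-based, so the feature count is one more
    expected = int(header["max_feature_idx"]) + 1
    problems = []
    for key in ("feature_names", "feature_infos"):
        found = len(header.get(key, "").split())
        if found != expected:
            problems.append(f"{key}: expected {expected} entries, found {found}")
    return problems


# Hypothetical, heavily shortened model text used only for illustration.
SAMPLE = """tree
version=v2
num_class=1
max_feature_idx=2
feature_names=age income city
feature_infos=[18:90] [0:250000] 0:1:2
tree_sizes=300

Tree=0
num_leaves=2
"""

print(check_header(SAMPLE))
```

For a real file, pass `open('lgbm.txt').read()` instead of `SAMPLE`. An empty list only means these three header fields agree; it does not rule out other problems.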
For security reasons I couldn't attach the model txt file
Then you need to debug this issue locally.
Trying to see if I can give you a toy example
Keeping this issue open for a couple of days. If I don't see a reproducible example during that timeframe, then I'll close it as "invalid".
As the exception happens during schema parsing, I believe there's something wrong with the specification of categorical columns.
One shouldn't be working with LightGBM model text files directly.
I believe this exception would be avoided if you interacted with LightGBM using some high-level framework such as Scikit-Learn, which takes care of feature engineering and specification needs.
Thanks for your reply. I tried to create a toy example by selecting a few features from the original data, including the categorical feature. That works smoothly. However, it still does not work with all the features.
Here is my code:
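import lightgbm as lgb
# Build the training and validation datasets from the same feature list and categorical columns.
d_train = lgb.Dataset(train[feature_list], label=train.tag, categorical_feature=categorical_feature)
d_validation = lgb.Dataset(validation[feature_list], label=validation.tag, categorical_feature=categorical_feature)
# Train with early stopping on the validation set, then save the best iteration as a text file.
model = lgb.train(params, d_train, valid_sets=d_validation, early_stopping_rounds=50, verbose_eval=100)
model.save_model('lgbm.txt', num_iteration=model.best_iteration)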
I will force all the other features (other than categorical) to be float and run it again.
I forced the categorical features to be of type category and the others to be float64. However, I still got the same error:
Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 119, Size: 69
at java.util.ArrayList.rangeCheck(Unknown Source)
at java.util.ArrayList.get(Unknown Source)
at org.jpmml.lightgbm.GBDT.encodeSchema(
at org.jpmml.lightgbm.GBDT.encodePMML(
at org.jpmml.lightgbm.Main.main(
Your feature specification code is wrong. However, it's impossible for me to be any more specific, because the posted exception stack traces do not contain enough actionable information.
Closing as invalid/not reproducible.
@vruusmann Hello, I have also encountered this problem. I counted the pandas_categorical entries and the number is right, but the conversion still goes out of bounds. Where could the redundant number come from?
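A quick way to compare the two counts is sketched below. It rests on assumptions about the text format rather than on anything confirmed in this thread: that numeric entries in `feature_infos=` look like `[min:max]`, that unused features are written as `none`, that any other entry is a categorical feature, and that the file ends with a `pandas_categorical:` line holding a JSON list (or `null`). The function name `categorical_counts` and the sample text are hypothetical.

```python
import json


def categorical_counts(text):
    """Return (categorical entries in feature_infos, entries in pandas_categorical)."""
    infos, pandas_categorical = [], None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("feature_infos="):
            infos = line.split("=", 1)[1].split()
        elif line.startswith("pandas_categorical:"):
            pandas_categorical = json.loads(line.split(":", 1)[1])
    # Assumption: "[min:max]" is numeric, "none" is unused, anything else is categorical.
    n_categorical = sum(
        1 for info in infos if info != "none" and not info.startswith("[")
    )
    n_pandas = len(pandas_categorical or [])
    return n_categorical, n_pandas


# Hypothetical, heavily shortened model text used only for illustration.
SAMPLE = """feature_names=age region city
feature_infos=[18:90] none 0:1:2
pandas_categorical:[["north", "south"], ["a", "b", "c"]]
"""

print(categorical_counts(SAMPLE))
```

In this made-up sample the two numbers differ because one category-typed column shows up as `none` in `feature_infos`. Whether that is what happens in a given model file is something to verify against the file itself.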