IOOBE when parsing LightGBM model files
xiaoxingbai opened this issue · 9 comments
java.lang.IndexOutOfBoundsException: Index 3 out of bounds for length 1
at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:64)
at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:70)
at java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:248)
at java.base/java.util.Objects.checkIndex(Objects.java:359)
at java.base/java.util.ArrayList.get(ArrayList.java:427)
at org.jpmml.lightgbm.Tree.selectValues(Tree.java:319)
at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:214)
at org.jpmml.lightgbm.Tree.encodeNode(Tree.java:267)
at org.jpmml.lightgbm.Tree.encodeTreeModel(Tree.java:110)
at org.jpmml.lightgbm.ObjectiveFunction.createMiningModel(ObjectiveFunction.java:74)
at org.jpmml.lightgbm.BinomialLogisticRegression.encodeModel(BinomialLogisticRegression.java:46)
at org.jpmml.lightgbm.GBDT.encodeModel(GBDT.java:417)
at org.jpmml.lightgbm.GBDT.encodeModel(GBDT.java:404)
at org.jpmml.lightgbm.GBDT.encodePMML(GBDT.java:392)
at org.jpmml.lightgbm.example.Main.run(Main.java:175)
at org.jpmml.lightgbm.example.Main.main(Main.java:136)
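The conversion works when the model is trained on a large dataset, but when the training dataset is small, converting the txt model to PMML fails with an out-of-bounds error.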
Looks like the conversion was initiated using the command-line application (as opposed to using some ML framework wrapper such as SkLearn2PMML or PySpark2PMML).
The command-line application reads model schema information (number and type of features, the cardinality of categorical features, etc.) from the model file. In comparison, when using ML framework wrappers, this model schema information is provided by the wrapper component (e.g. a Scikit-Learn pipeline).
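To make the "schema comes from the model file" point concrete, here is a minimal sketch that reads feature names and feature descriptors out of a LightGBM text model header. The layout assumed here (space-separated `feature_names=` and `feature_infos=` header lines, `[min:max]` for numerical features, colon-separated category values for categorical features, `none` for unused features) is based on commonly seen LightGBM text dumps and may differ between versions. The function name `parse_feature_schema` and the sample header are made up for illustration; this is not how JPMML-LightGBM itself parses the file.

```python
def parse_feature_schema(model_text):
    """Return a list of (name, kind, detail) tuples read from the model header."""
    header = {}
    for line in model_text.splitlines():
        # Assumption: the header ends where the first tree definition begins.
        if line.startswith("Tree="):
            break
        key, sep, value = line.partition("=")
        if sep:
            header[key] = value

    names = header.get("feature_names", "").split()
    infos = header.get("feature_infos", "").split()
    if len(names) != len(infos):
        raise ValueError("feature_names and feature_infos differ in length")

    schema = []
    for name, info in zip(names, infos):
        if info == "none":
            # Assumption: "none" marks a feature that the model does not use.
            schema.append((name, "unused", None))
        elif info.startswith("[") and info.endswith("]"):
            low, high = info[1:-1].split(":")
            schema.append((name, "numerical", (float(low), float(high))))
        else:
            # Anything else is treated as a colon-separated list of categories.
            schema.append((name, "categorical", info.split(":")))
    return schema


# Made-up header for illustration only.
sample = "\n".join([
    "tree",
    "version=v4",
    "feature_names=age os hours",
    "feature_infos=[17:90] 0 [1:99]",
    "",
    "Tree=0",
    "num_leaves=2",
])

schema = parse_feature_schema(sample)
# Categorical features that list at most one category value.
single_valued = [name for name, kind, detail in schema
                 if kind == "categorical" and len(detail) <= 1]
print(single_valued)  # ['os']
```

Running something like this against a failing model file may help check whether any categorical feature is described with a single value before reporting a conversion problem.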
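The conversion works when the model is trained on a large dataset, but when the training dataset is small, converting the txt model to PMML fails with an out-of-bounds error.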
I don't speak the language, sorry. Google translate tells me that the above means that this exception is raised with small datasets and not with big datasets.
If so, can you provide a small self-contained LightGBM model file that causes this IndexOutOfBoundsException to be thrown? Can you reproduce it using my default Audit dataset?
Also, what is your LightGBM version?
I need more evidence before I can implement any code changes.
When the training dataset is large, the conversion works; when it is small, converting the txt model to PMML reports the error. I trained the model directly with LightGBM, version 4.3.0.
I trained the model directly with lightgbm, version 4.8.3.
In PyPI, the latest LightGBM version is only 4.4.0.
I have provided three models of lightgbm training conversion errors. Can you see what causes the errors?
Where can I see those models? They are not attached here, and they're not in my e-mail inbox.
I can't do anything without sample model files.
I trained the model directly with lightgbm, version 4.8.3
In PyPI, the latest LightGBM version is only 4.4.0.
Sorry, in Jupyter the LightGBM version is 4.3.0.
I have provided three models of lightgbm training conversion errors. Can you see what causes the errors?
Where can I see those models? They are not attached here, and they're not in my e-mail inbox.
I sent the three attachments by e-mail. I'm now attaching them here as well, so you can see them directly.
lightgbm_model_shouhuan_optimize_v6_cvr_20240711010046.txt
lightgbm_model_shouhuan_optimize_v6_cvr_20240711101909.txt
lightgbm_model_shouhuan_optimize_v6_cvr_20240711120223.txt
I can't do anything without sample model files.
Try again.
I have provided three models of lightgbm training conversion errors.
Got all three models. And they are also failing with an IOOBE when I try to convert them using the command-line application.
Will investigate soon.
This IOOBE happened when the training dataset contained constant-value categorical columns. For example, in the provided three sample models, the value of the "os" field is always "android".
The fix is included in JPMML-LightGBM version 1.5.4 and newer.
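For anyone who cannot upgrade right away, one way to spot the triggering condition is to check the training data for categorical columns that hold a single distinct value before training. Below is a minimal sketch in plain Python. The helper `constant_columns` and the sample rows are hypothetical, and whether dropping such columns avoids the exception on older converter versions is an assumption, not something confirmed in this thread.

```python
def constant_columns(rows, columns):
    """Return the names in `columns` that hold at most one distinct value."""
    distinct = {column: set() for column in columns}
    for row in rows:
        for column in columns:
            distinct[column].add(row[column])
    return [column for column in columns if len(distinct[column]) <= 1]


# Made-up training rows; "os" is always "android", as in the reported models.
rows = [
    {"os": "android", "city": "beijing", "clicks": 3},
    {"os": "android", "city": "shanghai", "clicks": 7},
    {"os": "android", "city": "beijing", "clicks": 1},
]
categorical = ["os", "city"]

to_drop = constant_columns(rows, categorical)
print(to_drop)  # ['os']
```

A constant column carries no information for tree splits, so removing it before training should not change the model's predictions.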
Thank you very much. I'll try later.