Can't convert LGBMClassifier with daal4py
GLmontanari opened this issue · 11 comments
Describe the bug
I am trying to accelerate an LGBMClassifier model on CPU with daal4py, but the conversion fails. I have tried several approaches:
To Reproduce
This is my code
import lightgbm as lgb
import daal4py as d4p
# Then I have a section where I load the data
# ...
# This is not working
lgb_model = lgb.LGBMClassifier(num_leaves=31).fit(x_train, y_train)
daal_model = d4p.get_gbt_model_from_lightgbm(lgb_model) # AttributeError: 'LGBMClassifier' object has no attribute 'dump_model'
# I don't know the difference (yet), but this is working
lgb_model = lgb.train({'num_leaves': 31}, lgb.Dataset(x_train, y_train))
daal_model = d4p.get_gbt_model_from_lightgbm(lgb_model)
# Unfortunately, when I test on the test set, this method returns
# 1: float labels instead of integers (I want to do multiclass classification)
# 2: the results are different from the non-optimized LGBMClassifier, in particular they are worse
Output/Screenshots
When converting the model with the line d4p.get_gbt_model_from_lightgbm(lgb_model), I get an exception:
Traceback (most recent call last):
File "C:\Program Files\JetBrains\PyCharm Community Edition 2024.1.1\plugins\python-ce\helpers\pydev\pydevd.py", line 1535, in _exec
pydev_imports.execfile(file, globals, locals) # execute the script
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Program Files\JetBrains\PyCharm Community Edition 2024.1.1\plugins\python-ce\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "C:\MiniProgetti\SEA_ML_BENCHMARKS\sklearn_benchmarks\TestIntel.py", line 90, in <module>
d4p.get_gbt_model_from_lightgbm(lgb_model) # AttributeError: 'LGBMClassifier' object has no attribute 'dump_model'
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "build\\daal4py_cy.pyx", line 21558, in daal4py._daal4py.get_gbt_model_from_lightgbm
AttributeError: 'LGBMClassifier' object has no attribute 'dump_model'
I have also tried daal4py 2023.2.0 and daal4py 2023.2.1, and I get a different exception:
Traceback (most recent call last):
File "C:\Users\iicgym\AppData\Local\Programs\Python\Python311\Lib\timeit.py", line 180, in timeit
timing = self.inner(it, self.timer)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<timeit-src>", line 7, in inner
File "build\\daal4py_cy.pyx", line 21576, in daal4py._daal4py.get_gbt_model_from_lightgbm
File "build\\daal4py_cy.pyx", line 21445, in daal4py._daal4py.TreeList.from_lightgbm_booster_dump
File "build\\daal4py_cy.pyx", line 21353, in daal4py._daal4py.Node.from_lightgbm_dict
KeyError: 'leaf_count'
python-BaseException
Environment:
- OS: Windows10
- Python 3.11.9
- daal4py 2024.5.0
- sklearn-intelex 2024.5.0
- lightgbm 4.5.0 (tried also 4.3.0)
Hi! Thanks for the report, we'll investigate it.
Hi @GLmontanari ,
thanks for using daal4py!
The root of the problem is that the API you chose doesn't support scikit-learn estimators like LGBMClassifier. To resolve this, you need to switch to a newer version of the API. Please take a look at the example or the paper.
I am sharing with you an excerpt of pip list
...
daal 2024.5.0
daal4py 2024.5.0
...
dpcpp-cpp-rt 2024.2.0
dpctl 0.17.0
...
fsspec 2024.2.0
...
intel-cmplr-lib-rt 2024.2.0
intel-cmplr-lib-ur 2024.2.0
intel-cmplr-lic-rt 2024.2.0
intel-opencl-rt 2024.2.0
intel-openmp 2024.2.0
intel-sycl-rt 2024.2.0
...
mkl 2024.2.0
scikit-learn 1.4.2
scikit-learn-intelex 2024.5.0
tbb 2021.13.0
tifffile 2024.5.10
tzdata 2024.1
I am using the 2024.2.0 version. Do I need a newer version?
By the way, with the example you posted I get the same error messages.
No, you need to use a different version of the API, not a newer package version.
Just replace
daal_model = d4p.get_gbt_model_from_lightgbm(lgb_model)
in your code with
daal_model = d4p.mb.convert_model(lgb_model)
The latest version, 2024.6.0, is not working either. I just checked.
Hi @GLmontanari, were you able to solve the problem with the other variant of the user API?
Of course I tried daal_model = d4p.mb.convert_model(lgb_model)
and it's not working either.
Could you please share the minimal model with which you are experiencing the problem?
Here is my code, copied from the example:
lgb_model = lgb.LGBMClassifier().fit(x_train, y_train)
lgb_pred = lgb_model.predict(x_test)
daal_model = d4p.mb.convert_model(lgb_model)
daal_model.predict(x_test)
Following is the error message
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\daal4py\mb\model_builders.py:299, in convert_model(model)
298 gbm = GBTDAALModel()
--> 299 gbm._convert_model(model)
300 except TypeError as err:
File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\daal4py\mb\model_builders.py:114, in GBTDAALBaseModel._convert_model(self, model)
113 else:
--> 114 raise TypeError(
115 f"Only GBTDAALClassifier can be created from\
116 {submodule_name}.{class_name} (got {self_class_name})"
117 )
118 # Build GBTDAALClassifier from XGBoost
TypeError: Only GBTDAALClassifier can be created from lightgbm.sklearn.LGBMClassifier (got GBTDAALModel)
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
Cell In[16], line 4
2 lgb_model = lgb.LGBMClassifier(verbose=0).fit(x_train, y_train)
3 lgb_pred = lgb_model.predict(x_test)
----> 4 daal_model = d4p.mb.convert_model(lgb_model)
5 daal_model.predict(x_test)
File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\daal4py\mb\model_builders.py:304, in convert_model(model)
302 gbm = d4p.sklearn.ensemble.GBTDAALRegressor.convert_model(model)
303 elif "Only GBTDAALClassifier can be created" in str(err):
--> 304 gbm = d4p.sklearn.ensemble.GBTDAALClassifier.convert_model(model)
305 else:
306 raise
File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\daal4py\sklearn\ensemble\GBTDAAL.py:240, in GBTDAALClassifier.convert_model(model)
237 @staticmethod
238 def convert_model(model):
239 gbm = GBTDAALClassifier()
--> 240 gbm._convert_model(model)
242 gbm.classes_ = model.classes_
243 gbm.allow_nan_ = True
File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\daal4py\mb\model_builders.py:112, in GBTDAALBaseModel._convert_model(self, model)
110 if (submodule_name, class_name) == ("lightgbm.sklearn", "LGBMClassifier"):
111 if self_class_name == "GBTDAALClassifier":
--> 112 self._convert_model_from_lightgbm(model.booster_)
113 else:
114 raise TypeError(
115 f"Only GBTDAALClassifier can be created from\
116 {submodule_name}.{class_name} (got {self_class_name})"
117 )
File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\daal4py\mb\model_builders.py:89, in GBTDAALBaseModel._convert_model_from_lightgbm(self, booster)
87 def _convert_model_from_lightgbm(self, booster):
88 lgbm_params = d4p.get_lightgbm_params(booster)
---> 89 self.daal_model_ = d4p.get_gbt_model_from_lightgbm(booster, lgbm_params)
90 self._get_params_from_lightgbm(lgbm_params)
File build\\daal4py_cy.pyx:21576, in daal4py._daal4py.get_gbt_model_from_lightgbm()
File build\\daal4py_cy.pyx:21445, in daal4py._daal4py.TreeList.from_lightgbm_booster_dump()
File build\\daal4py_cy.pyx:21353, in daal4py._daal4py.Node.from_lightgbm_dict()
KeyError: 'leaf_count'
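The last frame shows the converter reading a 'leaf_count' key from a node of the booster dump. The following is a minimal diagnostic sketch, assuming the dictionary layout returned by LightGBM's Booster.dump_model(): a "tree_info" list whose entries hold a nested "tree_structure" with "left_child" and "right_child" links. The helper name find_nodes_missing_key and the example dictionary are hypothetical, and whether a leaf without this key is the actual cause of the error above is not confirmed in this thread.

```python
def find_nodes_missing_key(dump, key="leaf_count"):
    """Return (tree_index, path) pairs for leaf nodes that lack `key`.

    `dump` is assumed to look like the dict from Booster.dump_model().
    """
    missing = []
    for tree in dump.get("tree_info", []):
        # Iterative depth-first walk; each stack item is (node, path from root).
        stack = [(tree.get("tree_structure", {}), "root")]
        while stack:
            node, path = stack.pop()
            children = [
                (side, node[side])
                for side in ("left_child", "right_child")
                if side in node
            ]
            if children:
                # Internal node: keep descending.
                for side, child in children:
                    stack.append((child, f"{path}.{side}"))
            elif key not in node:
                # Leaf node without the requested key.
                missing.append((tree.get("tree_index"), path))
    return missing


# Hand-written stand-in for a real dump: tree 0 has one split,
# tree 1 is a single-leaf tree whose only node carries no count.
example_dump = {
    "tree_info": [
        {
            "tree_index": 0,
            "tree_structure": {
                "split_feature": 0,
                "threshold": 0.5,
                "left_child": {"leaf_index": 0, "leaf_value": 0.1, "leaf_count": 12},
                "right_child": {"leaf_index": 1, "leaf_value": -0.2, "leaf_count": 8},
            },
        },
        {"tree_index": 1, "tree_structure": {"leaf_value": 0.0}},
    ]
}

missing_nodes = find_nodes_missing_key(example_dump)
print(missing_nodes)
```

With a real model, the input would be lgb_model.booster_.dump_model() for an LGBMClassifier, or lgb_model.dump_model() for a Booster returned by lgb.train().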
Wait. I got your point. This is working:
lgb_train = lgb.Dataset(x_train, y_train, free_raw_data=False)
params = {
"max_bin": 256,
"scale_pos_weight": 2,
"lambda_l2": 1,
"alpha": 0.9,
"max_depth": 6,
"num_leaves": 2**6,
"verbose": -1,
"objective": "multiclass",
"learning_rate": 0.3,
"num_class": 11,
"n_estimators": 25,
}
lgb_model = lgb.train(params, lgb_train, valid_sets=lgb_train, callbacks=[lgb.log_evaluation(0)])
daal_model = d4p.mb.convert_model(lgb_model)
OK, so I had not understood that LGBMClassifier is not supported. I can get the same result by using lgb.train() and specifying, for my case, objective='multiclass'.
Thank you!
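For anyone making the same switch from LGBMClassifier to lgb.train(), note that the scikit-learn wrapper does some bookkeeping that the native API leaves to the caller: it encodes class labels as integers starting at 0, selects a binary or multiclass objective, and maps predictions back to the original labels. Below is a minimal pure-Python sketch of that bookkeeping, assuming a Booster trained with objective="multiclass" returns one row of per-class probabilities per sample from predict(). The helper names encode_labels, native_params, and decode_predictions are hypothetical and are not part of lightgbm or daal4py.

```python
def encode_labels(y):
    """Map arbitrary class labels to integers 0..n-1 for the native API."""
    classes = sorted(set(y))
    index = {label: i for i, label in enumerate(classes)}
    return [index[label] for label in y], classes


def native_params(n_classes, **extra):
    """Build an lgb.train() params dict for a classification problem."""
    if n_classes > 2:
        params = {"objective": "multiclass", "num_class": n_classes}
    else:
        params = {"objective": "binary"}
    params.update(extra)
    return params


def decode_predictions(probabilities, classes):
    """Turn rows of per-class probabilities into the original class labels.

    Multiclass only: a binary objective yields one probability per sample.
    """
    labels = []
    for row in probabilities:
        # Index of the largest probability in this row (argmax).
        best = max(range(len(row)), key=lambda i: row[i])
        labels.append(classes[best])
    return labels


y_train_raw = ["cat", "dog", "bird", "dog", "cat"]
y_encoded, classes = encode_labels(y_train_raw)
params = native_params(len(classes), num_leaves=31, verbose=-1)

# Stand-in for the output of predict(): one probability row per sample.
fake_probabilities = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]
predicted = decode_predictions(fake_probabilities, classes)
print(y_encoded, params, predicted)
```

In the real workflow, y_encoded would go into lgb.Dataset(x_train, y_encoded), params into lgb.train(), and the probabilities from predict() into decode_predictions. This would also account for the float outputs reported at the top of the thread, since lgb.train() with only num_leaves set falls back to LightGBM's default regression objective.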