uxlfoundation/scikit-learn-intelex

Can't convert LGBMClassifier with daal4py

GLmontanari opened this issue · 11 comments

Describe the bug
I am trying to accelerate an LGBMClassifier model on CPU, but I can't. I have tried several approaches:

To Reproduce
This is my code

import lightgbm as lgb
import daal4py as d4p

# Then I have a section where I load the data
# ...

# This is not working
lgb_model = lgb.LGBMClassifier(num_leaves=31).fit(x_train, y_train)
daal_model = d4p.get_gbt_model_from_lightgbm(lgb_model)  # AttributeError: 'LGBMClassifier' object has no attribute 'dump_model'

# I don't know the difference (yet), but this is working
lgb_model = lgb.train({'num_leaves': 31}, lgb.Dataset(x_train, y_train))
daal_model = d4p.get_gbt_model_from_lightgbm(lgb_model)

# Unfortunately, when I test on the test set, this method returns
# 1: float labels instead of integers (I want to do multiclass classification)
# 2: the results are different from the non-optimized LGBMClassifier, in particular they are worse

Output/Screenshots
Upon converting the model with this line d4p.get_gbt_model_from_lightgbm(lgb_model) I get an exception:

Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2024.1.1\plugins\python-ce\helpers\pydev\pydevd.py", line 1535, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2024.1.1\plugins\python-ce\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:\MiniProgetti\SEA_ML_BENCHMARKS\sklearn_benchmarks\TestIntel.py", line 90, in <module>
    d4p.get_gbt_model_from_lightgbm(lgb_model)  # AttributeError: 'LGBMClassifier' object has no attribute 'dump_model'
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "build\\daal4py_cy.pyx", line 21558, in daal4py._daal4py.get_gbt_model_from_lightgbm
AttributeError: 'LGBMClassifier' object has no attribute 'dump_model'

I have also tried daal4py 2023.2.0 and daal4py 2023.2.1, and I get a different exception:

Traceback (most recent call last):
  File "C:\Users\iicgym\AppData\Local\Programs\Python\Python311\Lib\timeit.py", line 180, in timeit
    timing = self.inner(it, self.timer)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<timeit-src>", line 7, in inner
  File "build\\daal4py_cy.pyx", line 21576, in daal4py._daal4py.get_gbt_model_from_lightgbm
  File "build\\daal4py_cy.pyx", line 21445, in daal4py._daal4py.TreeList.from_lightgbm_booster_dump
  File "build\\daal4py_cy.pyx", line 21353, in daal4py._daal4py.Node.from_lightgbm_dict
KeyError: 'leaf_count'
python-BaseException

Environment:

  • OS: Windows10
  • Python 3.11.9
  • daal4py 2024.5.0
  • sklearn-intelex 2024.5.0
  • lightgbm 4.5.0 (tried also 4.3.0)

Hi! Thanks for the report, we'll investigate it.

Hi @GLmontanari ,
thanks for using daal4py!

The root of the problem is that the API you chose doesn't support scikit-learn estimators like LGBMClassifier. To resolve this, you need to switch to a newer version of the API. Please take a look at the example or the paper.

I am sharing with you an excerpt of pip list

...
daal                      2024.5.0
daal4py                   2024.5.0
...
dpcpp-cpp-rt              2024.2.0
dpctl                     0.17.0
...
fsspec                    2024.2.0
...
intel-cmplr-lib-rt        2024.2.0
intel-cmplr-lib-ur        2024.2.0
intel-cmplr-lic-rt        2024.2.0
intel-opencl-rt           2024.2.0
intel-openmp              2024.2.0
intel-sycl-rt             2024.2.0
...
mkl                       2024.2.0
scikit-learn              1.4.2
scikit-learn-intelex      2024.5.0
tbb                       2021.13.0
tifffile                  2024.5.10
tzdata                    2024.1

I am using the 2024.2.0 version. Do I need a newer version???
By the way, with the example you posted I get the same error messages.

I am using the 2024.2.0 version. Do I need a newer version???
No, you need to use a different version of the API.
Just replace
daal_model = d4p.get_gbt_model_from_lightgbm(lgb_model)
in your code with
daal_model = d4p.mb.convert_model(lgb_model)

The latest version, 2024.6.0, is not working either. I just checked.

Hi @GLmontanari, were you able to solve the problem with the other variant of the user API?

Of course I tried daal_model = d4p.mb.convert_model(lgb_model), and it's not working either.

Could you please share the minimal model with which you are experiencing the problem?

Here is my code, copied from the example:

lgb_model = lgb.LGBMClassifier().fit(x_train, y_train)
lgb_pred = lgb_model.predict(x_test)
daal_model = d4p.mb.convert_model(lgb_model)
daal_model.predict(x_test)

The following is the error message:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\daal4py\mb\model_builders.py:299, in convert_model(model)
    298     gbm = GBTDAALModel()
--> 299     gbm._convert_model(model)
    300 except TypeError as err:

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\daal4py\mb\model_builders.py:114, in GBTDAALBaseModel._convert_model(self, model)
    113     else:
--> 114         raise TypeError(
    115             f"Only GBTDAALClassifier can be created from\
    116                          {submodule_name}.{class_name} (got {self_class_name})"
    117         )
    118 # Build GBTDAALClassifier from XGBoost

TypeError: Only GBTDAALClassifier can be created from                                 lightgbm.sklearn.LGBMClassifier (got GBTDAALModel)

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
Cell In[16], line 4
      2 lgb_model = lgb.LGBMClassifier(verbose=0).fit(x_train, y_train)
      3 lgb_pred = lgb_model.predict(x_test)
----> 4 daal_model = d4p.mb.convert_model(lgb_model)
      5 daal_model.predict(x_test)

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\daal4py\mb\model_builders.py:304, in convert_model(model)
    302     gbm = d4p.sklearn.ensemble.GBTDAALRegressor.convert_model(model)
    303 elif "Only GBTDAALClassifier can be created" in str(err):
--> 304     gbm = d4p.sklearn.ensemble.GBTDAALClassifier.convert_model(model)
    305 else:
    306     raise

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\daal4py\sklearn\ensemble\GBTDAAL.py:240, in GBTDAALClassifier.convert_model(model)
    237 @staticmethod
    238 def convert_model(model):
    239     gbm = GBTDAALClassifier()
--> 240     gbm._convert_model(model)
    242     gbm.classes_ = model.classes_
    243     gbm.allow_nan_ = True

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\daal4py\mb\model_builders.py:112, in GBTDAALBaseModel._convert_model(self, model)
    110 if (submodule_name, class_name) == ("lightgbm.sklearn", "LGBMClassifier"):
    111     if self_class_name == "GBTDAALClassifier":
--> 112         self._convert_model_from_lightgbm(model.booster_)
    113     else:
    114         raise TypeError(
    115             f"Only GBTDAALClassifier can be created from\
    116                          {submodule_name}.{class_name} (got {self_class_name})"
    117         )

File ~\AppData\Local\Programs\Python\Python311\Lib\site-packages\daal4py\mb\model_builders.py:89, in GBTDAALBaseModel._convert_model_from_lightgbm(self, booster)
     87 def _convert_model_from_lightgbm(self, booster):
     88     lgbm_params = d4p.get_lightgbm_params(booster)
---> 89     self.daal_model_ = d4p.get_gbt_model_from_lightgbm(booster, lgbm_params)
     90     self._get_params_from_lightgbm(lgbm_params)

File build\\daal4py_cy.pyx:21576, in daal4py._daal4py.get_gbt_model_from_lightgbm()

File build\\daal4py_cy.pyx:21445, in daal4py._daal4py.TreeList.from_lightgbm_booster_dump()

File build\\daal4py_cy.pyx:21353, in daal4py._daal4py.Node.from_lightgbm_dict()

KeyError: 'leaf_count'

Wait. I got your point. This is working:

lgb_train = lgb.Dataset(x_train, y_train, free_raw_data=False)

params = {
    "max_bin": 256,
    "scale_pos_weight": 2,
    "lambda_l2": 1,
    "alpha": 0.9,
    "max_depth": 6,
    "num_leaves": 2**6,
    "verbose": -1,
    "objective": "multiclass",
    "learning_rate": 0.3,
    "num_class": 11,
    "n_estimators": 25,
}

lgb_model = lgb.train(params, lgb_train, valid_sets=lgb_train, callbacks=[lgb.log_evaluation(0)])
daal_model = d4p.mb.convert_model(lgb_model)

Ok. So I had not understood that LGBMClassifier is not supported. I can get the same result by using lgb.train() and specifying, for my case, objective='multiclass'.
Thank you!
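
A note on the float labels mentioned at the top of the thread: as far as I know, lgb.train() uses LightGBM's default regression objective when the params dict has no "objective" key, which would explain continuous outputs from lgb.train({'num_leaves': 31}, ...). With objective="multiclass", Booster.predict() should instead return one row of per-class probabilities per sample, and integer labels can be recovered with an argmax. Below is a minimal sketch of that step. predict_labels is a hypothetical helper written for illustration, not part of LightGBM or daal4py, and it assumes only that the model's predict() returns an (n_samples, n_classes) array.

```python
import numpy as np


def predict_labels(model, x, classes=None):
    """Hypothetical helper: turn per-class probabilities into class labels.

    Assumes model.predict(x) returns an (n_samples, n_classes) array, as a
    LightGBM Booster trained with objective="multiclass" is expected to.
    """
    proba = np.asarray(model.predict(x))
    if proba.ndim != 2:
        # A 1-D result usually means the model was not trained as multiclass
        # (for example, it fell back to the default regression objective).
        raise ValueError(
            "expected shape (n_samples, n_classes), got %s" % (proba.shape,)
        )
    idx = proba.argmax(axis=1)  # column index of the most probable class
    if classes is None:
        return idx
    # Map column indices back to the original label values, if provided.
    return np.asarray(classes)[idx]


# Intended usage (assumption: lgb_model was trained with lgb.train() and
# params containing objective="multiclass" and num_class):
# labels = predict_labels(lgb_model, x_test)
```

If the training labels are not already 0..num_class-1, pass the original label values as classes so that the argmax indices are mapped back to them.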