SelfExplainML/PiML-Toolbox

XGB2Regressor vs XGBRegressor

srbPhy opened this issue · 3 comments

Hi, a model trained with XGB2Regressor seems to differ slightly from one trained with the regular XGBRegressor. For instance, when I run the following code, I get slightly different predictions on the test data. I am sure I am missing something, but I cannot figure out what. Could you please help?

from piml import Experiment
from piml.models import XGB2Regressor
from xgboost import XGBRegressor

exp = Experiment(highcode_only=True)
exp.data_loader(data='BikeSharing', silent=True)
exp.data_prepare(target='cnt', task_type='regression', test_ratio=0.2, random_state=0, silent=True)

model1 = XGB2Regressor()
exp.model_train(model=model1, name='XGB2')

model2 = XGBRegressor(max_depth=2)
exp.model_train(model=model2, name='XGB2-default')

print(model1.predict(exp.get_data(test=True)[0]))
print(model2.predict(exp.get_data(test=True)[0]))
[-0.04393188  0.03837352  0.4268577  ...  0.02106261 -0.00260242
  0.34881094]
[-0.03740007  0.03996139  0.42402536 ...  0.02290548  0.0015662
  0.3511871 ]

Confirmed, I get the same result:

[-0.04393188  0.03837352  0.4268577  ...  0.02106261 -0.00260242
  0.34881094]
[-0.03740007  0.03996139  0.42402536 ...  0.02290548  0.0015662
  0.3511871 ]

Hi @yodiaditya and @srbPhy

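The difference in results comes from different default hyperparameters: XGB2Regressor() and XGBRegressor(max_depth=2) are not configured identically.

You get the same results with the following code, which passes the fitted XGB2 model's parameters to XGBRegressor: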

from piml import Experiment
from piml.models import XGB2Regressor
from xgboost import XGBRegressor

exp = Experiment(highcode_only=True)
exp.data_loader(data='BikeSharing', silent=True)
exp.data_prepare(target='cnt', task_type='regression', test_ratio=0.2, random_state=0, silent=True)

model1 = XGB2Regressor()
exp.model_train(model=model1, name='XGB2')

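# Read the hyperparameters of the XGBoost estimator wrapped by the fitted XGB2 model
params = exp.get_model("XGB2").estimator.estimator_.get_params()
# Build a plain XGBRegressor with exactly those hyperparameters
model2 = XGBRegressor(**params)
exp.model_train(model=model2, name='XGB2-default')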

print(model1.predict(exp.get_data(test=True)[0]))
print(model2.predict(exp.get_data(test=True)[0]))
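
To see which hyperparameters actually differ, the two parameter dictionaries can be compared key by key. The following is a minimal sketch, not part of PiML or XGBoost: `diff_params` and `ABSENT` are hypothetical helper names, and the two example dictionaries hold made-up values for illustration only, not the real defaults of either class.

```python
import math

ABSENT = "<absent>"  # marker for a key that exists in only one dictionary


def _same(a, b):
    # NaN != NaN in Python, but two NaN settings represent the same choice.
    if isinstance(a, float) and isinstance(b, float) and math.isnan(a) and math.isnan(b):
        return True
    return a == b


def diff_params(params_a, params_b):
    """Return {name: (value_a, value_b)} for every hyperparameter that differs."""
    diff = {}
    for name in sorted(set(params_a) | set(params_b)):
        a = params_a.get(name, ABSENT)
        b = params_b.get(name, ABSENT)
        if not _same(a, b):
            diff[name] = (a, b)
    return diff


# Made-up values for illustration; NOT the actual defaults of XGB2Regressor or XGBRegressor.
params_a = {"max_depth": 2, "n_estimators": 100, "learning_rate": 0.3, "missing": float("nan")}
params_b = {"max_depth": 2, "n_estimators": 100, "learning_rate": None, "missing": float("nan")}

for name, (a, b) in diff_params(params_a, params_b).items():
    print(f"{name}: {a!r} vs {b!r}")
```

In the setting of this thread, the first dictionary would presumably come from `exp.get_model("XGB2").estimator.estimator_.get_params()` and the second from `XGBRegressor(max_depth=2).get_params()`.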

Thank you very much for your quick response. That makes sense.