SoftwareAG/nyoka

nyoka.xgboost.xgboost_to_pmml takes too long for a model with 5000 estimators (~50 MB PMML output)

wentixiaogege opened this issue · 7 comments


For me, the export takes about 6 hours to finish, while training the XGBoost model itself takes only about 30 minutes, so this is not acceptable!

Hi @wentixiaogege , could you please provide the code to reproduce it?

[screenshot of notebook cell timings]

The code is below, but the data is private; maybe you can reproduce it with an open dataset.

Please check the running times in the screenshot: with only 500 estimators the export takes about 2 minutes, but with 5000 estimators it takes many hours.

from xgboost.sklearn import XGBRegressor
from sklearn.pipeline import Pipeline

eval_metric = ["rmse", "mae"]
pipeline = Pipeline([
    ("regressor", XGBRegressor(n_estimators=500,  # 5000 in the slow case
                               learning_rate=0.05, subsample=0.8, colsample_bytree=0.8,
                               max_depth=7, min_child_weight=1))
]).fit(X_train, y_train,
       regressor__eval_set=[(X_train, y_train)],
       regressor__verbose=100,
       regressor__eval_metric=eval_metric)

# wmape is a custom evaluation metric defined elsewhere
wmape(pipeline.predict(X_test), y_test)
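For reference, the reported timings scale super-linearly: if export time grew linearly with the number of estimators, 5000 rounds should take about 10x the 2 minutes observed at 500 rounds, i.e. roughly 20 minutes, not ~6 hours. A quick check of the arithmetic (timings are approximate values taken from this thread):

```python
# Timings reported in the thread (approximate).
t_500_min = 2          # export time at 500 estimators, in minutes
t_5000_min = 6 * 60    # export time at 5000 estimators (~6 hours), in minutes

linear_estimate = t_500_min * (5000 / 500)  # what linear scaling would predict
slowdown = t_5000_min / t_500_min

print(linear_estimate)  # 20.0 minutes if scaling were linear
print(slowdown)         # 180.0x slowdown for a 10x larger model
```

A ~180x slowdown for a 10x larger model suggests roughly quadratic (or worse) behavior somewhere in the export path.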

How large is the generated PMML file?

The file is around 10x larger than with 500 estimators, roughly 50 MB to 100 MB, containing over 10,000,000 characters.
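The size figures themselves are consistent with linear growth: PMML serializes every boosted tree as explicit XML elements, so file size should scale roughly with the number of estimators. A sanity check of the numbers (the 10 MB baseline is a rough value inferred from the "around 10X" figure in this thread):

```python
# Rough sizes inferred from the thread: "around 10X larger" at 10x the estimators.
size_500_mb = 10  # assumed PMML size at 500 estimators
predicted_5000_mb = size_500_mb * (5000 / 500)
print(predicted_5000_mb)  # 100.0, within the reported 50 MB - 100 MB range
```

So the file size grows as expected; it is the export *time* that grows faster than the output.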

I understand that the file is large, but I don't see why saving it should take even longer than training the model.
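A common cause of export time growing much faster than output size is building a large document by repeated string concatenation, which can be quadratic in the number of appended fragments (each append may copy the whole accumulated string). Whether this is what PR #50 actually fixed is not stated in the thread; the sketch below just illustrates the pattern and its linear-time alternative:

```python
def build_by_concat(n):
    # Potentially quadratic: the accumulated string may be copied on each append.
    xml = ""
    for i in range(n):
        xml += "<Node id='%d'/>" % i
    return xml

def build_by_join(n):
    # Linear: collect the fragments, then join them once at the end.
    return "".join("<Node id='%d'/>" % i for i in range(n))

# Both produce the same document; only the time complexity differs.
assert build_by_concat(1000) == build_by_join(1000)
```

With ~10,000,000 characters in the output, the difference between these two strategies is exactly the gap between minutes and hours.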

Hi @wentixiaogege, sorry for the delay in responding. We are now looking into the issue and will include a fix in our next release soon.

PR - #50

Hi @wentixiaogege, release 5.1.0 takes care of this. Thanks!