SoftwareAG/nyoka

nyoka.xgboost.xgboost_to_pmml takes too long for a model with 5000 estimators (~50 MB PMML output)

wentixiaogege opened this issue · 7 comments


For me, the export takes about 6 hours to finish, while training the XGBoost model itself takes only about 30 minutes, so this is not acceptable!

Hi @wentixiaogege , could you please provide the code to reproduce it?

[screenshot of notebook cell timings]

The code is below, but the data is private; maybe you can reproduce it with an open dataset.

Please check the running times in the screenshot: with only 500 estimators the export takes about 2 minutes, but with 5000 estimators it takes many hours.

from xgboost.sklearn import XGBRegressor
from sklearn.pipeline import Pipeline

eval_metric = ["rmse", "mae"]
pipeline = Pipeline([
    ("regressor", XGBRegressor(n_estimators=500,  # 5000 in the slow case
                               learning_rate=0.05, subsample=0.8, colsample_bytree=0.8,
                               max_depth=7, min_child_weight=1))
]).fit(X_train, y_train,
       regressor__eval_set=[(X_train, y_train)],
       regressor__verbose=100,
       regressor__eval_metric=eval_metric)

# wmape is a custom evaluation metric defined elsewhere
wmape(pipeline.predict(X_test), y_test)
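For reference, the reported timings scale super-linearly: if export time grew linearly with the number of estimators, 5000 rounds should take about 10x the 2 minutes observed at 500 rounds, i.e. roughly 20 minutes, not ~6 hours. A quick check of the arithmetic (timings are approximate values taken from this thread):

```python
# Timings reported in the thread (approximate).
t_500_min = 2          # export time at 500 estimators, in minutes
t_5000_min = 6 * 60    # export time at 5000 estimators (~6 hours), in minutes

linear_estimate = t_500_min * (5000 / 500)  # what linear scaling would predict
slowdown = t_5000_min / t_500_min

print(linear_estimate)  # 20.0 minutes if scaling were linear
print(slowdown)         # 180.0x slowdown for a 10x larger model
```

A ~180x slowdown for a 10x larger model suggests roughly quadratic (or worse) behavior somewhere in the export path.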

How large is the generated PMML file?

The file is around 10x larger than with 500 estimators, roughly 50 MB to 100 MB, containing over 10,000,000 characters.
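The size figures themselves are consistent with linear growth: PMML serializes every boosted tree as explicit XML elements, so file size should scale roughly with the number of estimators. A sanity check of the numbers (the 10 MB baseline is a rough value inferred from the "around 10X" figure in this thread):

```python
# Rough sizes inferred from the thread: "around 10X larger" at 10x the estimators.
size_500_mb = 10  # assumed PMML size at 500 estimators
predicted_5000_mb = size_500_mb * (5000 / 500)
print(predicted_5000_mb)  # 100.0, within the reported 50 MB - 100 MB range
```

So the file size grows as expected; it is the export *time* that grows faster than the output.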

I understand that the file is large, but I don't see why saving it should take even longer than training the model.
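A common cause of export time growing much faster than output size is building a large document by repeated string concatenation, which can be quadratic in the number of appended fragments (each append may copy the whole accumulated string). Whether this is what PR #50 actually fixed is not stated in the thread; the sketch below just illustrates the pattern and its linear-time alternative:

```python
def build_by_concat(n):
    # Potentially quadratic: the accumulated string may be copied on each append.
    xml = ""
    for i in range(n):
        xml += "<Node id='%d'/>" % i
    return xml

def build_by_join(n):
    # Linear: collect the fragments, then join them once at the end.
    return "".join("<Node id='%d'/>" % i for i in range(n))

# Both produce the same document; only the time complexity differs.
assert build_by_concat(1000) == build_by_join(1000)
```

With ~10,000,000 characters in the output, the difference between these two strategies is exactly the gap between minutes and hours.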

Hi @wentixiaogege, sorry for the delay in responding. We are now looking into the issue and will include a fix in our next release soon.

PR - #50

Hi @wentixiaogege, release 5.1.0 takes care of this. Thanks!