elephaint/pgbm

An error with PGBM

flippercy opened this issue · 8 comments

Hi @elephaint:

I got the following error when using the sklearn wrapper, PGBMRegressor:

~/.local/lib/python3.7/site-packages/pgbm/pgbm.py in _predict_tree(self, X, mu, variance, estimator)
401 # Choose next node (based on breadth-first)
402 condition = (nodes_predict >= node) * (predictions == 0)
--> 403 node = nodes_predict[condition].min()
404 # Select current node information
405 split_node = nodes_predict == node

RuntimeError: min(): Expected reduction dim to be specified for input.numel() == 0. Specify the reduction dim with the 'dim' argument.

Any insight? It is not due to the data because I used the same data to build a PGBM model successfully before. Is it due to some hyperparameters? I got this error when trying to do HPO for PGBM using FLAML (https://github.com/microsoft/FLAML) and the search space I used is:

'max_bin': {'domain': tune.loguniform(lower=32, upper=32767), 'init_value': 256, 'low_cost_init_value': 256},
'max_leaves': {'domain': tune.uniform(lower=16, upper=128), 'init_value': 64},
'n_estimators': {'domain': tune.uniform(lower = 50, upper = 500), 'init_value': 200, 'low_cost_init_value': 200},
'min_data_in_leaf': {'domain': tune.uniform(lower = 1, upper = 1000), 'init_value': 100, 'low_cost_init_value': 100},
'bagging_fraction': {'domain': tune.uniform(lower = 0.6, upper = 1), 'init_value': 0.7, 'low_cost_init_value': 0.7},
'feature_fraction': {'domain': tune.uniform(lower = 0.5, upper = 1), 'init_value': 0.9, 'low_cost_init_value': 0.9},
'learning_rate': {'domain': tune.loguniform(lower = 0.001, upper = 1), 'init_value': 0.1, 'low_cost_init_value': 0.1},
'min_split_gain': {'domain': tune.loguniform(lower = 0.000000000001, upper = 0.001), 'init_value': 0.00001, 'low_cost_init_value': 0.00001},
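
One thing that may be worth checking in a search space like the one above: `tune.uniform` and `tune.loguniform` sample floating-point values, so count-like parameters such as `min_data_in_leaf` can arrive as non-integers. Whether that plays any role in the error reported here is not established in this thread. The helper below is a hypothetical sketch (its name and the parameter list are assumptions, not part of FLAML or PGBM) that rounds those values before the estimator is built.

```python
# Hypothetical helper (not part of FLAML or PGBM): round the parameters that
# are conceptually counts to integers before building the estimator.
INTEGER_PARAMS = ("max_bin", "max_leaves", "n_estimators", "min_data_in_leaf")


def cast_integer_params(config):
    """Return a copy of config with count-like parameters rounded to int."""
    cleaned = dict(config)
    for name in INTEGER_PARAMS:
        if name in cleaned:
            cleaned[name] = int(round(cleaned[name]))
    return cleaned


# Made-up sample of what a float-valued search space could propose.
sampled = {
    "max_bin": 255.7,
    "max_leaves": 63.2,
    "n_estimators": 199.6,
    "min_data_in_leaf": 100.4,
    "learning_rate": 0.1,
}
config = cast_integer_params(sampled)
print(config)
```

Parameters that are not listed in `INTEGER_PARAMS`, such as `learning_rate`, pass through unchanged.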

Thank you.

@elephaint:

Thank you for the reply; your guess is correct - after I excluded min_data_in_leaf from the search space it ran well.

What is the ideal range for this parameter? In your example it was set to 1. I got the aforementioned error with it set to 100, and my dataset has 40k+ records.

Thank you for the insights! However, with my dataset (40k+ records), do you know why PGBM hit the error above when min_data_in_leaf = 100? It seems that nodes_predict[condition] returned nothing.
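
The message in the traceback is what PyTorch raises when `.min()` is called without a `dim` argument on a tensor that has no elements. The snippet below is a minimal sketch of that situation only. The tensor values are made up, the variable names are borrowed from the traceback, and it does not reproduce PGBM's tree-building logic or explain why the mask ended up empty in this case.

```python
import torch

# Hypothetical stand-ins for the tensors named in the traceback; the values
# are invented for illustration and do not come from PGBM.
nodes_predict = torch.tensor([1, 2, 3])
predictions = torch.tensor([0.5, 0.7, 0.9])  # every entry is already non-zero
node = 1

# Same form of expression as line 402 in the traceback.
condition = (nodes_predict >= node) * (predictions == 0)
candidates = nodes_predict[condition]
print(candidates.numel())  # 0: the mask selected nothing

# A full reduction over an empty tensor has no defined result, so it raises.
try:
    candidates.min()
    raised = False
except RuntimeError as err:
    raised = True
    print(err)

# A defensive variant: only reduce when at least one element was selected.
next_node = candidates.min().item() if candidates.numel() > 0 else None
print(next_node)
```

A guard like the last line only avoids the exception; it does not address whatever leaves no remaining node to visit.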

Best,

Hi,

I don't know yet. I'll try to reproduce today.

Hi,

Sorry for the late reply. I have been trying to reproduce the error, but unfortunately I can't. I did completely rewrite the code for the PyTorch version (for speed, but also to make it more robust where possible), but I feel it needs a few more checks before I am ready to push it. I'd be eager to find out whether your problem persists with the next version, as I rewrote some of the tree-building code that might produce the error you stumbled upon. I should be able to push it within the next 7-14 days.

Hi,

I've released a new version (1.4) that should solve the issues you faced; the code has been completely reworked and the bugs you experienced should not be possible anymore.

Hi @elephaint:

Thanks a lot! I will update the library and check.

Best,