balajiln/mondrianforest

Bagging

Closed this issue · 5 comments

Is the bagging capability functional? I've run the code with settings.bagging=1 and settings.bagging=0 and I get the same answer both times. So I was wondering if there is a problem with the bagging?

Sorry, the bagging functionality is not supported anymore.

Is there a reason for this? Is there an easy way to re-implement it?

Bagging is necessary to de-correlate the trees only when the other randomization mechanisms are not powerful enough. Breiman-RF uses bagging because it searches over all split locations among a random subset of features, whereas extremely randomized trees (ERTs) draw a random split location for each feature in a subset and only optimize over which feature to split on. Due to this additional randomization, ERTs do not use bagging (in fact, bagging hurts performance, as each tree then sees only about 63.2% of the unique data points). For a similar reason, I think bagging is unnecessary for MFs.
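For reference, the 63.2% figure follows from a standard calculation: when n points are drawn with replacement from n points, a given point is missed with probability (1 - 1/n)^n, which tends to 1/e as n grows. The snippet below is only an illustrative sketch of that arithmetic; `expected_unique_fraction` is a hypothetical helper name and is not part of this repository.

```python
import math


def expected_unique_fraction(n):
    """Expected fraction of distinct points in a bootstrap sample.

    Assumes the sample has size n and is drawn with replacement
    from a dataset of n points.
    """
    return 1.0 - (1.0 - 1.0 / n) ** n


# The fraction approaches 1 - 1/e as n grows.
for n in (10, 100, 10000):
    print(n, expected_unique_fraction(n))
print("limit", 1.0 - math.exp(-1.0))
```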

The easiest way to re-implement it would be to replace train_ids_current_minibatch with something like bootstrap(train_ids_current_minibatch) before calling fit and partial_fit.
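A minimal sketch of what such a `bootstrap` helper could look like is given below. It is not code from this repository: the function name, its `rng` parameter, and the use of the standard-library `random` module are all assumptions made for illustration. If the ids are stored in a NumPy array, sampling with replacement through NumPy's random generator would be the more natural choice.

```python
import random


def bootstrap(ids, rng=None):
    """Resample `ids` with replacement, keeping the original length.

    Hypothetical helper: `ids` is any sequence of training indices and
    `rng` is an optional random.Random instance for reproducibility.
    """
    if rng is None:
        rng = random.Random()
    ids = list(ids)
    # Draw len(ids) indices uniformly at random, with replacement.
    return [rng.choice(ids) for _ in ids]


# Example: resample ten ids with a fixed seed.
rng = random.Random(0)
resampled = bootstrap(range(10), rng)
print(sorted(resampled))
```

The resampled ids would then be passed wherever train_ids_current_minibatch is currently used. To give each tree its own bootstrap sample, as in Breiman-style bagging, the resampling would presumably need to happen once per tree rather than once per forest.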

Thanks for the help.

I have another question. I have been changing some of the parameter values, such as the discount factor, the budget, store_every, and smooth_hierarchically, and the output of the tree remains the same. The number of leaves does not change with each minibatch, and the accuracy on the test set does not change. Is there a reason for this?

Unless you change n_minibatches or the budget, the tree size shouldn't change. Try increasing the budget gradually from 0 to n_dim; you should see the tree size change. The discount factor, smooth_hierarchically, and store_every only affect the predictions from the tree.

I'll close this issue. Email me if you have any other questions.