Some minor differences in random forest implementations

Question

tecosaur opened this issue 3 years ago · 0 comments

I've been comparing some random forest implementations recently (https://github.com/tecosaur/TreeComparison), one of the results of which is #159, but I also have some other information which may be of interest.

For starters, here's the colour coding I use:

Error rates mostly converged across the implementations I tested, though ranger sometimes does slightly better:

Precision-recall and ROC curves generally look near-identical, as they should.

I've also noticed larger differences in the depth and size of the trees each implementation builds. Across a number of datasets, DecisionTree.jl and randomForest produce narrower, deeper trees than ranger and sklearn.
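For anyone wanting to reproduce the depth/size comparison, here is a minimal sketch of how tree shape could be summarised. It is not the code from my comparison repo. It assumes each tree is available as a pair of child-index arrays with `-1` marking "no child", which is how scikit-learn exposes fitted trees (`estimator.tree_.children_left` and `estimator.tree_.children_right`); trees from the other libraries would need converting to that layout first. The names `tree_shape` and `forest_summary` are made up for illustration.

```python
from statistics import mean

LEAF = -1  # sentinel for "no child", matching scikit-learn's convention


def tree_shape(children_left, children_right):
    """Return (depth, node_count, leaf_count) for one array-encoded binary tree."""
    depth = 0
    leaves = 0
    stack = [(0, 0)]  # (node index, depth of that node); node 0 is the root
    while stack:
        node, d = stack.pop()
        depth = max(depth, d)
        if children_left[node] == LEAF and children_right[node] == LEAF:
            leaves += 1
            continue
        # Internal nodes are assumed to have exactly two children.
        stack.append((children_left[node], d + 1))
        stack.append((children_right[node], d + 1))
    return depth, len(children_left), leaves


def forest_summary(trees):
    """Average the shape statistics over a list of (left, right) array pairs."""
    shapes = [tree_shape(left, right) for left, right in trees]
    return {
        "mean_depth": mean(s[0] for s in shapes),
        "mean_nodes": mean(s[1] for s in shapes),
        "mean_leaves": mean(s[2] for s in shapes),
    }


# Two hand-made 7-node trees: one balanced, one that keeps splitting
# only its left child (narrower and deeper).
balanced = ([1, 3, 5, -1, -1, -1, -1], [2, 4, 6, -1, -1, -1, -1])
lopsided = ([1, 3, -1, 5, -1, -1, -1], [2, 4, -1, 6, -1, -1, -1])

print(tree_shape(*balanced))
print(tree_shape(*lopsided))
print(forest_summary([balanced, lopsided]))
```

The two toy trees have the same number of nodes and leaves but different depths, which is the kind of "narrower/deeper" difference described above. With a fitted sklearn forest, the input list could be built from `forest.estimators_` by reading the two arrays off each estimator's `tree_` attribute.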