Some minor differences in random forest implementations
tecosaur opened this issue · 0 comments
I've recently been comparing several random forest implementations (https://github.com/tecosaur/TreeComparison). One result of that work is #159, but I also have some other findings that may be of interest.
For starters, here's the colour coding I use:
Error rates mostly converge across the implementations I tested, though ranger sometimes does slightly better:
Precision-recall and ROC curves generally look near-identical, as they should.
I've also noticed some larger differences in the depth and size of the trees each implementation builds. Across a number of datasets, DecisionTrees.jl and randomForest produce narrower/deeper trees than ranger and sklearn.
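To make "narrower/deeper" concrete, here is a minimal sketch of how depth and leaf count could be measured per tree and averaged over a forest. The `Node` class and the `depth`, `n_leaves` and `summarise` helpers are hypothetical names made up for illustration; they are not the API of any of the libraries compared here, and this is not necessarily how the comparison repository computes its numbers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical binary tree node; a leaf has no children."""
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def is_leaf(node):
    return node.left is None and node.right is None


def depth(node):
    """Number of edges on the longest root-to-leaf path."""
    if is_leaf(node):
        return 0
    children = [c for c in (node.left, node.right) if c is not None]
    return 1 + max(depth(c) for c in children)


def n_leaves(node):
    """Number of leaves, used here as a simple proxy for tree width."""
    if is_leaf(node):
        return 1
    children = [c for c in (node.left, node.right) if c is not None]
    return sum(n_leaves(c) for c in children)


def summarise(forest):
    """Mean depth and mean leaf count over a list of tree roots."""
    depths = [depth(t) for t in forest]
    leaves = [n_leaves(t) for t in forest]
    return {
        "mean_depth": sum(depths) / len(depths),
        "mean_leaves": sum(leaves) / len(leaves),
    }


# Two toy trees with four leaves each:
# a chain (narrow and deep) and a balanced tree (wide and shallow).
chain = Node(Node(), Node(Node(), Node(Node(), Node())))
balanced = Node(Node(Node(), Node()), Node(Node(), Node()))

print(depth(chain), n_leaves(chain))        # 3 4
print(depth(balanced), n_leaves(balanced))  # 2 4
```

Both toy trees have the same number of leaves, but the chain needs one more level to reach them, which is the kind of shape difference described above.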