The accuracy of random forest is over 0.99
mk123qwe opened this issue · 1 comment
I fit a simple random forest model, like this:
from sklearn.ensemble import RandomForestClassifier
RandomForestClassifier(n_estimators=10, random_state=2019)
I used the high-level features from TABLE IV in your paper.
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done 16 out of 16 | elapsed: 3.2min finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done 16 out of 16 | elapsed: 1.2s finished
Validation Accuracy: 0.996
How much of the training data did you use?
I tried using just 1 file (~20k events, so admittedly it might not be enough) and only got up to ~80% test accuracy. On the other hand, if I evaluate on the training data itself, I get >99% accuracy.
How is the validation accuracy defined here?
My code here: https://github.com/jmduarte/HiggsToBBMachineLearning/blob/randomforest/train.ipynb
Binder link: https://mybinder.org/v2/gh/jmduarte/HiggsToBBMachineLearning/randomforest?filepath=train.ipynb
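To make the train-versus-test distinction concrete, here is a minimal sketch of the comparison I have in mind. It is not the notebook's code: the data is a synthetic stand-in from `make_classification` (an assumption, with label noise added on purpose), and the split fraction is arbitrary. Only the `RandomForestClassifier(n_estimators=10, random_state=2019)` settings come from the snippet above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): NOT the dataset from the paper.
# flip_y adds label noise so that perfect held-out accuracy is impossible.
X, y = make_classification(
    n_samples=2000,
    n_features=20,
    n_informative=5,
    flip_y=0.2,
    random_state=2019,
)

# Hold out 25% of the events; the model never sees them during fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=2019
)

# Same settings as in the issue.
clf = RandomForestClassifier(n_estimators=10, random_state=2019)
clf.fit(X_train, y_train)

# Accuracy on the events used for fitting vs. on the held-out events.
train_acc = accuracy_score(y_train, clf.predict(X_train))
test_acc = accuracy_score(y_test, clf.predict(X_test))

print(f"Accuracy on training events: {train_acc:.3f}")
print(f"Accuracy on held-out events: {test_acc:.3f}")
```

Fully grown trees can largely memorize the events they were fit on, so the first number is usually much higher than the second. If the 0.996 was computed on events that were also used for fitting, that alone could explain it.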
Thanks,
Javier