Inconsistent performance of DS methods
jayahm opened this issue · 4 comments
Hi,
I ran some experiments on multiple datasets using several DS methods.
I am confused about why the performance of each DS method is not consistent across datasets.
For example, a given method sometimes ranks X, sometimes Y, sometimes Z (even when a method is the best, it is not the best on all datasets).
This makes it hard for me to draw a conclusion.
Is this normal?
Yes, it is normal. This is the no free lunch theorem: the best model depends on the dataset.
That's why it is very important to use an appropriate evaluation protocol, such as cross-validation to estimate average performance, together with proper statistical tests when comparing multiple machine learning models.
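As a minimal sketch of that workflow (the specific DS methods and dataset here are just placeholders, not a recommendation): repeated cross-validation gives a distribution of scores per method instead of a single, possibly lucky, split, and a Friedman test checks whether the ranking differences are beyond chance. Note that Demšar (2006) recommends running this kind of comparison over multiple datasets rather than over folds of a single one.

```python
import numpy as np
from scipy.stats import friedmanchisquare
from sklearn.datasets import make_classification
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

from deslib.des import KNORAE, KNORAU, METADES

# Toy dataset purely for illustration.
X, y = make_classification(n_samples=1000, random_state=0)

# Example DS methods; any scikit-learn compatible estimators work here.
methods = {
    "KNORA-U": KNORAU(random_state=0),
    "KNORA-E": KNORAE(random_state=0),
    "META-DES": METADES(random_state=0),
}

# Repeated stratified CV averages out the luck of any single split.
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=0)

scores = {}
for name, clf in methods.items():
    # With pool_classifiers=None, DESlib fits a default pool internally.
    scores[name] = cross_val_score(clf, X, y, cv=cv)
    print(f"{name}: {scores[name].mean():.3f} +/- {scores[name].std():.3f}")

# Friedman test over the matched folds: a small p-value indicates the
# methods' rankings differ beyond chance (follow up with a post-hoc
# test, e.g. Nemenyi, to find which pairs differ).
stat, p = friedmanchisquare(*scores.values())
print(f"Friedman chi-square = {stat:.3f}, p = {p:.4f}")
```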
Can you explain more about the cross-validation part and the statistical tests?
I mean, not how to do it, but how these two can help with the analysis when the performance is not consistent?
Unfortunately, I can't, since it is a very long subject with plenty of nuances to cover, and here is not the place for that (especially since it is completely out of scope for this project). I can, however, suggest some readings:
- Model Evaluation, Model Selection, and Algorithm Selection in Machine Learning: https://arxiv.org/pdf/1811.12808
- Statistical comparisons of classifiers over multiple data sets: http://www.jmlr.org/papers/volume7/demsar06a/demsar06a.pdf
- Japkowicz, Nathalie, and Mohak Shah. Evaluating Learning Algorithms: A Classification Perspective. Cambridge University Press, 2011.
Thanks! I appreciate that.