scikit-learn-contrib/DESlib

Inconsistent performance of DS methods

jayahm opened this issue · 4 comments

Hi,

I ran some experiments on multiple datasets using several DS methods.

I am just confused about why the performance of the DS methods is not consistent across datasets.

For example, a given method sometimes ranks X, sometimes Y, and sometimes Z; even the best method is not the best on all datasets.

This makes it hard for me to draw a conclusion.

Is this normal?

Yes, it is normal. This is the no free lunch theorem: the best model depends on the dataset.

That's why a proper experimental setup is so important: use cross-validation to estimate average performance, and use appropriate statistical tests to compare multiple machine learning models.
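Roughly, such a setup on a single dataset could look like the sketch below. This is only a minimal illustration, not an official DESlib benchmark: the dataset, the pool of bagged decision trees, the chosen DS methods, and the 50/50 pool/DSEL split are all placeholder choices.

```python
# Minimal sketch: estimate the average accuracy of a few DS methods with
# stratified k-fold cross-validation (dataset and pool are placeholders).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import StratifiedKFold, train_test_split

from deslib.des import KNORAE, KNORAU, METADES
from deslib.dcs import OLA

X, y = load_breast_cancer(return_X_y=True)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

ds_classes = {"KNORA-E": KNORAE, "KNORA-U": KNORAU, "META-DES": METADES, "OLA": OLA}
scores = {name: [] for name in ds_classes}

for train_idx, test_idx in cv.split(X, y):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]

    # Split the training part into data for the pool and the DSEL
    # (dynamic selection dataset), as recommended in the DESlib docs.
    X_pool, X_dsel, y_pool, y_dsel = train_test_split(
        X_train, y_train, test_size=0.5, stratify=y_train, random_state=0
    )
    pool = BaggingClassifier(n_estimators=10, random_state=0).fit(X_pool, y_pool)

    for name, ds_class in ds_classes.items():
        ds = ds_class(pool_classifiers=pool, random_state=0)
        ds.fit(X_dsel, y_dsel)
        scores[name].append(ds.score(X_test, y_test))

for name, accs in scores.items():
    print(f"{name}: {np.mean(accs):.3f} +/- {np.std(accs):.3f}")
```

The printed mean and standard deviation per method give a more stable picture than a single train/test split; repeating this over all your datasets produces the per-dataset scores that a statistical test can then compare.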

Can you explain more about the cross-validation part and the statistical tests?

I mean, not how to do it, but how these two can help with the analysis when the performance is not consistent.

Unfortunately, I can't, since it is a very long subject with plenty of nuances to cover, and this is not the place for it (especially since it is completely out of the scope of this project). I can, however, suggest some readings:
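As a rough illustration of the kind of analysis those readings cover: collect the mean accuracy of each method on each dataset (e.g. from cross-validation as above), then test whether the differences in ranks are larger than chance would explain. A common choice for comparing classifiers over multiple datasets is the Friedman test followed by post-hoc pairwise tests. The sketch below uses placeholder numbers, not real DESlib results.

```python
# Minimal sketch: compare several methods across multiple datasets with a
# Friedman test. The accuracy values below are placeholders, not real results.
import numpy as np
from scipy.stats import friedmanchisquare

# rows = DS methods, columns = datasets (one mean CV accuracy per cell)
accuracy = np.array([
    [0.91, 0.84, 0.77, 0.88, 0.93],  # method A (placeholder scores)
    [0.90, 0.86, 0.75, 0.89, 0.92],  # method B
    [0.89, 0.83, 0.78, 0.86, 0.91],  # method C
])

# Null hypothesis: all methods perform equivalently, so their per-dataset
# ranks differ only by chance.
stat, p_value = friedmanchisquare(*accuracy)
print(f"Friedman statistic = {stat:.3f}, p-value = {p_value:.3f}")

# If the null hypothesis is rejected, pairwise post-hoc tests (with a
# correction for multiple comparisons) indicate which methods actually differ.
```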

Thanks! I appreciate that