FoldingClassifier: KFold vs StratifiedKFold
jonas-eschle opened this issue · 5 comments
Hey,
first of all a compliment: I really like your repo and I build a lot of code on it, it's so useful!
About the FoldingClassifier: there was already a request to implement StratifiedKFolding in addition to the "normal" KFolding. I would be very glad to see this, but I'd even go a step further: why don't you completely replace the KFold with a StratifiedKFold?
I think, from an ML point of view, a stratified split is always better (or at least equally good). A normal KFolding only introduces different class balances across the folds, which (usually) result in "shifted" probabilities among the different classifiers, whereas a stratified one does not and therefore makes each trained classifier's predictions "comparable".
Or in other words: I cannot think of any case where you want to have a non-stratified KFolding instead of a stratified one.
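To make the class-balance point concrete, here is a minimal pure-Python sketch. It is not REP's or scikit-learn's implementation; the helper names (plain_folds, stratified_folds, positive_fraction_per_fold) and the toy dataset are made up for illustration. It assigns fold indices once without looking at the labels and once per class, then compares the fraction of positives in each fold.

```python
import random

def plain_folds(n_samples, n_folds, seed):
    """Assign a fold index to each sample using only the dataset length and a seed."""
    rng = random.Random(seed)
    folds = [i % n_folds for i in range(n_samples)]
    rng.shuffle(folds)
    return folds

def stratified_folds(labels, n_folds, seed):
    """Assign fold indices separately within each class so folds keep the class ratio."""
    rng = random.Random(seed)
    folds = [None] * len(labels)
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[i] = pos % n_folds
    return folds

def positive_fraction_per_fold(labels, folds, n_folds):
    """Fraction of label-1 samples inside each fold."""
    fractions = []
    for k in range(n_folds):
        members = [y for y, f in zip(labels, folds) if f == k]
        fractions.append(sum(members) / len(members))
    return fractions

# Toy dataset (an assumption for illustration): 10% positives, shuffled.
labels = [1] * 30 + [0] * 270
random.Random(0).shuffle(labels)

plain = positive_fraction_per_fold(labels, plain_folds(len(labels), 3, seed=42), 3)
strat = positive_fraction_per_fold(labels, stratified_folds(labels, 3, seed=42), 3)
print("plain KFold positive fraction per fold:     ", plain)
print("stratified KFold positive fraction per fold:", strat)
```

In this toy setup the stratified folds each contain exactly 10 of the 30 positives, while the plain folds generally do not, so classifiers trained on the plain folds see slightly different priors.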
What do you think?
Best,
Mayou
Hi Mayou, thanks for your feedback!
From a statistical point of view I'm completely in favor of stratified kFold; the problem is implementation. Stratified kFold requires some magic to remember the targets of the original dataset, which is something I don't want as the default behavior.
Simple kFold splitting requires only the length of the dataset (one number) and the random state (one number). With a fixed random state, the splitting doesn't depend on the dataset's contents. The situation changes in the stratified case: the splitting has to depend on the dataset labels, so this information must somehow be kept inside the classifier (we shouldn't ask for labels during prediction, otherwise we break the interface). In rare (but real) situations this leads to much higher memory use.
So I'd like it to be an additional option for FoldingClassifier, but not default behavior. Let me see what I can do for this.
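The trade-off described above can be sketched roughly as follows. This is only an illustration under assumptions, not the actual FoldingClassifier code: the classes PlainFolding and StratifiedFolding and their methods are hypothetical. The plain variant can regenerate the split from two numbers, while the stratified variant has to keep a per-sample fold assignment computed from the labels at fit time.

```python
import random

class PlainFolding:
    """Hypothetical sketch: the split is a function of (length, random_state) only."""
    def __init__(self, n_folds, random_state):
        self.n_folds = n_folds
        self.random_state = random_state

    def fold_of_each_sample(self, n_samples):
        # Regenerated on demand; nothing about the dataset is stored.
        rng = random.Random(self.random_state)
        folds = [i % self.n_folds for i in range(n_samples)]
        rng.shuffle(folds)
        return folds

class StratifiedFolding:
    """Hypothetical sketch: the split depends on labels, so it must be remembered."""
    def __init__(self, n_folds, random_state):
        self.n_folds = n_folds
        self.random_state = random_state
        self.folds_ = None  # one entry per training sample: the extra memory cost

    def fit(self, labels):
        rng = random.Random(self.random_state)
        folds = [None] * len(labels)
        for cls in sorted(set(labels)):
            idx = [i for i, y in enumerate(labels) if y == cls]
            rng.shuffle(idx)
            for pos, i in enumerate(idx):
                folds[i] = pos % self.n_folds
        self.folds_ = folds
        return self

    def fold_of_each_sample(self, n_samples):
        # Labels are not available at prediction time, so reuse the stored split.
        if self.folds_ is None or len(self.folds_) != n_samples:
            raise ValueError("stored split does not match this dataset")
        return self.folds_

labels = [0, 0, 0, 0, 0, 0, 1, 1, 1]
plain = PlainFolding(n_folds=3, random_state=11)
first = plain.fold_of_each_sample(len(labels))
second = plain.fold_of_each_sample(len(labels))   # same split, recomputed

strat = StratifiedFolding(n_folds=3, random_state=11).fit(labels)
strat_folds = strat.fold_of_each_sample(len(labels))
```

The stored list grows with the training set, which is where the additional memory mentioned above would come from in a design like this.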
@mayou36 I've added a test implementation of stratified kFolding; you can try it now:
pip uninstall rep
pip install https://github.com/yandex/rep/archive/stratifiedkfold.zip --no-dependencies
Please report any bugs you run into.
Thank you very much, checked it out and it looks good so far.
Sure, I'll let you know
Just an update:
I've been using the stratified classifier without any problems; no bugs encountered so far.
@mayou36
Nice to know, then we'll have it in the next release.
I'll close the issue and the PR.