scikit-learn-contrib/DESlib

Kfold - TimeSeriesSplit?

jmrichardson opened this issue · 1 comment

Hi, thank you for the great package. I have temporal data and would like to use TimeSeriesSplit cross-validation, or perhaps KFold without shuffling. Is this possible?

Hello,

Yes, it is possible. In that case, you can use TimeSeriesSplit from scikit-learn to create your training and test splits (and possibly a validation split too), and then use these sets manually to train the base models and fit the DS methods.
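Here is a minimal sketch of the manual approach. It assumes a chronologically ordered NumPy array `X` with labels `y`. The helper names `temporal_splits` and `run_folds`, the 30% DSEL fraction and the bagged Perceptron pool are illustrative choices, not part of DESlib. Each TimeSeriesSplit training window is cut in two: the older part trains the pool and the most recent part is held out as the DSEL (validation) set for the DS method, so nothing is fitted on data that comes after the test window. The DESlib step appears only as a comment, using `KNORAE` from `deslib.des` as an example.

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import Perceptron
from sklearn.model_selection import TimeSeriesSplit


def temporal_splits(n_samples, n_splits=5, dsel_fraction=0.3):
    """Yield (train_idx, dsel_idx, test_idx) index arrays in time order.

    TimeSeriesSplit gives an expanding training window followed by a test
    window; the last `dsel_fraction` of each training window is reserved
    as the DSEL set (hypothetical helper, illustrative split ratio).
    """
    tscv = TimeSeriesSplit(n_splits=n_splits)
    for train_idx, test_idx in tscv.split(np.zeros((n_samples, 1))):
        n_dsel = max(1, int(len(train_idx) * dsel_fraction))
        yield train_idx[:-n_dsel], train_idx[-n_dsel:], test_idx


def run_folds(X, y, n_splits=5, dsel_fraction=0.3, seed=0):
    """Train one pool per chronological fold (hypothetical helper)."""
    folds = []
    for train_idx, dsel_idx, test_idx in temporal_splits(
        len(X), n_splits, dsel_fraction
    ):
        # 1) Train the pool of base classifiers on the oldest data only.
        pool = BaggingClassifier(Perceptron(), n_estimators=10, random_state=seed)
        pool.fit(X[train_idx], y[train_idx])
        # 2) Fit the DS method on the DSEL window, e.g. with DESlib:
        #    ds = KNORAE(pool_classifiers=pool).fit(X[dsel_idx], y[dsel_idx])
        # 3) Evaluate on the test window, e.g. ds.score(X[test_idx], y[test_idx])
        folds.append((train_idx, dsel_idx, test_idx, pool))
    return folds
```

KFold with `shuffle=False` could be swapped in for TimeSeriesSplit, but its training folds can include samples that come after the test fold, so TimeSeriesSplit is usually the safer choice for temporal data.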

Another alternative is to pass the DS method as the estimator to scikit-learn's cross_val_score function (with `cv` set to a TimeSeriesSplit instance) to compute the result over multiple folds automatically. That approach, however, requires the pool of classifiers to be generated inside the DS method, rather than reusing a pool you have already trained. This is a limitation of scikit-learn's cloning process, which returns unfitted copies of already-trained models (see issue #89). The scikit-learn developers plan to address this in a future update.
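The sketch below illustrates both points under stated assumptions. `temporal_cv_scores` (a hypothetical helper) passes an unfitted estimator to `cross_val_score` with a TimeSeriesSplit, so each fold fits a fresh clone on that fold's training window. For a DS method this would be something like `KNORAE()` from `deslib.des` constructed without `pool_classifiers`, so that the pool is generated inside `fit`; a plain Perceptron stands in here so the snippet runs without DESlib. `fitted_pool_survives_clone` (also hypothetical) shows why a pre-trained pool cannot be reused: `sklearn.base.clone` applied to a list of fitted classifiers returns unfitted copies.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import Perceptron
from sklearn.model_selection import TimeSeriesSplit, cross_val_score


def temporal_cv_scores(estimator, X, y, n_splits=5):
    """Score an unfitted estimator over chronological folds.

    cross_val_score clones `estimator` for every fold and fits the clone on
    that fold's training window only, so no future data leaks into training.
    """
    return cross_val_score(estimator, X, y, cv=TimeSeriesSplit(n_splits=n_splits))


def fitted_pool_survives_clone(fitted_pool):
    """Return True only if every classifier keeps its fitted `coef_` after cloning."""
    cloned = clone(fitted_pool, safe=False)  # clones each list element
    return all(hasattr(clf, "coef_") for clf in cloned)
```

In practice the call would be `temporal_cv_scores(KNORAE(), X, y)`; the DESlib class name is the only assumption beyond the scikit-learn calls shown.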