Support for cross-validation in scenarios
paraschakis opened this issue · 4 comments
Scenarios currently cover single train-test-validation splits. It would be nice to have a mechanism for cross-validation as well.
Hi @paraschakis,
Thanks for raising the question! Could you give us an idea of what kind of behavior you would expect when you perform cross-validation with, for example, the WeakGeneralization and StrongGeneralizationTimed scenarios?
We would also appreciate it if you could provide references to papers that have used or proposed cross-validation for recommender systems.
For now, if you want to run multiple experiments to achieve a kind of cross-validation, we propose the following:
```python
from recpack.datasets import DummyDataset
from recpack.scenarios import StrongGeneralization
import recpack.pipelines

d = DummyDataset()
im = d.load()

pipeline_builder = recpack.pipelines.PipelineBuilder('exp1')
pipeline_builder.add_algorithm('ItemKNN', params={'K': 10})
pipeline_builder.add_metric('NDCGK', 10)

# Using a different seed for each run makes sure every split is different (and reproducible).
seeds = [1234, 5678, 9123, 4567]
pipeline_results = []
for s in seeds:
    scenario = StrongGeneralization(frac_users_train=0.7, frac_interactions_in=0.8, validation=False, seed=s)
    scenario.split(im)
    # Continue with the pipeline
    pipeline_builder.set_data_from_scenario(scenario)
    pipeline = pipeline_builder.build()
    pipeline.run()
    pipeline_results.append(pipeline.get_metrics())
```
Hope this helps!
Lien
Thanks for the provided code! Strictly speaking, seed-based splits won't give you true cross-validation.
Cross-validation has been a standard way of evaluating recommender systems. Even much older libraries like MyMediaLite implement it.
Just for reference, here's an extract from the "Recommender Systems Handbook" by Ricci et al.:
> Sampling can lead to an over-specialization to the particular division of the training and testing data sets. For this reason, the training process may be repeated several times. The training and test sets are created from the original data set, the model is trained using the training data and tested with the examples in the test set. Next, different training/test data sets are selected to start the training/testing process again that is repeated K times. Finally, the average performance of the K learned models is reported. This process is known as cross-validation.
Here's another extract from the book 'Practical Recommender Systems' by Falk:
> No matter how you split the data, there’s a risk that you’ll create a split that’s favorable for one recommender. To mitigate this, try out different samples of the data for training and find the average and the variance of each of the algorithms to understand how one is better than the other. k-fold cross-validations work by dividing the data into k folds then use k-1 folds to train the algorithm. The last fold is test data. You iterate through the full data set and allow each fold to be used as the test set. You run the evaluation k times and then calculate the average of them all.
In weak generalization, you would perform cross-validation splits 'vertically' (over each user's interactions), whereas in strong generalization you would do it 'horizontally' (over the users themselves). This is my understanding.
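To make the 'vertical'/'horizontal' picture concrete, here is a minimal NumPy sketch on toy data (not RecPack API) of how the k folds would differ between the two settings:

```python
import numpy as np

rng = np.random.default_rng(42)
k = 5

# Toy interaction log: (user_id, item_id) pairs, purely illustrative.
interactions = np.array(
    [(u, i) for u in range(20) for i in rng.choice(100, size=15, replace=False)]
)

# 'Horizontal' folds (strong generalization): partition the *users* into k groups;
# each fold's users are held out entirely for testing.
users = np.unique(interactions[:, 0])
user_folds = np.array_split(rng.permutation(users), k)

# 'Vertical' folds (weak generalization): partition the interaction *rows* into
# k groups, so a user typically appears on both the train and test side of a fold.
interaction_folds = np.array_split(rng.permutation(len(interactions)), k)

for fold in range(k):
    strong_test = interactions[np.isin(interactions[:, 0], user_folds[fold])]
    weak_test = interactions[interaction_folds[fold]]
    print(f"fold {fold}: strong-gen test rows={len(strong_test)}, weak-gen test rows={len(weak_test)}")
```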
P.S. I already implemented a custom cross-validation procedure for my needs by creating a custom splitter and scenario, but my solution is hacky. That's why I think it would be nice to have built-in support for CV.
Hi @paraschakis,
Thanks for the references and added information!
I've done some further thinking on the subject, and here's where I've ended up.
In classification it makes a ton of sense to do k-fold cross-validation, because it has the nice property that every sample is used k-1 times for training and 1 time for validation. This is great under the assumption that samples are independent.
However, in RecSys, samples are not independent. On the contrary, collaborative filtering algorithms explicitly exploit relationships between samples (either items or users) to learn useful patterns for making recommendations.
In this context, I don't really see the benefit of doing k-fold cross-validation over repeated random subsampling validation (Monte Carlo cross-validation) as I suggested above.
In either case (k-fold and Monte Carlo), you're going to be able to exploit only some of the patterns in your data.
In StrongGeneralization, if the user most predictive of another user's behavior (say, user A for user B) ended up in the same fold as user B in k-fold cross-validation, you would never know. In Monte Carlo cross-validation you at least have a chance that in some iteration user A's behavior is used to make recommendations to user B.
Monte Carlo cross-validation also has the advantage that the proportion of data split into training/validation is not dependent on the number of repetitions.
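A tiny illustration of that last point, with arbitrary numbers:

```python
# k-fold: the test share is fixed at 1/k, so more repetitions force smaller test sets.
for k in (3, 5, 10):
    print(f"k-fold with k={k}: train={100 * (k - 1) / k:.0f}%, test={100 / k:.0f}%")

# Monte Carlo (repeated random subsampling): the split ratio and the number of
# repetitions are chosen independently of each other.
frac_users_train = 0.7   # same ratio whether you run 4 or 40 repetitions
n_repetitions = 10
print(f"Monte Carlo: train={frac_users_train:.0%}, test={1 - frac_users_train:.0%}, "
      f"repetitions={n_repetitions}")
```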
In summary,
- I hope I was able to convince you that Monte Carlo cross-validation is a good idea (I actually think it still fits with what is written in the RecSys handbook)
- Currently I'm not going to prioritize adding k-fold cross-validation to RecPack. If anyone reading this feels I should, please leave a comment or an upvote, and I'll reconsider.
Lien
Hi, @LienM and @paraschakis,
I'd like to bring up a specific point about hyperparameter tuning and its interaction with cross-validation methods.
The solution provided by @LienM for Monte Carlo cross-validation is helpful, but it seems difficult to integrate directly with the hyperparameter tuning options currently available in the library. In my opinion, built-in support for Monte Carlo cross-validation, particularly one that integrates seamlessly with the library's hyperparameter tuning methods, could significantly enhance the value of RecPack. This integration would provide a more robust methodology for hyperparameter tuning, accounting for the variability in model performance across different splits. Therefore, I would like to request a reconsideration of the priority for implementing a built-in cross-validation feature in RecPack.
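To illustrate the gap, here is a rough sketch of the manual workaround I have in mind today, combining the seed loop from above with the tuning options as I understand them; the `grid=` argument, `set_optimisation_metric`, and the need for `validation=True` reflect my reading of the library and may need adjusting:

```python
from recpack.datasets import DummyDataset
from recpack.scenarios import StrongGeneralization
from recpack.pipelines import PipelineBuilder

im = DummyDataset().load()

seeds = [1234, 5678, 9123, 4567]
results_per_seed = []

for s in seeds:
    # validation=True so the pipeline has a validation split to tune on (my assumption).
    scenario = StrongGeneralization(
        frac_users_train=0.7, frac_interactions_in=0.8, validation=True, seed=s
    )
    scenario.split(im)

    pipeline_builder = PipelineBuilder(f'exp_seed_{s}')
    # grid= and set_optimisation_metric are my reading of the library's
    # hyperparameter tuning options; please double-check against the docs.
    pipeline_builder.add_algorithm('ItemKNN', grid={'K': [10, 50, 100]})
    pipeline_builder.set_optimisation_metric('NDCGK', 10)
    pipeline_builder.add_metric('NDCGK', 10)
    pipeline_builder.set_data_from_scenario(scenario)

    pipeline = pipeline_builder.build()
    pipeline.run()
    results_per_seed.append(pipeline.get_metrics())

# Averaging the per-seed metrics gives the Monte Carlo estimate, but the selected
# hyperparameters may differ between seeds; aggregating those choices is currently
# left to the user, which is the integration gap described above.
```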
Thank you for your consideration.
P.S.: Here are two papers that propose cross-validation for hyperparameter tuning: (a) "Top-N Recommendation Algorithms: A Quest for the State-of-the-Art" by Anelli et al.; and (b) "On the discriminative power of Hyper-parameters in Cross-Validation and how to choose them" by Anelli et al.
Best,
Stavroula