Why is hyper-parameter tuning performed on the train-set score instead of the validation-set score?
geliAI opened this issue
Copy-pasting my email reply for reference :)
In general, you use topic models to discover the topics of a given document collection. That means you don't usually have a train/val/test split, just a single "train" set. Even if you want to infer the topics of a new, unseen document, the topics learned from the collection remain fixed; the model only estimates how the new document mixes them. For this reason, we decided to optimize the score on the train set.
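To make the point concrete, here is a minimal sketch (not OCTIS code) of tuning on the training corpus alone: every candidate configuration is fit on the documents and then scored on those same documents, with no held-out split. It assumes UMass coherence as the score, chosen only as an example metric, and uses a hypothetical `toy_fit` placeholder instead of a real topic model. `umass_coherence`, `tune_on_train`, and `toy_fit` are illustrative names.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Doc = List[str]


def umass_coherence(topic: Sequence[str], corpus: Sequence[Doc]) -> float:
    """UMass coherence of one topic's top words (ordered most to least
    probable), computed from document co-occurrence counts in `corpus`.
    Assumes every topic word appears in at least one document."""
    doc_sets = [set(d) for d in corpus]

    def df(*words: str) -> int:
        # Number of documents containing all of `words`.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for i in range(1, len(topic)):
        for j in range(i):
            # +1 smoothing avoids log(0) for pairs that never co-occur.
            score += math.log((df(topic[i], topic[j]) + 1) / df(topic[j]))
    return score


def toy_fit(corpus: Sequence[Doc], num_topics: int, top_n: int = 3) -> List[List[str]]:
    """HYPOTHETICAL placeholder for a real topic model (e.g. LDA): it deals
    the most document-frequent words round-robin into `num_topics` lists,
    only so the tuning loop below runs end to end."""
    counts = Counter(w for d in corpus for w in set(d))
    vocab = [w for w, _ in counts.most_common(num_topics * top_n)]
    topics = [vocab[k::num_topics] for k in range(num_topics)]
    return [t for t in topics if t]


def tune_on_train(fit: Callable[..., List[List[str]]], grid: List[Dict],
                  train_corpus: Sequence[Doc]) -> Tuple[Dict, float]:
    """Pick the configuration whose topics score best on the SAME corpus
    they were fit on; no validation split is involved."""
    best_params, best_score = None, -math.inf
    for params in grid:
        topics = fit(train_corpus, **params)
        if not topics:
            continue
        score = sum(umass_coherence(t, train_corpus) for t in topics) / len(topics)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


train = [
    ["cat", "dog", "pet"],
    ["dog", "pet", "vet"],
    ["stock", "market", "trade"],
    ["market", "trade", "price"],
]
grid = [{"num_topics": k} for k in (1, 2, 3)]
best, score = tune_on_train(toy_fit, grid, train)
print(best, round(score, 3))
```

Swapping `toy_fit` for a real model and UMass for another coherence or diversity measure would not change the structure: the score is still computed on the training documents.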
However, as you may have noticed, OCTIS can also treat a dataset as a train/test or train/val/test split. The validation split is used only by models that rely on an early-stopping criterion to end training, while the test split is used only when classification metrics are computed.
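The partition rules above can be summarized in a small sketch. `Corpus` and `split_roles` are hypothetical names, not part of the OCTIS API; the function only records which partition serves which purpose under those rules.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

Doc = List[str]


@dataclass
class Corpus:
    """HYPOTHETICAL container for the up-to-three partitions of a dataset."""
    train: List[Doc]
    val: Optional[List[Doc]] = None
    test: Optional[List[Doc]] = None


def split_roles(corpus: Corpus, uses_early_stopping: bool,
                classification_metric: bool) -> Dict[str, str]:
    """Map each purpose to the partition that serves it: train for fitting
    and for the tuning score, val only for early stopping, test only for
    classification metrics."""
    roles = {"fit": "train", "tuning_score": "train"}
    if uses_early_stopping and corpus.val is not None:
        roles["early_stopping"] = "val"
    if classification_metric and corpus.test is not None:
        roles["classification_eval"] = "test"
    return roles


full = Corpus(train=[["a", "b"]], val=[["b"]], test=[["a"]])
print(split_roles(full, uses_early_stopping=True, classification_metric=False))
```

With a train-only dataset, both optional roles simply disappear and everything runs on the train split.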
I'm closing this issue.