Consider model ranking
jphall663 opened this issue · 1 comment
It would be nice to have explicit model ranking for selection, i.e., something that answers the question of which is the "best" model among a group of trained models without human eyeballing (of course, human eyeballing is also great!). This would be in addition to Pareto-based selection, not a replacement for it.
Consider Caruana et al. 2004 "b" - https://dl.acm.org/doi/10.1145/1046456.1046470.
I have a prototype here: https://jphall663.github.io/GWU_rml/, code: https://nbviewer.org/github/jphall663/GWU_rml/blob/master/assignments/eval.ipynb.
In addition to the prototype, it would be really cool for users to be able to:
- select the number and type of assessments, e.g., 3 assessments: AUC, max. ACC, and AIR (gets at balancing real-world selection criteria)
- choose between random folds and user-selected segments (gets at weak spots and robustness)
- perturb folds or data segments (gets at robustness)
(The current prototype is fixed at 5 folds and five quality assessment stats (no AIR, etc.), and does not perturb folds.) A rough sketch of the kind of ranking I have in mind is below.
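For concreteness, here is a minimal sketch of cross-validated, multi-metric rank aggregation, loosely in the spirit of Caruana et al. 2004. The model set, the two metrics, and all names are illustrative assumptions rather than the prototype's actual code; AIR or segment-level metrics could be slotted in as extra entries in the `metrics` dict.

```python
# Minimal sketch: rank candidate models across folds and metrics,
# then average the per-metric ranks for an overall ordering.
# Models, metrics, and data below are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

models = {
    "glm": LogisticRegression(max_iter=1000),
    "rf": RandomForestClassifier(random_state=0),
    "gbm": GradientBoostingClassifier(random_state=0),
}

# User-selected assessments; AIR or segment-based stats could be added here.
metrics = {
    "auc": lambda y_true, p: roc_auc_score(y_true, p),
    "max_acc": lambda y_true, p: max(
        accuracy_score(y_true, (p > t).astype(int))
        for t in np.linspace(0.05, 0.95, 19)
    ),
}

folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = {name: {m: [] for m in metrics} for name in models}

for train_idx, test_idx in folds.split(X, y):
    for name, model in models.items():
        model.fit(X[train_idx], y[train_idx])
        p = model.predict_proba(X[test_idx])[:, 1]
        for m, fn in metrics.items():
            scores[name][m].append(fn(y[test_idx], p))

# Rank models (1 = best) within each metric on the mean fold score,
# then average ranks across metrics.
names = list(models)
mean_ranks = np.zeros(len(names))
for m in metrics:
    means = np.array([np.mean(scores[n][m]) for n in names])
    ranks = means.argsort()[::-1].argsort() + 1  # higher score -> better rank
    mean_ranks += ranks
mean_ranks /= len(metrics)

for n, r in sorted(zip(names, mean_ranks), key=lambda t: t[1]):
    print(f"{n}: mean rank {r:.2f}")
```

Swapping the fold splitter for user-selected segments, or perturbing the test indices before scoring, would cover the robustness items above with the same aggregation step.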
Let me know if you'd like to discuss.
That makes sense. We may provide an enhanced leaderboard panel that incorporates model ranking and segmented metrics in a future release.