What is the question?
What do we want a metric to quantify?
(This ties into how we weight what we think is important when combining results from many classes into a single metric.)
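For concreteness, one family of candidates for "quality of probabilistic classifications" is a per-class-averaged, weighted multi-class log-loss. The sketch below is only an illustration of how per-class weights would enter when combining classes, not a decided choice of metric; the function name and the `class_weights` argument are hypothetical.

```python
import numpy as np

def weighted_log_loss(y_true, y_prob, class_weights, eps=1e-15):
    """Illustrative per-class-averaged, weighted multi-class log-loss.

    y_true        : (N,) integer true class labels in [0, M)
    y_prob        : (N, M) predicted class probabilities per object
    class_weights : (M,) relative importance assigned to each class
    """
    # Clip and renormalize probabilities to avoid log(0)
    y_prob = np.clip(y_prob, eps, 1.0)
    y_prob = y_prob / y_prob.sum(axis=1, keepdims=True)

    M = y_prob.shape[1]
    per_class = np.zeros(M)
    for m in range(M):
        mask = (y_true == m)
        if mask.any():
            # Average log-loss over the true members of class m,
            # so rare classes are not swamped by common ones
            per_class[m] = -np.mean(np.log(y_prob[mask, m]))

    # Combine the per-class terms with the chosen (normalized) weights
    w = np.asarray(class_weights, dtype=float)
    return np.sum(w * per_class) / np.sum(w)
```

The choice of `class_weights` is exactly the weighting question raised above; a flat weighting reduces this to an ordinary per-class-averaged log-loss.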
Please chime in with dissent, but I think the current state of the discussion is that the chosen metric should quantify the quality of probabilistic classifications of "full" lightcurves over some set period of time, for two reasons:
- The early classification challenge may not be as valuable to the science, because it's not realistic without information beyond the lightcurve (@gnarayan), and we won't be able to make/validate that information nor design/test an appropriately sophisticated metric on the timescale for releasing the challenge.
- A metric for the anomaly detection challenge would, by definition, be hard to define algorithmically in a way that participants can run on the test set and that accounts for meaningful covariances (e.g. among hierarchical classes), but anomalies might not be so hard to pick out "by eye" in the final results, so this could still appear in the paper without being the primary Kaggle metric.
For these reasons, @rbiswas4 and I are in favor of continuing the metric decision-making process starting with the "vanilla" version of the competition (with 1- or 10-year lightcurves, for example).
After a chat with @reneehlozek about the PLAsTiCC timeline, the question is settled: we'll tackle the full lightcurve challenge first, and use the lessons learned from that simpler question when preparing for future challenges aimed at early classification and anomaly detection.