Statistical uncertainties
Context
This issue will focus on the management of statistical uncertainties.
We can use this issue to list and propose methods other than the one described below.
If you are instead interested in implementing systematic uncertainties, please refer to issue #88.
Current work
For the moment, the method we decided to start implementing is bootstrapping.
In particular, the plan is to re-sample the DL2 data in a cycle of N bootstrap calls, where each call executes the production script in use on the resampled data.
Storing the calculated IRFs and sensitivity curves in memory will allow computing, at the end, the median and standard deviation for each energy bin, resulting in a final output with statistical uncertainties.
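A minimal sketch of what such a bootstrap loop could look like, assuming a hypothetical `compute_irfs(dl2_events)` helper that runs the DL2-to-IRF step on one sample and returns per-energy-bin arrays (the names here are illustrative, not an existing pyirf API):

```python
import numpy as np


def bootstrap_irfs(dl2_events, compute_irfs, n_bootstrap=100, seed=None):
    """Resample DL2 events with replacement and collect IRF statistics.

    ``compute_irfs`` is a hypothetical callable returning a dict of
    per-energy-bin arrays, e.g. {"effective_area": ..., "sensitivity": ...}.
    """
    rng = np.random.default_rng(seed)
    n_events = len(dl2_events)

    samples = {}
    for _ in range(n_bootstrap):
        # one bootstrap realisation: resample events with replacement
        indices = rng.integers(0, n_events, size=n_events)
        result = compute_irfs(dl2_events[indices])

        for name, values in result.items():
            samples.setdefault(name, []).append(values)

    # median and standard deviation over the bootstrap axis, per energy bin
    return {
        name: (np.median(values, axis=0), np.std(values, axis=0))
        for name, values in samples.items()
    }
```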
Methods discussed
This list can be expanded or modified.
- Bootstrap
For the resolutions computed as the x/100 containment quantile, we can compute the confidence interval analytically.
The same holds for the effective area: the detected number of events follows a binomial distribution, so we can use https://docs.astropy.org/en/stable/api/astropy.stats.binom_conf_interval.html
The harder problem is propagating the uncertainties introduced by the cut optimization into the resulting IRFs: there is no way to trace these analytically, so the only solution is to repeat the procedure on different samples (bootstrapping, more simulations, cross-validation, ...).
Just to comment that we need the error on the effective area to test the tolerance.
Please implement the error on effective area using binomial confidence intervals.
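A minimal sketch of such a binomial confidence interval on the effective area, assuming per-energy-bin counts of selected and simulated events plus the simulated scatter area (the function and argument names are illustrative, not an existing pyirf API):

```python
import numpy as np
from astropy.stats import binom_conf_interval


def effective_area_with_errors(n_selected, n_simulated, area_simulated,
                               confidence_level=0.68269):
    """Effective area per energy bin with a binomial confidence interval.

    n_selected, n_simulated: event counts per true-energy bin.
    area_simulated: area over which the events were simulated.
    """
    aeff = n_selected / n_simulated * area_simulated

    # Wilson interval on the selection efficiency, rescaled to an area
    lower, upper = binom_conf_interval(
        n_selected,
        n_simulated,
        confidence_level=confidence_level,
        interval="wilson",
    )
    return aeff, lower * area_simulated, upper * area_simulated
```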
Can we revive this issue and try to implement a solution? At LST we have point-source gamma MC files with 7 different pointing directions at the same wobble offset, and we can use them to test the solution as well.
We are also looking at generating MC/IRF grids for interpolation (and extrapolation), so it would be very helpful to understand the statistical uncertainties associated with the IRFs.
I think the best course of action for the moment would be to do this via bootstrapping.
From the point of view of pyirf, I think the only thing to add is support for the error quantities in the DL3 files. The actual operation (at least in this specific case) is not something the library would do; rather, the (ctapipe) tool or the (protopipe/lstchain) script would: sample the DL2 data, call the DL2-to-DL3 step on each sample, and read an (N_quantities, N_samples) array on which to calculate means and standard deviations to write into a final DL3 file.
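A minimal sketch of that final aggregation step, assuming the bootstrap results are collected per quantity as arrays of shape (N_samples, N_bins) and that the errors are simply stored as extra columns next to the nominal values (the column naming is illustrative, not a fixed GADF convention):

```python
import numpy as np
from astropy.table import Table


def write_dl3_with_errors(samples, filename):
    """Aggregate bootstrap samples into nominal values plus errors.

    ``samples`` maps quantity names (e.g. "EFFAREA") to arrays of
    shape (n_samples, n_energy_bins).
    """
    table = Table()
    for name, values in samples.items():
        values = np.asarray(values)
        table[name] = np.mean(values, axis=0)
        table[name + "_ERR"] = np.std(values, axis=0)

    table.write(filename, format="fits", overwrite=True)
```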
There was a Ph.D. student testing this for protopipe (in fact I have the same issue there); perhaps I can ask her if she has a template to share.
That seems overcomplicated to me. For effective areas at least simple binomial uncertainties should do. Also I wonder if we need more than that, because the goal is simply to check if a given MC library, with a certain set of cuts, provides sufficiently "precise" (stats-wise) IRFs. If that is not the case we should simply produce more MC, until those uncertainties are negligible (and fulfill the CTA requirements). In that sense, I don't see an urgent need either in having the error quantities available in the standard DL3 file.
I guess right now the uncertainties are only meant for testing if we are within requirements or need more stats. But, they could also be used by the science tools as weights in the likelihood fit, or to determine what region of phase space is "usable" for analysis. In any case, it doesn't hurt to store them in the IRF, even if they are not used for anything other than validation.