Concerning contour regions in a two-parameter case for a single observable
girishky opened this issue · 9 comments
Hello colleagues,
I have a query not so much directly related to the flavio code itself, but more about understanding the statistical methods used in the chi-square analysis.
I understand that in a global fit analysis where several measurements (say, N observables) are combined, one uses Δχ² = 2.3 and 6 to draw the 68% and 95% CL contours in the plane of two fit parameters.
I did a quick review of the flavio code regarding this, and I think flavio always uses the values 2.3 and 6 irrespective of the number of measurements involved. For example, in the flavio plots here, the function flavio.plots.likelihood_contour is used. Checking this function's definition, I see that dof is set to 2, and hence the contour levels always correspond to Δχ² = 2.3 and 6.
Sorry if this is very elementary stuff that I should know, but I got confused because in this particular case one is trying to fit two parameters to one measurement (a single data point), and generally chi-square language is used when we combine several measurements. Normally, I would take the central value of the measurement, add and subtract the 1σ error to get an experimental range, and plot the values of the two parameters for which the theory value lies in this range. But that essentially corresponds to plotting for Δχ² ≤ 1.
Thanks in advance!
Girish
Hi,
yes, you are correct: if you are fitting two parameters to one measurement, the number of DOF is 1 and you should not use 2.3 and 6 but 1 and 4. And it's true that this is not the default of what likelihood_contour does. This function is meant to be used for cases where the number of DOF is 2.
In one of my papers (https://arxiv.org/pdf/1801.01112.pdf) we added a footnote about this issue.
What you can do as a workaround is to use likelihood_contour_data, but not use the levels key when you feed the result to contour:
import flavio.plots as fpl
data = fpl.likelihood_contour_data(...)
data.pop("levels")                    # drop the precomputed dof=2 levels
fpl.contour(**data, levels=[1, 4])    # 1 and 2 sigma thresholds for dof=1
One might wonder whether it would actually be better to have dof as a keyword argument of likelihood_contour_data. @peterstangl, what do you think?
I actually think that it might be better if likelihood_contour_data did not return the levels at all. In principle, the levels can always be changed without recomputing the contour data, and this is not very clear at the moment. In addition to changing the DOF, one might also want to change n_sigma after one has computed the plot data. The fact that likelihood_contour_data has the n_sigma argument suggests that one has to recompute all the data in this case, but this is actually not necessary.
I think it would be nicer if n_sigma and dof were arguments of the contour function. But of course it should also be possible to explicitly provide the levels instead of just n_sigma, since contour is a low-level plotting function that can also be used for contours of things other than likelihoods. So I'm not sure how to implement these two different ways of providing the contour levels (either levels, or n_sigma and dof). One way would be to just have levels, and then the user always has to use levels=[delta_chi2(n, dof) for n in n_sigma].
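For concreteness, here is a minimal sketch of what such a delta_chi2(n_sigma, dof) helper would compute, namely the chi-squared quantile at the n-sigma confidence level; this is just an illustration, not necessarily flavio's actual implementation:
from scipy.stats import chi2, norm

def delta_chi2(n_sigma, dof):
    """Delta chi2 threshold enclosing the n_sigma region for `dof` free parameters."""
    # probability content of an n_sigma interval of a 1D Gaussian
    cl = norm.cdf(n_sigma) - norm.cdf(-n_sigma)
    # corresponding chi2 quantile with `dof` degrees of freedom
    return chi2.ppf(cl, dof)

# reproduces the numbers quoted above:
# dof=1 gives ~1.0 and ~4.0, dof=2 gives ~2.3 and ~6.2
levels_1d = [delta_chi2(n, 1) for n in (1, 2)]
levels_2d = [delta_chi2(n, 2) for n in (1, 2)]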
I agree about the first paragraph, but using n_sigma and dof in contour is problematic as this is just a plotting function which e.g. doesn't care if one plots the likelihood, log-likelihood, or chi2...
I agree that n_sigma as an argument of contours is problematic. One idea would be to have some new functions like likelihood_levels, chi2_levels, etc., that take n_sigma and dof as arguments and return the levels. Or only one function that also takes a contour_type argument with possible values likelihood, chi2, log-likelihood, etc.
But it's also a bit confusing that likelihood_contour_data actually returns neither likelihood nor log-likelihood but chi2 values...
Another idea for contours would be to use the levels argument in two different ways:
- if it is a list, tuple, or array, it is treated as before;
- if it is a dictionary, then the keys type and n_sigma are mandatory and the key dof is optional. The value of type could then be e.g. likelihood, chi2, or log-likelihood.
This way we could avoid additional functions or the need to use levels=[delta_chi2(n, dof) for n in n_sigma]. Calling contours would then just look like e.g.
fpl.contours(**data, levels={'type':'chi2', 'n_sigma':[1,2,3]})
or
fpl.contours(**data, levels={'type':'chi2', 'n_sigma':[1,2,3], 'dof':1})
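A rough sketch of how such a dispatch inside contours could look (purely illustrative; _resolve_levels and the key handling are hypothetical, not existing flavio code):
from math import exp
from scipy.stats import chi2, norm

def _resolve_levels(levels):
    """Hypothetical helper: turn the proposed levels argument into plain numbers."""
    if not isinstance(levels, dict):
        return list(levels)                    # list/tuple/array: treated as before
    dof = levels.get('dof', 2)                 # 'dof' is optional, default 2
    cl = [norm.cdf(n) - norm.cdf(-n) for n in levels['n_sigma']]
    dchi2 = [chi2.ppf(p, dof) for p in cl]     # delta chi2 thresholds
    if levels['type'] == 'chi2':
        return dchi2
    if levels['type'] == 'log-likelihood':
        return [-0.5 * d for d in dchi2]       # log L - log L_max = -delta chi2 / 2
    if levels['type'] == 'likelihood':
        return [exp(-0.5 * d) for d in dchi2]  # L / L_max
    raise ValueError("unknown contour type")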
Hi all,
Thanks for the prompt responses. But I am a bit puzzled by the use of the word 'dof'. Let me write down what I understand so far about the flavio approach:
- for N (>1) measurements, but fitting only one parameter, say one combination of WCs, Δχ² = 1 for the 68% credible region.
- for N (>1) measurements, but fitting two parameters, Δχ² = 2.3 for the same region.
- for a single measurement, but fitting any number of parameters, Δχ² = 1 for the same region.
Am I correct?
Coming back to dof, the flavio implementation is such that whenever you plot contours in a two-dimensional plane for the second case, which, I suppose, is the one used most in papers, dof is always 2 (therefore Δχ² = 2.3). I know that the correct estimate of dof is sometimes more nuanced than one might think, but taking the simple definition [dof = measurements (N) - parameters (k)], I see that dof is different for two different fits with, say, N = 50 and N = 100. But in flavio, dof will always default to 2 (and so Δχ² = 2.3) for both fits. Please let me know if there is something obviously wrong in my argument. Any help is appreciated. Thanks!
OK, I see the confusion. Indeed this is not about the total number of degrees of freedom in the fit, as we are in reality not talking about the likelihood itself but about likelihood ratios (or delta-chi2). In this case it's really only the number of free parameters in the plot that is relevant. You are right that it's almost always 2 in 2D plots, but there are cases where it's 1, namely when the two Wilson coefficients only appear in a single, fixed combination in the observables.
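As a toy illustration of that last point (hypothetical numbers, not flavio code): with a single measurement in which two Wilson coefficients enter only through a fixed combination, the delta-chi2 depends on one effective parameter, so the 68% region is the band delta-chi2 ≤ 1 rather than ≤ 2.3:
import numpy as np

# one measurement m ± sigma; the theory prediction depends only on C1 + C2
m, sigma = 1.0, 0.2
c1, c2 = np.meshgrid(np.linspace(-1, 3, 201), np.linspace(-1, 3, 201))
theory = 0.5 * (c1 + c2)                  # hypothetical linear dependence on the WCs
delta_chi2 = ((theory - m) / sigma) ** 2  # best-fit chi2 is zero, so chi2 equals delta chi2
inside_68 = delta_chi2 <= 1.0             # dof = 1: a straight band in the (C1, C2) plane
inside_95 = delta_chi2 <= 4.0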
@DavidMStraub OK, I think I get it now. Thank you very much!