NNPDF/nnpdf

unable to perform fit comparison

Closed this issue · 6 comments

I have been trying to compare fits of '4.0 datasets + 1 additional dataset' vs the baseline (4.0 only with new data). However, I keep getting a positivity-related error:

attempt 1:

----

Traceback (most recent call last):
  File "/data/theorie/tsharma/phys/nnpdf_code/nnpdf/validphys2/src/validphys/dataplots.py", line 1083, in plot_positivity
    ax.errorbar(
  File "/data/theorie/tsharma/Programs/miniconda3/envs/gfits/lib/python3.10/site-packages/matplotlib/__init__.py", line 1446, in inner
    return func(ax, *map(sanitize_sequence, args), **kwargs)
  File "/data/theorie/tsharma/Programs/miniconda3/envs/gfits/lib/python3.10/site-packages/matplotlib/axes/_axes.py", line 3636, in errorbar
    raise ValueError(
ValueError: 'yerr' must not contain negative values

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/data/theorie/tsharma/Programs/miniconda3/envs/gfits/bin/vp-comparefits", line 8, in <module>
    sys.exit(main())
  File "/data/theorie/tsharma/phys/nnpdf_code/nnpdf/validphys2/src/validphys/scripts/vp_comparefits.py", line 253, in main
    a.main()
  File "/data/theorie/tsharma/Programs/miniconda3/envs/gfits/lib/python3.10/site-packages/reportengine/app.py", line 395, in main
    self.run()
  File "/data/theorie/tsharma/phys/nnpdf_code/nnpdf/validphys2/src/validphys/app.py", line 151, in run
    super().run()
  File "/data/theorie/tsharma/Programs/miniconda3/envs/gfits/lib/python3.10/site-packages/reportengine/app.py", line 380, in run
    rb.execute_sequential()
  File "/data/theorie/tsharma/Programs/miniconda3/envs/gfits/lib/python3.10/site-packages/reportengine/resourcebuilder.py", line 166, in execute_sequential
    result = self.get_result(callspec.function,
  File "/data/theorie/tsharma/Programs/miniconda3/envs/gfits/lib/python3.10/site-packages/reportengine/resourcebuilder.py", line 175, in get_result
    fres =  function(**kwdict)
  File "/data/theorie/tsharma/phys/nnpdf_code/nnpdf/validphys2/src/validphys/dataplots.py", line 1127, in plot_dataspecs_positivity
    return plot_positivity(
  File "/data/theorie/tsharma/phys/nnpdf_code/nnpdf/validphys2/src/validphys/dataplots.py", line 1095, in plot_positivity
    raise ValueError(
ValueError: The central value of dataset1 for NNPDF_POS_5GEV_XGL is outside of the error bands. This is not supported

----

attempt 2 with a different fit:


----

Traceback (most recent call last):
  File "/data/theorie/tsharma/phys/nnpdf_code/nnpdf/validphys2/src/validphys/dataplots.py", line 1083, in plot_positivity
    ax.errorbar(
  File "/data/theorie/tsharma/Programs/miniconda3/envs/gfits/lib/python3.10/site-packages/matplotlib/__init__.py", line 1446, in inner
    return func(ax, *map(sanitize_sequence, args), **kwargs)
  File "/data/theorie/tsharma/Programs/miniconda3/envs/gfits/lib/python3.10/site-packages/matplotlib/axes/_axes.py", line 3636, in errorbar
    raise ValueError(
ValueError: 'yerr' must not contain negative values

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/data/theorie/tsharma/Programs/miniconda3/envs/gfits/bin/vp-comparefits", line 8, in <module>
    sys.exit(main())
  File "/data/theorie/tsharma/phys/nnpdf_code/nnpdf/validphys2/src/validphys/scripts/vp_comparefits.py", line 253, in main
    a.main()
  File "/data/theorie/tsharma/Programs/miniconda3/envs/gfits/lib/python3.10/site-packages/reportengine/app.py", line 395, in main
    self.run()
  File "/data/theorie/tsharma/phys/nnpdf_code/nnpdf/validphys2/src/validphys/app.py", line 151, in run
    super().run()
  File "/data/theorie/tsharma/Programs/miniconda3/envs/gfits/lib/python3.10/site-packages/reportengine/app.py", line 380, in run
    rb.execute_sequential()
  File "/data/theorie/tsharma/Programs/miniconda3/envs/gfits/lib/python3.10/site-packages/reportengine/resourcebuilder.py", line 166, in execute_sequential
    result = self.get_result(callspec.function,
  File "/data/theorie/tsharma/Programs/miniconda3/envs/gfits/lib/python3.10/site-packages/reportengine/resourcebuilder.py", line 175, in get_result
    fres =  function(**kwdict)
  File "/data/theorie/tsharma/phys/nnpdf_code/nnpdf/validphys2/src/validphys/dataplots.py", line 1127, in plot_dataspecs_positivity
    return plot_positivity(
  File "/data/theorie/tsharma/phys/nnpdf_code/nnpdf/validphys2/src/validphys/dataplots.py", line 1095, in plot_positivity
    raise ValueError(
ValueError: The central value of dataset3 for NNPDF_POS_5GEV_XGL is outside of the error bands. This is not supported

----

Both times, NNPDF_POS_5GEV_XGL seems to be the issue...?
For reference, all 3 fits are on the server: 240307-ts-thcovmat-nnlo-global-40cuts-baseline, 240307-ts-thcovmat-nnlo-global-40cuts-dataset1 and 240307-ts-thcovmat-nnlo-global-40cuts-dataset3
What do I do? @scarlehoff @Radonirinaunimi @RoyStegeman

I don't have much time to look into this in-depth now, but it's related to #1868

Essentially, the distribution of NNPDF_POS_5GEV_XGL computed with this specific PDF is highly non-Gaussian, making the median value fall outside the error bar. What you want to do is up to you, but it's caused by the specific combination of PDF + observable.

But it's very unclear to me how to proceed. We have about 35 new experimental observables, and what I am trying to do is add them individually, one at a time, to the nnpdf4 runcard to study the impact of each, which will be quite important moving towards 4.1. Should this particular distribution (....XGL) be discarded from the runcard if it often causes problems?

For context, all I have done right now is take @andreab1997's thcovmat card and add a single dataset to it. This is of course very troublesome if we can't even add a single observable to the runcard without running into errors.

That specific dataset enforces the positivity of the gluon PDF; it's a pseudo-observable defined as x*gluon. The reason it complains can be seen in this report. NNPDF_POS_5GEV_XGL has a point at $x=0.9$, which is the only problematic one for all your fits.

When plotting the positivity observable for the report, what is shown is the error band of the 68% c.i. along with the mean over replicas (not the median as I said earlier). As you can see from the plots, in this case the mean is larger than the upper edge of the 68% c.i., and thus the plotting function raises an error.
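
To illustrate how this can happen, here is a minimal standalone sketch (not the validphys code). The replica values and the `central_interval` helper are made up for illustration. It only assumes that the asymmetric error bars are the distances from the mean to the edges of the 68% interval, which would explain the `'yerr' must not contain negative values` message from matplotlib in the tracebacks above.

```python
import statistics


def central_interval(replicas, level=68):
    """Return the (lower, upper) percentiles bounding the central `level`% of the replicas."""
    # quantiles(n=100) returns 99 cut points; cuts[i - 1] is the i-th percentile.
    cuts = statistics.quantiles(replicas, n=100, method="inclusive")
    lo_pct = (100 - level) // 2  # 16 for level=68
    hi_pct = 100 - lo_pct        # 84 for level=68
    return cuts[lo_pct - 1], cuts[hi_pct - 1]


# Hypothetical replica values at a single x point: most replicas sit close
# to zero, while a few outliers form a long tail (strongly non-Gaussian).
replicas = [1e-4 * i for i in range(90)] + [0.5] * 10

mean = statistics.fmean(replicas)
lower, upper = central_interval(replicas)

# Asymmetric error bars measured from the mean (assumed convention).
yerr = (mean - lower, upper - mean)

print(f"mean = {mean:.4f}, 68% c.i. = [{lower:.4f}, {upper:.4f}]")
print(f"yerr = ({yerr[0]:.4f}, {yerr[1]:.4f})")
```

In this toy case the outliers pull the mean above the 84th percentile, so the upper error bar is negative. The median would stay inside the interval by construction, which is why the mean vs median distinction matters here.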

The first fit in your list does not yet add any datasets, but what changes is that it uses a new implementation of the same datasets present in Andrea's fit. It thus seems you can't even update observables in the runcard without running into errors ;). How this new implementation (its uncertainties) may cause this change is probably what you'll want to investigate first.

Actually, the first fit (i.e. same datasets with the new implementation) works just fine. There are differences (the causes of which we understand to a certain degree), but it doesn't cause any error. I checked this: https://vp.nnpdf.science/jPd6PttqQ8ymGIfhIif0xA==/ This is a report comparing fits that use Andrea's runcard; the only difference is that the reference uses the old names (and hence the old implementation), while the current fit uses the new names (and hence the new implementation). So the problem really is with adding new observables (even just 1) to the runcard, which worries me because I don't know how to go forward :(

There is something going on already in this fit https://vp.nnpdf.science/jPd6PttqQ8ymGIfhIif0xA==/Scales0_pdf_report_report.html#Scales0_pdf_report_PDFnormalize1_Basespecs0_PDFscalespecs1_plot_pdfs_g

The high x behaviour of all quarks and gluon seems strange.

The chi2 is also close to two, which points to something being wrong (probably due to the jet data, judging by that report). I would start by trying a fit using the legacy version of the jets. If that is really the same as the standard fit (compare with NNPDF40_nnlo_as_0118_qcd), then you can start adding datasets.

That was a great observation, thanks! I redid this comparison, but now with the new implementation limited to only TTB: https://vp.nnpdf.science/BCfqAETTQSOwQEdM8sD4CQ== The chi2 for the entire fit actually improves over default nnpdf4. I will now try adding 1 dataset at a time again and see how it goes.