CMSJMECalculators - moved to cp3-cms/CMSJMECalculators
This package provides an efficient ROOT::RDataFrame-friendly implementation of the recipes for jet and MET variations for the CMS experiment, for use with samples in the NanoAOD format. The code was adopted from the bamboo analysis framework.
NOTE: This is a preview to gather feedback (please open an issue with yours), without any guarantee of stability (including in naming) for now.
Update: This package was moved to cp3-cms/CMSJMECalculators (on the CERN GitLab instance); development continues there.
For using these helpers from python, the recommended solution is to install the package (in a virtual or conda environment) with
pip install git+https://github.com/pieterdavid/CMSJMECalculators.git
scikit-build is used to compile the C++ components against the available ROOT distribution.
Inside a CMSSW environment, the install_cmssw.sh script can be used:
wget -q https://raw.githubusercontent.com/pieterdavid/CMSJMECalculators/main/install_cmssw.sh
source ./install_cmssw.sh
If a specific version is needed, the $VERSION variable can be set, e.g.
VERSION=0.1.0 source ./install_cmssw.sh
From C++ the package can be installed directly with CMake, using the standard commands (after cloning the repository):
cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=<your-prefix> [other-options] <source-clone>
make
make install
This will also install the python modules in <your-prefix>/lib/pythonX.Y/site-packages/CMSJMECalculators/.
When installed as a python package or directly with CMake, the necessary components can be loaded with:
from CMSJMECalculators import loadJMESystematicsCalculators
loadJMESystematicsCalculators()
Note that this will load the shared library and headers or dictionary in cling, the ROOT interpreter, so they can from then on also be used in JITted code, e.g. from RDataFrame.
The variations are calculated by the C++ classes JetVariationsCalculator and
FatJetVariationsCalculator for the AK4 and AK8 jet JER and JES variations, and
Type1METVariationsCalculator and FixEE2017Type1METVariationsCalculator
for the Type-1 MET variations, using the standard procedure or with the special
recipe for 2017 (Type-1 smeared or standard MET is a configuration option).
To use these, an instance should be created (with the C++ interpreter, to make it
available from JITted code), and additional configuration passed by calling
setter methods, e.g. in PyROOT:
import ROOT as gbl
calc = gbl.JetVariationsCalculator()
# redo JEC, push_back corrector parameters for different levels
jecParams = getattr(gbl, "std::vector<JetCorrectorParameters>")()
jecParams.push_back(gbl.JetCorrectorParameters(textfilepath))
calc.setJEC(jecParams)
# calculate JES uncertainties (repeat for all sources)
jcp_unc = gbl.JetCorrectorParameters(textfilepath_UncertaintySources)
calc.addJESUncertainty("Total", jcp_unc)
# Smear jets, with JER uncertainty
calc.setSmearing(textfilepath_PtResolution, textfilepath_SF,
    splitJER,       # decorrelate for different regions
    True, 0.2, 3.)  # use hybrid recipe, matching parameters
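The "repeat for all sources" step above can be written as a loop. The following is only a minimal sketch, not part of the package: add_jes_sources and RecordingCalculator are hypothetical names, and the file and source names are placeholders. In real use make_params would be gbl.JetCorrectorParameters, assuming its two-argument form (file path, section name) selects one source from the uncertainty sources file.

```python
def add_jes_sources(calc, make_params, unc_file, sources):
    """Call calc.addJESUncertainty once per source name (hypothetical helper)."""
    for source in sources:
        # Assumption: make_params(path, section) reads the section of the
        # uncertainty sources file that belongs to this source.
        calc.addJESUncertainty(source, make_params(unc_file, source))


class RecordingCalculator:
    """Stand-in for a calculator, so that the sketch runs without ROOT."""

    def __init__(self):
        self.sources = []

    def addJESUncertainty(self, name, params):
        self.sources.append((name, params))


demo = RecordingCalculator()
# Placeholder file and source names, for illustration only.
add_jes_sources(demo, lambda path, section: (path, section),
                "UncertaintySources.txt", ["Total", "AbsoluteStat"])
print([name for name, _ in demo.sources])
```

With PyROOT loaded, the call would presumably look like add_jes_sources(calc, gbl.JetCorrectorParameters, textfilepath_UncertaintySources, sources).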
The varied jet pt's and masses can be obtained by calling the produce method with the per-event quantities, converted to ROOT::VecOps::RVec:
from CMSJMECalculators.utils import toRVecFloat, toRVecInt
jetVars = calc.produce(toRVecFloat(tree.Jet_pt), toRVecFloat(tree.Jet_eta), ...)
Since the full list of arguments can be long and depends on a few parameters (for data the MC branches are absent and not needed, and MET needs a few additional inputs), a helper function is provided, which can be used as follows:
from CMSJMECalculators.utils import getJetMETArgs
jetVars = calc.produce(*getJetMETArgs(tree, isMC=True, forMET=False))
This will return an object that contains all the variations, e.g. jetVars.pt(0) will return the RVec with the new nominal jet pt's.
The corresponding names of the variations, which depend on the configuration,
can be retrieved from the calculator by calling its available() method.
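To look up a variation by name rather than by index, the two pieces can be combined. This is a minimal sketch under one assumption that is not stated above: that the i-th name returned by available() corresponds to jetVars.pt(i). The helper variations_by_name, the stand-in classes, and the variation names and values are all made up so that the example runs without ROOT.

```python
def variations_by_name(calc, jet_vars):
    """Pair each variation name with the matching pt collection."""
    # Assumption: the i-th name from available() describes jet_vars.pt(i).
    return {str(name): jet_vars.pt(i)
            for i, name in enumerate(calc.available())}


class FakeCalculator:
    """Stand-in for a configured calculator; the names are invented."""

    def available(self):
        return ["nominal", "jesTotalup", "jesTotaldown"]


class FakeVariations:
    """Stand-in for the object returned by produce(); values are invented."""

    def pt(self, i):
        return [[30.0, 25.0], [31.5, 26.2], [28.5, 23.8]][i]


by_name = variations_by_name(FakeCalculator(), FakeVariations())
print(by_name["jesTotalup"])
```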
When constructing the RDataFrame graph from python, the calculator needs to be constructed directly from the cling interpreter, such that it is available in the global C++ namespace for JITted code:
gbl.gROOT.ProcessLine("JetVariationsCalculator myJetVarCalc{};")
calc = getattr(gbl, "myJetVarCalc")
The second line retrieves a reference from PyROOT, so that the configuration methods can be called as above.
Inside the RDataFrame graph the varied jet pt's and masses can be defined as a new column:
df.Define("ak4JetVars", "myJetVarCalc.produce(Jet_pt, Jet_eta, Jet_phi, ...)")
(the full set of arguments is not reproduced here, but can be found from the
utils.getJetMETArgs method; since RDataFrame uses RVec internally, no conversion is needed).
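Because the expression string must use exactly the name that was declared in cling, it can help to build it from the same variable. The snippet below is a small sketch with a hypothetical helper, produce_expression; the three columns are only the ones shown above, not the full argument list.

```python
def produce_expression(calc_name, columns):
    """Build the expression string for Define from a calculator name
    and an ordered list of column names (hypothetical helper)."""
    return "{0}.produce({1})".format(calc_name, ", ".join(columns))


# Incomplete column list, for illustration only.
expr = produce_expression("myJetVarCalc", ["Jet_pt", "Jet_eta", "Jet_phi"])
print(expr)
```

Once the column list is complete, the result could be passed on as df.Define("ak4JetVars", expr).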
The PyROOT example above relies on the automatically generated bindings, so
the C++ equivalent is almost identical, and straightforward to obtain.
When calling the produce method outside RDataFrame, most of the arguments
may need to be converted to RVec, which fortunately supports all common
kinds of array interfaces.
TODO expand C++ examples
Since the JEC and JER parameter text files need to be downloaded from the corresponding repositories, which are quite big, a helper is provided that downloads only the files that are used, and caches them locally. It can be used like this (see the tests for more examples):
from CMSJMECalculators.jetdatabasecache import JetDatabaseCache
jecDBCache = JetDatabaseCache("JECDatabase", repository="cms-jet/JECDatabase")
jrDBCache = JetDatabaseCache("JRDatabase", repository="cms-jet/JRDatabase")
# usage example, returns the local path
pl = jecDBCache.getPayload("Summer16_07Aug2017_V11_MC", "L1FastJet", "AK4PFchs")
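When several JEC levels are needed, the same call can be repeated per level. The following is a minimal sketch: jec_payload_paths and FakeCache are hypothetical, the fake paths are invented, and it assumes only the getPayload(tag, level, jetType) call shown above.

```python
def jec_payload_paths(cache, tag, levels, jet_type):
    """Return one local payload path per JEC level, in the given order."""
    return [cache.getPayload(tag, level, jet_type) for level in levels]


class FakeCache:
    """Stand-in for JetDatabaseCache, so that the sketch runs offline."""

    def getPayload(self, tag, level, jet_type):
        # Invented path layout, for illustration only.
        return "/tmp/cache/{0}_{1}_{2}.txt".format(tag, level, jet_type)


paths = jec_payload_paths(FakeCache(), "Summer16_07Aug2017_V11_MC",
                          ["L1FastJet", "L2Relative"], "AK4PFchs")
for path in paths:
    print(path)
```

With a real cache, each returned path could then be wrapped in gbl.JetCorrectorParameters and added to jecParams with push_back, as in the configuration example earlier.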
The cache can also be checked and updated with the checkCMSJMEDatabaseCaches
script, which has an interactive mode (-i flag) that will start an IPython
shell after constructing the two database cache helpers.
A set of pytest-based tests is included, to make sure the implementation stays consistent with the POG-provided python version in nanoAOD-tools. The tests compare the contents of the pt and mass branches for all variations. They can be run with
pytest tests
or, inside a CMSSW environment where python2 is the default:
python3 -m pytest tests
TODO make tests python2-compatible, expand, scripts for larger tests samples?