Improved func_adl query to not accidentally drop events that are accepted under systematic variations
alexander-held opened this issue · 4 comments
Query including lepton and jet selection:
from func_adl import ObjectStream

def get_query(source: ObjectStream) -> ObjectStream:
    return source.Where(lambda e:\
        # == 1 lep
        e.electron_pt.Where(lambda pT: pT > 25).Count() + e.muon_pt.Where(lambda pT: pT > 25).Count() == 1
        )\
        .Where(lambda e:\
            # >= 4 jets
            e.jet_pt.Where(lambda pT: pT > 25).Count() >= 4
        )\
        .Where(lambda e:\
            # >= 1 jet with pT > 25 GeV and b-tag >= 0.5
            {"pT": e.jet_pt, "btag": e.jet_btag}.Zip().Where(lambda jet: jet.btag >= 0.5 and jet.pT > 25).Count() >= 1
        )\
        .Select(lambda e:\
            # return columns
            {
                "electron_pt": e.electron_pt,
                "muon_pt": e.muon_pt,
                "jet_pt": e.jet_pt,
                "jet_eta": e.jet_eta,
                "jet_phi": e.jet_phi,
                "jet_mass": e.jet_mass,
                "jet_btag": e.jet_btag,
            }
        )
This does not account for object systematic variations in the processor that can relax acceptance requirements. The query should be loosened accordingly, though it is unclear how to do that in a way that guarantees full acceptance in a general setup.
I am not clear what the goal is here... Is it the systematics? (the issue title has me wondering)
The current query implemented in the notebook does not do any filtering at all, so the goal of this issue would be adding a filter similar to the query above, with suitably relaxed cuts that ensure we can still evaluate systematic variations that end up relaxing the selection.
Ok - can you (@alexander-held) point me to the file where this query is so I don't modify the wrong thing?
@gordonwatts the updated query as shown above is in main now, in coffea.ipynb (as of #85 being merged). The missing piece is relaxing the cuts sufficiently such that no (significant number of) events are filtered out that would pass the selection requirements under systematic variations. In the current implementation we have for example a 5% systematic uncertainty for jet pT that can bring events back into acceptance.
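To make the 5% example concrete: if the variation only ever scales jet pT up by a factor of at most 1.05, a jet can reach the 25 GeV cut only if its nominal pT is above 25 / 1.05 ≈ 23.8 GeV, so a pre-filter at that value should not lose any event that passes under the up-variation. The snippet below is a rough sketch of that arithmetic, assuming a purely multiplicative variation. `relaxed_threshold`, `n_jets_above` and the toy events are hypothetical and not part of the notebook.

```python
def relaxed_threshold(nominal_cut: float, rel_unc: float) -> float:
    """Lowest nominal pT that can still reach nominal_cut after an up-variation.

    Assumes the variation is a pure multiplicative scaling by (1 + rel_unc).
    """
    return nominal_cut / (1.0 + rel_unc)


def n_jets_above(jet_pts, cut):
    """Count jets with pT strictly above cut."""
    return sum(1 for pt in jet_pts if pt > cut)


# Toy events: nominal jet pT values in GeV (made-up numbers for illustration).
toy_events = [
    [40.0, 35.0, 30.0, 24.5],  # fails nominally, passes once pT is scaled up by 5%
    [40.0, 35.0, 30.0, 23.0],  # fails even after the up-variation
    [50.0, 45.0, 30.0, 26.0],  # passes nominally
]

loose_cut = relaxed_threshold(25.0, 0.05)

# Events the loosened pre-filter would keep (cut applied to nominal pT).
kept_by_query = [ev for ev in toy_events if n_jets_above(ev, loose_cut) >= 4]

# Events that pass the >= 4 jets requirement after scaling every jet pT up by 5%.
pass_up = [
    ev for ev in toy_events
    if n_jets_above([pt * 1.05 for pt in ev], 25.0) >= 4
]

# Every event accepted under the up-variation survives the loosened pre-filter.
assert all(ev in kept_by_query for ev in pass_up)
```

The same loosened value would have to be used for every jet pT requirement in the query (both the jet count and the b-tagged jet cut). In practice a slightly lower value than the exact ratio is probably safer, to stay clear of floating-point edge cases and of variations that are not a simple scaling.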
I assume that the standard pattern to design a better query would be making it very loose and then tightening it until the results start changing (because we are losing events that would pass under systematic variations).
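A minimal sketch of that loosen-then-tighten scan, under stated assumptions: the reference yields are taken from the loosest candidate pre-filter, the candidate cut is then stepped upward, and the scan stops at the first value where any yield changes. All names (`passes_selection`, `yields`, `tightest_safe_cut`), the toy events, the scan values and the ±5% pT scalings are hypothetical stand-ins for the real processor and samples.

```python
NOMINAL_CUT = 25.0  # analysis cut on jet pT in GeV
# Assumed pT scale factors per variation (illustrative only).
VARIATIONS = {"nominal": 1.0, "pt_up": 1.05, "pt_down": 0.95}


def passes_selection(jet_pts, scale):
    """Analysis selection after applying a pT scale: >= 4 jets above NOMINAL_CUT."""
    return sum(1 for pt in jet_pts if pt * scale > NOMINAL_CUT) >= 4


def yields(events, prefilter_cut):
    """Event counts per variation after pre-filtering on nominal pT, as the query would."""
    kept = [ev for ev in events if sum(1 for pt in ev if pt > prefilter_cut) >= 4]
    return {
        name: sum(passes_selection(ev, scale) for ev in kept)
        for name, scale in VARIATIONS.items()
    }


def tightest_safe_cut(events, candidate_cuts):
    """Scan from loose to tight; return the tightest cut that leaves all yields unchanged."""
    cuts = sorted(candidate_cuts)
    reference = yields(events, cuts[0])  # loosest candidate defines the reference
    safe = cuts[0]
    for cut in cuts[1:]:
        if yields(events, cut) != reference:
            break  # tightening further starts to lose events
        safe = cut
    return safe


# Toy events: nominal jet pT values in GeV (made-up numbers for illustration).
toy_events = [
    [40.0, 35.0, 30.0, 24.5],
    [40.0, 35.0, 30.0, 23.0],
    [50.0, 45.0, 30.0, 26.0],
    [60.0, 28.0, 27.0, 24.0],
]

best = tightest_safe_cut(toy_events, [0.0, 20.0, 22.0, 23.0, 24.0, 25.0])
```

For these toy events the scan stops at 23.0, because a pre-filter at 24.0 drops the last event, which would pass under the `pt_up` variation. On real samples the result depends on the statistics available, so a cut found this way on one dataset does not guarantee full acceptance on another.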