Follow-up items to ML extension of analysis
alexander-held commented
Collecting follow-up items to #122 here that are not crucial to be addressed immediately in that PR but can be revisited. cc @ekauffma
- understand the large change in event yields with the new cuts (almost an order of magnitude fewer events), though the new yields are consistent with what CMS had for the 2022 open data workshop in https://cms-opendata-workshop.github.io/workshop2022-lesson-ttbarljetsanalysis/02-coffea-analysis/index.html#plotting
- investigate the possibility of merging the histogram-writing code into a single function that avoids hardcoding information where possible
- harmonize object names (includes also the ML training notebook and documentation probably) from e.g. "top_hadron jet" to "b_{had top}" etc, focusing the names of the b-tagged jet on "b" instead of "top"
- where did the `particle` dependency come from?
- the `model_even`, `model_odd` determination would probably make for a good `utils` function to remove that from the notebook
- turn the first look at all the ML features into a grid of plots to save some space
- make the `func_adl` query depend on whether inference is used (do not serve extra columns if not needed)
- the last `cabinetry` part also needs an `if USE_INFERENCE` wrapping
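
A minimal sketch of the "do not serve extra columns if not needed" idea, not the actual query code: the column names and the helper `columns_to_request` are hypothetical placeholders, and only the `USE_INFERENCE` flag comes from the notebook. The same flag could then guard the last `cabinetry` part.

```python
from typing import List

# Flag named in the notebook; the value here is only for illustration.
USE_INFERENCE = False

# Hypothetical placeholder column names, not the ones used in the real query.
BASE_COLUMNS = ["electron_pt", "muon_pt", "jet_pt", "jet_btag"]
ML_ONLY_COLUMNS = ["jet_eta", "jet_phi", "jet_mass"]


def columns_to_request(use_inference: bool) -> List[str]:
    """Return the columns to serve, adding ML-only inputs only when needed."""
    columns = list(BASE_COLUMNS)  # copy so the module-level list is not mutated
    if use_inference:
        columns += ML_ONLY_COLUMNS
    return columns


# The query would then be built from this list instead of a hardcoded one.
requested = columns_to_request(USE_INFERENCE)
```

Keeping the column list in one place would mean the query and any later `if USE_INFERENCE` blocks stay in sync without repeating names.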
ekauffma commented
The `particle` dependency is a mistake. It is used in the `plotEvents.ipynb` notebook that is used in the docs. I will remove this.
ekauffma commented
I have also realized that the `func_adl` query method was never updated to accommodate the new cuts. It will become a bit more complicated due to this. I am working on it now.