iris-hep/analysis-grand-challenge

Follow-up items to ML extension of analysis

Opened this issue · 2 comments

Collecting follow-up items to #122 here. None of them need to be addressed immediately in that PR, but they can be revisited later. cc @ekauffma

  • understand the large change in event yields with the new cuts (almost an order of magnitude fewer events), though the new yields are consistent with what CMS had for the 2022 open data workshop in https://cms-opendata-workshop.github.io/workshop2022-lesson-ttbarljetsanalysis/02-coffea-analysis/index.html#plotting
  • investigate the possibility of merging the histogram-writing code into a single function that avoids hardcoding information where possible
  • harmonize object names (probably also in the ML training notebook and the documentation), e.g. from "top_hadron jet" to "b_{had top}", so that the names of the b-tagged jets focus on "b" instead of "top"
  • where did the particle dependency come from?
  • the model_even / model_odd determination would probably make a good utils function, which would remove that logic from the notebook
  • turn the first look at all the ML features into a grid of plots to save some space
  • make the func_adl query depend on whether inference is used (do not serve extra columns if they are not needed)
  • the last cabinetry part also needs to be wrapped in an if USE_INFERENCE check
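
For the two USE_INFERENCE items above, a minimal sketch of the idea could look like the following. The column names and the helper columns_to_serve are hypothetical placeholders, not the names used in the notebook, and the actual func_adl query construction is left out on purpose. The point is only that the list of served columns is derived from the flag, and that the same flag guards the ML-specific steps further down.

```python
from typing import List

# Hypothetical column names for illustration only; the real query may use others.
BASE_COLUMNS = ["Electron_pt", "Muon_pt", "Jet_pt", "Jet_btagCSVV2"]
ML_COLUMNS = ["Jet_eta", "Jet_phi", "Jet_mass", "Jet_qgl"]


def columns_to_serve(use_inference: bool) -> List[str]:
    """Return the columns to request, adding ML inputs only when inference is on."""
    columns = list(BASE_COLUMNS)  # copy, so the module-level list is never mutated
    if use_inference:
        columns.extend(ML_COLUMNS)
    return columns


USE_INFERENCE = False
columns = columns_to_serve(USE_INFERENCE)

if USE_INFERENCE:
    # The ML-specific cabinetry step would go here (omitted in this sketch).
    print("running ML-observable fit")
```

Keeping the decision in one small function would mean the query and the later steps cannot disagree about which columns exist.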

The particle dependency is a mistake. It is used in the plotEvents.ipynb notebook that is used in the docs. I will remove this.

I have also realized that the func_adl query method was never updated to accommodate the new cuts. Supporting them will make the query a bit more complicated. I am working on it now.