NanoAOD hadd/skims

NanoAOD has the unfortunate property of not recording branches that were never filled in production. Files from the same production therefore don't necessarily contain the same branches and cannot be hadded in a straightforward manner.

Generate per-file branch-list hashes so that files with identical branch sets can be matched.

python get_info.py -i dummy.json -o hashes/dummy.json -j 80
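
What get_info.py produces is, in essence, a fingerprint of each file's branch set, so that only compatible files get merged together. A minimal sketch of the idea, assuming PyROOT and hashing the sorted branch names of the Events tree (the script's actual scheme may differ):

    import hashlib
    import json
    import ROOT

    def branch_hash(path):
        # Fingerprint a file by the sorted names of its Events branches.
        f = ROOT.TFile.Open(path)
        names = sorted(b.GetName() for b in f.Get("Events").GetListOfBranches())
        f.Close()
        return hashlib.sha1(",".join(names).encode()).hexdigest()

    # Hypothetical input list; the real script reads it from the -i JSON.
    hashes = {p: branch_hash(p) for p in ["in1.root", "in2.root"]}
    with open("hashes/dummy.json", "w") as out:
        json.dump(hashes, out, indent=2)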

Hadding

Uses parsl (local or Slurm) to scale out jobs; a minimal local configuration is sketched after the list below. The actual logic is implemented in:

  • haddnano.py - the standard haddnano script
  • modhaddnano.py - a modified haddnano script that can also drop branches
  • check_hadd.py - counts the number of events before and after merging
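
Conceptually, each merge is wrapped in a parsl app and the chosen executor fans the calls out. A minimal local sketch of that pattern, assuming a thread-pool executor and shelling out to haddnano.py (the repo's real configuration may differ):

    import parsl
    from parsl import python_app
    from parsl.config import Config
    from parsl.executors import ThreadPoolExecutor

    # Hypothetical local setup; 20 threads roughly corresponds to -j 20.
    parsl.load(Config(executors=[ThreadPoolExecutor(max_threads=20)]))

    @python_app
    def merge(output, inputs):
        import subprocess
        # Each job shells out to the standard haddnano script.
        subprocess.run(["python", "haddnano.py", output] + inputs, check=True)

    future = merge("merged.root", ["in1.root", "in2.root"])
    future.result()  # block until the job finishes
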
Prepare jobs
python jobs.py -i jsons_in/dummy.json -d ~/group/storage/dummy_merge/ -o jsons_out/dummy_merge.json -s hashes/dummy.json -j 20

Will compute the input mergemap and generate jsons_out/dummy_merge.json and jsons_out/dummy_merge_log.json, containing the output JSON and the mergemap, respectively.
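
A mergemap is just the grouping of inputs by branch hash, chunked into output files. A hypothetical sketch of that grouping (jobs.py's real chunking policy and output naming may differ):

    import json
    from collections import defaultdict

    with open("hashes/dummy.json") as f:
        hashes = json.load(f)  # assumed layout: {file_path: branch_hash}

    groups = defaultdict(list)
    for path, h in hashes.items():
        groups[h].append(path)

    chunk = 10  # hypothetical number of inputs per merged output
    mergemap = {
        f"merged_{h[:8]}_{i // chunk}.root": files[i:i + chunk]
        for h, files in groups.items()
        for i in range(0, len(files), chunk)
    }
    print(json.dumps(mergemap, indent=2))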

Run merge jobs
python jobs.py -i jsons_in/dummy.json -d ~/group/storage/dummy_merge/ -o jsons_out/dummy_merge.json -s hashes/dummy.json -j 20 --run

Will submit haddnano.py jobs.
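
Each submitted job reduces to a single call of the standard haddnano interface (output file first):

python haddnano.py merged.root in1.root in2.root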

Run skim jobs
python jobs.py -i jsons_in/dummy.json -d ~/group/storage/dummy_skim/ -o jsons_out/dummy_skim.json -s hashes/dummy.json -j 20 --run --skim --branches branches.json

Will submit modhaddnano.py jobs, dropping the branches listed in branches.json.
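
The expected layout of branches.json is defined by modhaddnano.py; assuming, hypothetically, a flat list of branch names to drop, the underlying ROOT mechanism is branch deactivation before cloning (inactive branches are simply not copied):

    import json
    import ROOT

    with open("branches.json") as f:
        drop = json.load(f)  # assumed layout: a flat list of branch names

    fin = ROOT.TFile.Open("in.root")
    tree = fin.Get("Events")
    for name in drop:
        tree.SetBranchStatus(name, 0)  # deactivate before cloning

    fout = ROOT.TFile("out.root", "RECREATE")
    skimmed = tree.CloneTree(-1, "fast")  # copies only the active branches
    skimmed.Write()
    fout.Close()
    fin.Close()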

Check outputs
python jobs.py -i jsons_in/dummy.json -d ~/group/storage/dummy_skim/ -o jsons_out/dummy_skim.json -s hashes/dummy.json -j 20 --run --check

Will submit check_hadd.py jobs and report the number of events in and out.
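
The check is plain bookkeeping: the summed Events entries of the inputs must equal the entries of the merged output. A sketch of the idea (check_hadd.py may report more detail):

    import ROOT

    def n_events(path):
        f = ROOT.TFile.Open(path)
        n = f.Get("Events").GetEntries()
        f.Close()
        return n

    total_in = sum(n_events(p) for p in ["in1.root", "in2.root"])
    total_out = n_events("merged.root")
    print(f"in: {total_in}  out: {total_out}  ok: {total_in == total_out}")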

Scale out

Run with --parsl -j 4. With --parsl, -j denotes the number of nodes rather than local workers (all cores on each node are used), so use a lower number than for local runs.
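
With --parsl, -j maps onto the number of Slurm blocks. A hypothetical configuration matching --parsl -j 4 (partition and walltime are placeholders; the repo's real settings may differ):

    import parsl
    from parsl.config import Config
    from parsl.executors import HighThroughputExecutor
    from parsl.providers import SlurmProvider

    parsl.load(Config(executors=[
        HighThroughputExecutor(
            label="slurm",
            provider=SlurmProvider(
                partition="standard",  # placeholder partition name
                nodes_per_block=1,
                init_blocks=4,         # -j 4: four single-node blocks
                max_blocks=4,
                walltime="02:00:00",
            ),
            # max_workers left at default: use all cores on each node
        ),
    ]))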

Todo

  • Clean up haddnano.py with argparse/main/etc. (see the sketch below)
  • Add compression as an argument
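
A possible shape for both items (entirely illustrative; 209 is LZMA level 9 in ROOT's algorithm*100+level encoding):

    import argparse
    import ROOT

    def main():
        parser = argparse.ArgumentParser(description="hadd NanoAOD files")
        parser.add_argument("output", help="merged output file")
        parser.add_argument("inputs", nargs="+", help="input NanoAOD files")
        parser.add_argument("--compression", type=int, default=209,
                            help="ROOT setting, algorithm*100+level")
        args = parser.parse_args()

        # TFile takes the compression setting as its fourth argument.
        fout = ROOT.TFile(args.output, "RECREATE", "", args.compression)
        # ... existing haddnano merge logic would go here ...
        fout.Close()

    if __name__ == "__main__":
        main()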