NanoProd

How to install

git clone --recursive https://github.com/cms-tau-pog/NanoProd.git
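
All subsequent commands in this guide are run from the root of the cloned repository:

cd NanoProd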

Loading environment

The following command activates the framework environment:

source env.sh

How to run miniAOD->nanoAOD skim production

Production should be run on a server that has the crab stageout area mounted to its file system.

  1. Load the environment on a CentOS8 machine:

    source $PWD/env.sh
    voms-proxy-init -voms cms -rfc -valid 192:00
  2. Check that all datasets are present and valid (adjust the path to the yaml files accordingly):

    cat NanoProd/crab/ERA/*.yaml | grep -v -E '^( +| *#)' | grep -E ' /' | sed -E 's/.*: (.*)/\1/' | xargs python RunKit/checkDatasetExistance.py

    If everything is OK, there should be no output.
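
    For reference, the dataset entries from which this pipeline extracts the paths look like the following (taken from the DY example later in this guide):

    DYJetsToLL_M-50-madgraphMLM_ext1: /DYJetsToLL_M-50_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1_ext1-v1/MINIAODSIM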

  3. Modify the output and other site-specific settings in NanoProd/crab/overseer_cfg.yaml; in particular (see the illustrative sketch after this list):

    • site
    • crabOutput
    • localCrabOutput
    • finalOutput
    • renewKerberosTicket
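
    For illustration only, these settings might look like the sketch below (the site name and paths are placeholders, not recommended values):

    site: T2_CH_CERN                                            # storage site for the crab stageout (placeholder)
    crabOutput: /store/group/mygroup/crab_output                # stageout path on the site (placeholder)
    localCrabOutput: /eos/cms/store/group/mygroup/crab_output   # the same area as mounted on the local file system (placeholder)
    finalOutput: /eos/cms/store/group/mygroup/nanoAOD           # final destination of the post-processed files (placeholder)
    renewKerberosTicket: True                                   # keep the kerberos ticket alive during long runs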
  4. Test that the code works locally (use one of the miniAOD files as an input), e.g.:

    mkdir -p tmp && cd tmp
    cmsEnv python3 $ANALYSIS_PATH/RunKit/nanoProdWrapper.py customise=NanoProd/NanoProd/customize.customize skimCfg=$ANALYSIS_PATH/NanoProd/data/skim.yaml maxEvents=100 sampleType=mc storeFailed=True era=Run2_2018 inputFiles=file:/eos/cms/store/group/phys_tau/kandroso/miniAOD_UL18/TTToSemiLeptonic.root writePSet=True skimSetup=skim skimSetupFailed=skim_failed createTar=False
    cmsEnv $ANALYSIS_PATH/RunKit/crabJob.sh

    Check that the output file nano_0.root is created correctly. After that, you can remove the tmp directory:

    cd $ANALYSIS_PATH
    rm -r tmp
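
    A quick way to inspect nano_0.root (run inside tmp, before the cleanup above) is to count its events; this is a minimal sketch that assumes PyROOT is available inside cmsEnv:

    cmsEnv python3 -c "import ROOT; f = ROOT.TFile.Open('nano_0.root'); print('Events:', f.Get('Events').GetEntries())"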
  5. Test a dry-run crab submission:

    python RunKit/crabOverseer.py --work-area crab_test --cfg NanoProd/crab/overseer_cfg.yaml --no-loop NanoProd/crab/crab_test.yaml
    • If successful, the last line output to the terminal should be
      Tasks: 1 Total, 1 Submitted
      
    • N.B. Crab estimates of the processing time will not be accurate; ignore them.
    • After the test, remove the crab_test directory:
      rm -r crab_test
  6. Test that the post-processing task is known to law:

    law index
    law run ProdTask --help
  7. Submit tasks using RunKit/crabOverseer.py and monitor the progress. It is recommended to run crabOverseer inside a screen session (see the sketch at the end of this step).

    python RunKit/crabOverseer.py --cfg NanoProd/crab/overseer_cfg.yaml NanoProd/crab/Run2_2018/FILE1.yaml NanoProd/crab/Run2_2018/FILE2.yaml ...
    • Use NanoProd/crab/Run2_2018/*.yaml to submit all the tasks
    • For more information about the available command-line arguments, run python3 RunKit/crabOverseer.py --help
    • For consecutive runs, if there are no modifications in the configs, it is enough to run crabOverseer without any arguments:
      python RunKit/crabOverseer.py
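
    For example, a typical submission inside a screen session might look as follows (the session name is arbitrary):

    screen -S nanoProd
    source env.sh
    voms-proxy-init -voms cms -rfc -valid 192:00
    python RunKit/crabOverseer.py --cfg NanoProd/crab/overseer_cfg.yaml NanoProd/crab/Run2_2018/*.yaml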

Resubmission of failed tasks

The job handler will automatically create recovery tasks for jobs that failed multiple times. From recovery #1 onwards, each recovery job runs on a single miniAOD file, while for the last available iteration (by default, recovery #2) jobs run only on the sites whitelisted in the crab overseer config: NanoProd/crab/overseer_cfg.yaml. Note: if the file has no available Rucio replica on any of those sites, the job is bound to fail.

Handling failed jobs after last recovery task

General guidelines:

  1. Check the job output via Grafana: monit-grafana.cern.ch/d/cmsTMDetail/cms-task-monitoring-task-view. During the resubmission step, a direct link to the ongoing CRAB task is also printed to the screen.

  2. Identify the exit code (see JobExitCodes):

    1. Exit codes >50000 are most likely associated with an I/O issue with the site or the dataset. Increase the maximum number of retries and resubmit the job.

    2. The tool defines a special exit code, 666, which is used to label errors that are not related to CMSSW compilation, bash, or crab job handling. It covers cases where the dataset is corrupted, inaccessible on any tier, or has no existing replica. Check the specific job output from the CRAB Monitor tool, copying the jobID from Grafana. To check whether the file has any available replica on the sites, run

    dasgoclient --query 'site file=FILENAME'

    If there is no available Rucio replica, the file cannot be processed: please write an issue on CMS-talk (e.g. Issue for Tau UL2018 file) and remove the file from the studied inputs, as described below.
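
    For instance, for the corrupted DY file used as an example later in this guide, the check reads (if no site is printed, no replica is available):

    dasgoclient --query 'site file=/store/mc/RunIISummer20UL18MiniAODv2/DYJetsToLL_M-50_TuneCP5_13TeV-madgraphMLM-pythia8/MINIAODSIM/106X_upgrade2018_realistic_v16_L1v1_ext1-v1/40000/1D821371-03FD-B148-9E83-119185898E4F.root'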

  3. After identifying the problem and taking action to solve it (e.g. fixing CMSSW, requesting a Rucio transfer, or adding a specific storage site to the whitelist), execute the following steps:

    1. Edit the yaml file corresponding to the dataset (e.g. NanoProd/crab/Run2_2018/DY.yaml):
      1. Increase the maximum number of retries by adding the entry maxRecoveryCount to config in the yaml file:
        config:
          maxRecoveryCount: 3
          params:
            sampleType: mc
            era: Run2_2018
            storeFailed: True
      2. If the job fails due to a file that is corrupted or unavailable, the file needs to be skipped in the nanoAOD production. This can be done by editing the yaml file as follows:
        DYJetsToLL_M-50-madgraphMLM_ext1: /DYJetsToLL_M-50_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1_ext1-v1/MINIAODSIM
        ->
        DYJetsToLL_M-50-madgraphMLM_ext1:
          inputDataset: /DYJetsToLL_M-50_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1_ext1-v1/MINIAODSIM
          ignoreFiles:
            - /store/mc/RunIISummer20UL18MiniAODv2/DYJetsToLL_M-50_TuneCP5_13TeV-madgraphMLM-pythia8/MINIAODSIM/106X_upgrade2018_realistic_v16_L1v1_ext1-v1/40000/1D821371-03FD-B148-9E83-119185898E4F.root
    2. Change the status.json file so that the job is marked as WaitingForRecovery instead of Failed:
      python RunKit/crabOverseer.py --action 'run_cmd task.taskStatus.status = Status.WaitingForRecovery' --select 'task.name == TASK_NAME'
      where TASK_NAME is the dataset nickname provided in the yaml file, e.g. DYJetsToLL_M-50-madgraphMLM_ext1
    3. Run crabOverseer.py as in step 7, adding the --update-cfg option.
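
      For the DY example above, the full sequence might look as follows (a sketch; note that the task name may need to be quoted inside the --select expression, and the yaml files should be the same as in the original submission):

      python RunKit/crabOverseer.py --action 'run_cmd task.taskStatus.status = Status.WaitingForRecovery' --select 'task.name == "DYJetsToLL_M-50-madgraphMLM_ext1"'
      python RunKit/crabOverseer.py --update-cfg --cfg NanoProd/crab/overseer_cfg.yaml NanoProd/crab/Run2_2018/DY.yaml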

Running with ParticleNET

In order to produce NanoAOD along with the output branches from ParticleNET (which will be found in the "Jet" and "FatJet" collections for the AK4 and AK8 outputs, respectively), the following additions are needed when setting up the environment:

  • Follow the installation recipe in the Readme here
  • Then copy the PNET models into the places where they are needed, according to the first section of the ReadMe here and here.
  • However, this will only work locally. To make it work for jobs submitted to crab, put the same files in RecoBTag/Combined/data/ and RecoTauTag/data/ inside the CMSSW_BASE/src/ directory.
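
For illustration, the copy step might look like the sketch below (the source paths are placeholders; the actual model files are the ones obtained via the ReadMes referenced above):

mkdir -p "$CMSSW_BASE/src/RecoBTag/Combined/data" "$CMSSW_BASE/src/RecoTauTag/data"
# placeholder source locations; substitute the PNET model files installed in the previous steps
cp path/to/pnet_model_files_for_btag/* "$CMSSW_BASE/src/RecoBTag/Combined/data/"
cp path/to/pnet_model_files_for_tau/* "$CMSSW_BASE/src/RecoTauTag/data/"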