```sh
git clone --recursive https://github.com/cms-tau-pog/NanoProd.git
```

The following command activates the framework environment:

```sh
source env.sh
```
Production should be run on a server that has the crab stageout area mounted to the file system.
1. Load the environment on a CentOS8 machine:

   ```sh
   source $PWD/env.sh
   voms-proxy-init -voms cms -rfc -valid 192:00
   ```
2. Check that all datasets are present and valid (replace the path to the `yaml`s accordingly):

   ```sh
   cat NanoProd/crab/ERA/*.yaml | grep -v -E '^( +| *#)' | grep -E ' /' | sed -E 's/.*: (.*)/\1/' | xargs python RunKit/checkDatasetExistance.py
   ```

   If everything is ok, there should be no output.
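   Since the pipeline above hands the extracted dataset names to the checker via `xargs`, a single dataset can also be checked directly; a minimal sketch, using the DY dataset that appears later in this guide:

   ```sh
   python RunKit/checkDatasetExistance.py /DYJetsToLL_M-50_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1_ext1-v1/MINIAODSIM
   ```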
3. Modify the output and other site-specific settings in `NanoProd/crab/overseer_cfg.yaml`. In particular:
   - site
   - crabOutput
   - localCrabOutput
   - finalOutput
   - renewKerberosTicket
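   A convenience one-liner to locate the relevant lines in the config (a sketch, not part of the official workflow):

   ```sh
   grep -nE 'site|crabOutput|localCrabOutput|finalOutput|renewKerberosTicket' NanoProd/crab/overseer_cfg.yaml
   ```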
4. Test that the code works locally (take one of the miniAOD files as an input), e.g.:

   ```sh
   mkdir -p tmp && cd tmp
   cmsEnv python3 $ANALYSIS_PATH/RunKit/nanoProdWrapper.py customise=NanoProd/NanoProd/customize.customize skimCfg=$ANALYSIS_PATH/NanoProd/data/skim.yaml maxEvents=100 sampleType=mc storeFailed=True era=Run2_2018 inputFiles=file:/eos/cms/store/group/phys_tau/kandroso/miniAOD_UL18/TTToSemiLeptonic.root writePSet=True skimSetup=skim skimSetupFailed=skim_failed createTar=False
   cmsEnv $ANALYSIS_PATH/RunKit/crabJob.sh
   ```

   Check that the output file `nano_0.root` is created correctly.
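   A quick sanity check on the output (a sketch, assuming PyROOT is available inside `cmsEnv`, as it is in a CMSSW environment):

   ```sh
   cmsEnv python3 -c 'import ROOT; f = ROOT.TFile.Open("nano_0.root"); print(f.Get("Events").GetEntries())'
   ```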
   After that, you can remove the `tmp` directory:

   ```sh
   cd $ANALYSIS_PATH
   rm -r tmp
   ```
5. Test a dry-run crab submission:

   ```sh
   python RunKit/crabOverseer.py --work-area crab_test --cfg NanoProd/crab/overseer_cfg.yaml --no-loop NanoProd/crab/crab_test.yaml
   ```

   - If successful, the last line of output to the terminal should be

     ```
     Tasks: 1 Total, 1 Submitted
     ```

   - NB: crab estimates of the processing time will not be accurate; ignore them.
   - After the test, remove the `crab_test` directory:

     ```sh
     rm -r crab_test
     ```
6. Test that the post-processing task is known to law:

   ```sh
   law index
   law run ProdTask --help
   ```
7. Submit tasks using `RunKit/crabOverseer.py` and monitor the process. It is recommended to run `crabOverseer` in `screen` (see the sketch after the list below):

   ```sh
   python RunKit/crabOverseer.py --cfg NanoProd/crab/overseer_cfg.yaml NanoProd/crab/Run2_2018/FILE1.yaml NanoProd/crab/Run2_2018/FILE2.yaml ...
   ```
   - Use `NanoProd/crab/Run2_2018/*.yaml` to submit all the tasks.
   - For more information about the available command line arguments, run `python3 RunKit/crabOverseer.py --help`.
   - For consecutive runs, if there are no modifications in the configs, it is enough to run `crabOverseer` without any arguments:

     ```sh
     python RunKit/crabOverseer.py
     ```
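   A minimal `screen` workflow for a long-running submission (a sketch; the session name is arbitrary):

   ```sh
   screen -S nanoProd        # start a named session
   python RunKit/crabOverseer.py --cfg NanoProd/crab/overseer_cfg.yaml NanoProd/crab/Run2_2018/*.yaml
   # detach with Ctrl-a d; reattach later with:
   screen -r nanoProd
   ```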
The job handler will automatically create recovery tasks for jobs that have failed multiple times. As of recovery #1, each created job will run on a single miniAOD file, while for the last available iteration (default: recovery #2) the jobs will only run on the sites that are whitelisted in the crab overseer config, `NanoProd/crab/overseer_cfg.yaml`. Note: if the file has no available Rucio replica on any of those sites, the job is bound to fail.
General guidelines:
- Check the job output via Grafana: monit-grafana.cern.ch/d/cmsTMDetail/cms-task-monitoring-task-view. During the resubmission step, a direct link to the ongoing CRAB task is also printed to the screen.
- Identify the exit code, see JobExitCodes.
- Exit codes >50000 are most likely associated with an I/O issue with the site or the dataset. Increase the number of max retries (see the `maxRecoveryCount` example below) and resend the job.
- A special exit code, 666, is defined in the tool: it is used to label errors that are not related to CMSSW compilation, bash, or crab job handling. Check the specific job output from the CRAB Monitor tool, copying the jobID from Grafana. This includes cases where the dataset is corrupted, inaccessible on any tier, or has no existing replica. To check whether the file has any available replica on the sites, run:

  ```sh
  dasgoclient --query 'site file=FILENAME'
  ```

  If there is no available Rucio replica, the file cannot be processed: please write an issue on CMS-talk (e.g. the issue for a Tau UL2018 file) and remove the file from the studied inputs, see below.
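  For example, using the problematic DY file from the example further below (the output lists the sites hosting a replica; empty output means none is available):

  ```sh
  dasgoclient --query 'site file=/store/mc/RunIISummer20UL18MiniAODv2/DYJetsToLL_M-50_TuneCP5_13TeV-madgraphMLM-pythia8/MINIAODSIM/106X_upgrade2018_realistic_v16_L1v1_ext1-v1/40000/1D821371-03FD-B148-9E83-119185898E4F.root'
  ```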
- After identifying the problem and taking action to solve it (either within CMSSW, by requesting a Rucio transfer, or by adding a specific storage site to the whitelist), execute the following steps:
- Edit the `yaml` file corresponding to the dataset (e.g. `NanoProd/crab/Run2_2018/DY.yaml`):
  - Increase the maximum number of retries by adding the entry `maxRecoveryCount` to `config` in the `yaml` file:

    ```yaml
    config:
      maxRecoveryCount: 3
      params:
        sampleType: mc
        era: Run2_2018
        storeFailed: True
    ```
  - If the job fails due to a file which is corrupted or unavailable, it needs to be skipped in the nanoAOD production; this can be done by editing the `yaml` file as follows:

    ```yaml
    # before
    DYJetsToLL_M-50-madgraphMLM_ext1: /DYJetsToLL_M-50_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1_ext1-v1/MINIAODSIM

    # after
    DYJetsToLL_M-50-madgraphMLM_ext1:
      inputDataset: /DYJetsToLL_M-50_TuneCP5_13TeV-madgraphMLM-pythia8/RunIISummer20UL18MiniAODv2-106X_upgrade2018_realistic_v16_L1v1_ext1-v1/MINIAODSIM
      ignoreFiles:
        - /store/mc/RunIISummer20UL18MiniAODv2/DYJetsToLL_M-50_TuneCP5_13TeV-madgraphMLM-pythia8/MINIAODSIM/106X_upgrade2018_realistic_v16_L1v1_ext1-v1/40000/1D821371-03FD-B148-9E83-119185898E4F.root
    ```
- Change the status.json file so the job is marked as `WaitingForRecovery` instead of `Failed`:

  ```sh
  python RunKit/crabOverseer.py --action 'run_cmd task.taskStatus.status = Status.WaitingForRecovery' --select 'task.name == TASK_NAME'
  ```

  where `TASK_NAME` is the dataset nickname provided in the `yaml` file, e.g. `DYJetsToLL_M-50-madgraphMLM_ext1`.
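  For example, for the DY task edited above (a sketch; if the selector is evaluated as a Python expression, the task name needs the extra inner quotes shown here):

  ```sh
  python RunKit/crabOverseer.py --action 'run_cmd task.taskStatus.status = Status.WaitingForRecovery' --select 'task.name == "DYJetsToLL_M-50-madgraphMLM_ext1"'
  ```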
- Run `crabOverseer.py` as in step 7, adding the `--update-cfg` option.
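  A sketch of the resulting invocation, reusing the step 7 command with the edited DY config:

  ```sh
  python RunKit/crabOverseer.py --update-cfg --cfg NanoProd/crab/overseer_cfg.yaml NanoProd/crab/Run2_2018/DY.yaml
  ```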
In order to produce NanoAOD along with output branches from ParticleNet (which will be found in the `Jet` and `FatJet` collections for AK4 and AK8 outputs, respectively), the following additions are made automatically when setting up the environment:
- Follow the installation recipe in the README here.
- Then copy the PNet models into the places where they are needed, according to the first section of the README here and here.
- With this, however, it will only work locally. To make it work for jobs submitted to crab, put the same files in `RecoBTag/Combined/data/` and `RecoTauTag/data/` inside the `$CMSSW_BASE/src/` directory.
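A sketch of that copy step, assuming the models were downloaded following the recipe above (`MODELS_DIR` is a hypothetical placeholder for wherever the model files ended up, not a path defined by the framework):

```sh
# MODELS_DIR is an illustrative placeholder for the downloaded model location
mkdir -p $CMSSW_BASE/src/RecoBTag/Combined/data $CMSSW_BASE/src/RecoTauTag/data
cp -r MODELS_DIR/RecoBTag/Combined/data/* $CMSSW_BASE/src/RecoBTag/Combined/data/
cp -r MODELS_DIR/RecoTauTag/data/* $CMSSW_BASE/src/RecoTauTag/data/
```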