
CMS data analysis helpers

This is my notebook for CMS-related projects and various small code snippets.


  • crab: scripts to work with CRAB, the CMS grid submission tool
    • mcgen_userscript: run CMS MC generation from step1 (GEN-SIM) to MINIAOD (step4) in one job.
  • opendata: scripts to work with CERN open data

Misc stuff

Finding the PU configuration of a MC sample

The true distribution of pileup vertices for a CMS MC sample can be extracted from the configuration fragment.

  • Enter the sample DAS name (AODSIM) into PdmV to find the production campaigns: link
  • Get the setup command for a particular campaign: link
  • Find the premixed MC sample: link
  • Get the premix setup conf: link
  • From there, find the pileup configuration: --pileup 2018_25ns_JuneProjectionFull18_PoissonOOTPU
  • The pileup configurations are in the folder $CMSSW_RELEASE_BASE/src/SimGeneral/MixingModule/python
  • This script can be used to extract the MC pileup histogram.
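
As a rough illustration of the last steps, the sketch below pulls the `--pileup` value out of a cmsDriver.py command and normalises a list of pileup probabilities into a histogram. It is not the script referenced above. The function names, the shortened command string, and the toy probability values are all hypothetical, and the `mix_<scenario>_cfi` module name is an assumed naming convention that should be checked against the files in the MixingModule folder.

```python
import shlex


def pileup_scenario(cmsdriver_command):
    """Return the value passed to --pileup in a cmsDriver.py command, or None."""
    tokens = shlex.split(cmsdriver_command)
    for i, tok in enumerate(tokens):
        if tok == "--pileup" and i + 1 < len(tokens):
            return tokens[i + 1]
        if tok.startswith("--pileup="):
            return tok.split("=", 1)[1]
    return None


def normalised_pileup(n_true, prob):
    """Map each pileup multiplicity to its probability, normalised to unit sum."""
    if len(n_true) != len(prob):
        raise ValueError("bins and probabilities must have the same length")
    total = float(sum(prob))
    return {n: p / total for n, p in zip(n_true, prob)}


# Hypothetical, shortened setup command (a real one has many more options).
cmd = "cmsDriver.py step1 --pileup 2018_25ns_JuneProjectionFull18_PoissonOOTPU --step DIGI"
scenario = pileup_scenario(cmd)

# Assumed naming convention for the configuration fragment; verify in the folder.
cfi_module = "SimGeneral.MixingModule.mix_%s_cfi" % scenario

# Toy values only; the real ones come from the configuration fragment.
hist = normalised_pileup([0, 1, 2, 3], [1.0, 3.0, 4.0, 2.0])
print(scenario, cfi_module, hist)
```

Inside a CMSSW environment the two input lists would be read from the mixing configuration fragment instead of being typed in by hand.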

PFAlgo debugging

Add this to the end of step3.py to enable logging and debug output.

process.MessageLogger.categories += ["PFAlgo", "PFCandConnector", "PFBlockAlgo"]
process.MessageLogger.debugModules = cms.untracked.vstring("particleFlowTmp")
process.MessageLogger.debugs = cms.untracked.PSet(
    # Suppress generic INFO and DEBUG messages
    INFO = cms.untracked.PSet(limit = cms.untracked.int32(0)),
    DEBUG = cms.untracked.PSet(limit = cms.untracked.int32(0)),
    # No limit on messages from the PF categories
    PFAlgo = cms.untracked.PSet(limit = cms.untracked.int32(-1)),
    PFCandConnector = cms.untracked.PSet(limit = cms.untracked.int32(-1)),
    PFBlockAlgo = cms.untracked.PSet(limit = cms.untracked.int32(-1)),
    threshold = cms.untracked.string('DEBUG')
)

# To keep low-level inputs
clusters = [
    'keep recoPFClusters_particleFlowClusterECAL_*_*',
    'keep recoPFClusters_particleFlowClusterHCAL_*_*',
    'keep recoPFClusters_particleFlowClusterHO_*_*',
    'keep recoPFClusters_particleFlowClusterHF_*_*',
    'keep recoPFClusters_particleFlowClusterPS_*_*',
    'keep recoTracks_generalTracks_*_*',
    'keep recoTrackExtras_generalTracks_*_*',
    'keep TrackingRecHitsOwned_generalTracks_*_*',
    'keep recoGenParticles_prunedGenParticles_*_*'
]

# AODSIM output with the extra collections appended to the standard event content
process.AODSIMoutput = cms.OutputModule("PoolOutputModule",
    dataset = cms.untracked.PSet(
        dataTier = cms.untracked.string('AODSIM'),
        filterName = cms.untracked.string('')
    ),
    fileName = cms.untracked.string('file:step3_AOD.root'),
    outputCommands = process.AODSIMEventContent.outputCommands + clusters,
    splitLevel = cms.untracked.int32(0)
)
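
Each keep statement above follows the EDM branch naming pattern type_moduleLabel_instance_process, with `*` as a wildcard. A minimal sketch of generating the same list programmatically, in plain Python with no CMSSW imports; the helper name `keep_statement` is hypothetical:

```python
def keep_statement(cpp_type, module_label, instance="*", process="*"):
    """Build an output command of the form 'keep type_label_instance_process'."""
    return "keep %s_%s_%s_%s" % (cpp_type, module_label, instance, process)


# One PFCluster collection per subdetector, as in the list above.
subdetectors = ["ECAL", "HCAL", "HO", "HF", "PS"]
clusters = [
    keep_statement("recoPFClusters", "particleFlowCluster" + det)
    for det in subdetectors
]

# Tracks, their extras and hits, and the pruned generator particles.
clusters += [
    keep_statement("recoTracks", "generalTracks"),
    keep_statement("recoTrackExtras", "generalTracks"),
    keep_statement("TrackingRecHitsOwned", "generalTracks"),
    keep_statement("recoGenParticles", "prunedGenParticles"),
]

for line in clusters:
    print(line)
```

The resulting `clusters` list can be appended to `outputCommands` in the same way as the hand-written one.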

Compiling and running a test workflow

runTheMatrix.py -l 38

Selecting a subset of events in a file

cmsRun $CMSSW_RELEASE_BASE/src/PhysicsTools/Utilities/configuration/copyPickMerge_cfg.py inputFiles=root://cms-xrd-global.cern.ch///store/relval/CMSSW_11_0_0_pre12/RelValTTbar_14TeV/GEN-SIM-DIGI-RAW/PU_110X_mcRun3_2021_realistic_v5-v1/20000/7CCD50E3-D786-4044-9CEF-793F6EC79183.root maxEvents=10

Dump a Global Tag

conddb copy --destdb 110X_mcRun3_2021_realistic_v5.db 110X_mcRun3_2021_realistic_v5

In CMSSW configuration

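# Needed only if the configuration does not already import GlobalTag
from Configuration.AlCa.GlobalTag import GlobalTag
process.GlobalTag = GlobalTag(process.GlobalTag, "110X_mcRun3_2021_realistic_v5", "")
# Read the conditions from the local SQLite dump instead of the central database
process.GlobalTag.connect = "sqlite_file:110X_mcRun3_2021_realistic_v5.db"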
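
The file name passed to `--destdb` and the `sqlite_file:` connect string have to agree. A small sketch that derives both from the Global Tag name, using only the `conddb copy --destdb` form shown above; the helper name `local_global_tag` is hypothetical:

```python
def local_global_tag(gt_name):
    """Return the conddb dump command and the matching connect string for a Global Tag."""
    db_file = gt_name + ".db"
    command = ["conddb", "copy", "--destdb", db_file, gt_name]
    connect = "sqlite_file:" + db_file
    return command, connect


command, connect = local_global_tag("110X_mcRun3_2021_realistic_v5")
print(" ".join(command))
print(connect)
```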


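Enabling LogDebug output

LogDebug messages are compiled in only when EDM_ML_DEBUG is defined, so rebuild with debug flags:

scram b -j8 USER_CXXFLAGS+="-DEDM_ML_DEBUG" USER_CXXFLAGS+="-O0" USER_CXXFLAGS+="-g" USER_CXXFLAGS+="-fno-omit-frame-pointer"

In CMSSW configuration

process.MessageLogger.cerr.threshold = "DEBUG"
process.MessageLogger.debugModules = ["*"]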