Primary language: Python. License: Apache License 2.0.

Analysis Examples

This repo contains examples of analyses performed on the Analysis Facility using distributed RDataFrame running on Dask, on top of HTCondor.

How to access the Analysis Facility

Usage of RDataFrame distributed on Dask, on top of HTCondor

  • Open a new Python 3 notebook

  • Deploy a Dask cluster on HTCondor. This can be done via the Dask JupyterLab plugin:

    • click the +new button:

      [screenshot: dask_plugin]

    • choose where to deploy the cluster:

      [screenshot: dask_choice]

    • wait for the scheduler to start running. The interface then shows all the information about the cluster, along with three buttons for setting up a client, scaling, and shutting down:

      [screenshot: dask_deployed]

  • Once the cluster is deployed, initialize the Dask client. Pressing the <> button automatically inserts a cell that does this, which will look like the following:

    from dask.distributed import Client
    
    client = Client("localhost:37470")
    client
    
  • Insert the declaration of your custom functions inside an initialization function:

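    import ROOT
    
    # Read the C++ header that declares the custom functions
    with open("postselection.h", "r") as header_file:
        data = header_file.read()
    
    distributed = ROOT.RDF.Experimental.Distributed
    
    def my_initialization_function():
        # Make the custom functions known to the ROOT interpreter
        ROOT.gInterpreter.Declare(data)
    
    # Register the function so that it runs on the workers before the analysis
    distributed.initialize(my_initialization_function)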
    
  • Create a distributed RDataFrame reading a list of samples:

    chain = [<path to 1st .root file>, <path to 2nd .root file>, ...]
    
    df = ROOT.RDF.Experimental.Distributed.Dask.RDataFrame("<name of tree>",
                  chain,
                  npartitions=<number of partitions>,
                  daskclient=client)
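
The steps above can be collected into a few small helpers. This is only a minimal sketch: the helper names (`connect`, `make_initializer`, `build_dataframe`) are hypothetical and not part of this repo, and it assumes that ROOT with distributed RDataFrame support and Dask are installed on the Analysis Facility. The ROOT and Dask imports are placed inside the functions so that defining the helpers does not require either package.

```python
from pathlib import Path


def connect(scheduler_address):
    """Return a Dask client bound to an already running scheduler."""
    from dask.distributed import Client  # lazy import: needs dask installed
    return Client(scheduler_address)


def make_initializer(cpp_source):
    """Wrap C++ declarations in a function that can be registered for the workers."""
    def declare_custom_functions():
        import ROOT  # lazy import: resolved where the function actually runs
        ROOT.gInterpreter.Declare(cpp_source)
    return declare_custom_functions


def build_dataframe(tree_name, files, client, npartitions, header="postselection.h"):
    """Register the header's functions, then create the distributed RDataFrame."""
    import ROOT
    distributed = ROOT.RDF.Experimental.Distributed
    # Read the header once and hand its content to the initialization function
    distributed.initialize(make_initializer(Path(header).read_text()))
    return distributed.Dask.RDataFrame(
        tree_name, list(files), npartitions=npartitions, daskclient=client
    )
```

In a notebook, the client would come from something like `connect("localhost:37470")`, using the address shown by the Dask plugin, and the result would be passed to `build_dataframe` together with the tree name, the list of files and the number of partitions.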
    

Minimal example

Here you can find a simple notebook that runs a basic distributed RDataFrame analysis on a small OpenData sample, using a Dask deployment on HTCondor.
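
To give an idea of what such an analysis might look like, the sketch below books a selection and a histogram on a distributed RDataFrame. It is not taken from the notebook: the function name is hypothetical, and the column names `nMuon` and `Muon_pt` are assumptions that should be replaced with branches present in the actual sample.

```python
def book_muon_pt_histogram(df):
    """Book a filter and a histogram on an RDataFrame-like object."""
    # `nMuon` and `Muon_pt` are assumed column names, not taken from the notebook.
    two_muons = df.Filter("nMuon == 2")
    # Histogram model: (name, title, number of bins, lower edge, upper edge)
    model = ("muon_pt", "Muon pT;pT [GeV];Entries", 50, 0.0, 100.0)
    return two_muons.Histo1D(model, "Muon_pt")
```

Since RDataFrame operations are lazy, booking the histogram should not start any processing; the work is expected to be sent to the Dask workers only when the result is first accessed, for example by drawing the histogram.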