iris-hep/analysis-grand-challenge

IRIS-HEP / AGC Demo day 24.02.2023

oshadura opened this issue · 6 comments

Preliminary draft:

  • Coffea-Casa with working CephFS
  • Distributed XCache with redirector at Coffea-Casa
  • ServiceX working with CMS IAM token
  • Distributed RDF on Coffea-Casa
  • Scaling ServiceX

Agenda: https://indico.cern.ch/event/1232470/

Further ideas from brainstorming (trying to keep items in priority order):

Coffea-Casa:

  • Generate token with appropriate write scopes from IAM.
  • Coffea-Casa working with integrated CephFS (implies UID integration, possibly also integration with the Condor clients).
  • Read/write from EOSCMS
  • Dependency manager (demo delayed from December)
  • Demonstrate deployment of Coffea-Casa on minikube.

OpenData Facility:

  • CERNBox integration (may not be possible because of the very old XRootD version on the CERNBox side; progress will be limited by the CERN side).

ServiceX:

  • ServiceX client uses the IAM token for authentication; no separate authorization step from inside Coffea-Casa. The config file is auto-generated.
  • Dashboard available from within JupyterHub.
  • More robust uproot transformer (writing out multiple columns, etc).
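
To make the "config file is auto-generated" item concrete, here is a minimal sketch of what a facility-side hook could do at session start. It is an illustration under stated assumptions, not the Coffea-Casa implementation: the function names are hypothetical, and the YAML keys (`api_endpoints`, `name`, `endpoint`, `token`) are modeled on the ServiceX client config layout and should be checked against the client version in use. Whether the IAM token can be placed in the `token` field directly is also an assumption.

```python
from pathlib import Path


def render_servicex_config(endpoint: str, token: str, name: str = "coffea-casa") -> str:
    """Return the text of a minimal ServiceX client config.

    The key names are an assumption modeled on the ServiceX client's
    YAML layout; verify them against the client version deployed.
    """
    return (
        "api_endpoints:\n"
        f"  - name: {name}\n"
        f"    endpoint: {endpoint}\n"
        f"    token: {token}\n"
    )


def write_servicex_config(path: Path, endpoint: str, token: str) -> Path:
    """Write the config to `path` and restrict it to the owning user."""
    path.write_text(render_servicex_config(endpoint, token))
    path.chmod(0o600)  # the file contains a credential
    return path
```

A session hook could then call `write_servicex_config(Path.home() / "servicex.yaml", endpoint, token)` with the endpoint URL and the token the hub already holds for the user, so that no manual authorization step is needed inside the notebook.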

We should have ServiceX with multiple code generator backends

Multi-user simultaneous scale tests at various facilities

Data management with Iceberg: "TTree friend" functionality

> More robust uproot transformer (writing out multiple columns, etc).

I'm a bit confused here; could you explain what this means, @bbockelm? Writing multiple columns is already standard for the Uproot transformer.

Closing this as the demo day is done. Recordings are attached to the agenda and can be found at https://youtu.be/zb1va9YDqY4. The next demo day will be integrated into the AGC workshop https://indico.cern.ch/e/agc-workshop-2023.

Not sure about the "multiple columns" point; perhaps it refers to the handling of multiple trees in the input dataset instead?