The following setup is based on Jaime Frey and PIC[1]'s IT team prototype [2] which aimed to run HTCondor jobs inside worker nodes at Barcelona Supercomputing Center [3]
- This branch contains the
local_glidein
script which needs to run from an Edge node on Theta with external connectivity. - It will use the collector as CCB.
- There are two halves to this, a first part which takes care of the COBALT side of things and gets all directories ready and in place. The second part will run a Singularity container tailored specifically for the setup. Said container will run the HTCondor part of the Split/starter at the Edge node (given that installing HTCondor is not an option)
- What will be the ratio of nodes/slots? 1 node = 1 partitionable slot (as of now). This is configurable
- We need to figure out naming conventions for the SSHFS mount directories (one per slot? one per node?)
- Is this all going to run under my account? -> Totally fine by me, condor binaries do live in shared storage
- Where are we keeping scratch areas? Shared (project) storage?
[1] https://www.pic.es/areas/#lhc
[2] https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=RunCmsJobsAtBsc
[4] https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=BuildingHtcondorOnLinux