flux-framework/flux-coral2

Rabbit DirectiveBreakdown AllocateSingleServer not working

matthew-richerson opened this issue · 0 comments

Trying to create a standalone MGS through Flux doesn't work: the "Servers" resource is filled in with an empty "storage" field.

A standalone MGS can be created by setting "standaloneMgtPoolName" in the NnfStorageProfile to the pool name. When a "#dw create_persistent type=lustre ..." workflow is created, a DirectiveBreakdown is made with a single allocation set that asks for space for the MGT.

NnfStorageProfile:

    ...
    data:
      default: false
      lustreStorage:
        capacityMgt: 5GiB
        combinedMgtMdt: false
        exclusiveMdt: false
        standaloneMgtPoolName: test-pool
    ...

DirectiveBreakdown:

    ...
    status:
      ready: true
      storage:
        allocationSets:
        - allocationStrategy: AllocateSingleServer
          constraints:
            colocation:
            - key: lustre-mgt
              type: exclusive
            count: 1
            labels:
            - dataworkflowservices.github.io/storage=Rabbit
          label: mgt
          minimumCapacity: 5368709120

Flux fills in the "Servers" resource as follows:

    "spec": {
        "allocationSets": [
            {
                "allocationSize": 5368709120,
                "label": "mgt",
                "storage": []
            }
        ]
    },
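A quick way to spot this condition is to scan a "Servers" spec for allocation sets with no storage entries. The following is only a minimal sketch: `empty_allocation_sets` is a hypothetical helper, not part of flux-coral2 or DWS, and it assumes the spec has already been loaded as a plain Python dict shaped like the output above.

```python
def empty_allocation_sets(servers_spec):
    """Return the labels of allocation sets whose "storage" list is missing or empty."""
    return [
        alloc.get("label", "<unlabeled>")
        for alloc in servers_spec.get("allocationSets", [])
        if not alloc.get("storage")
    ]


# The spec from this report, as a Python dict.
spec = {
    "allocationSets": [
        {"allocationSize": 5368709120, "label": "mgt", "storage": []}
    ]
}

print(empty_allocation_sets(spec))
```

For the spec shown in this issue, the helper reports the "mgt" allocation set.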

Looking at the DWS code, the AllocateSingleServer case does not appear to be handled here: https://github.com/flux-framework/flux-coral2/blob/master/src/python/flux_k8s/directivebreakdown.py#L34, which results in the empty "storage" field.
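As a rough illustration of what handling this case might involve, here is a sketch that is not taken from the flux-coral2 code. The function name, the `rabbit_names` argument, and the rabbit name in the example are all hypothetical. It also assumes that each "storage" entry is a dict with `name` and `allocationCount` keys, and that picking the first available rabbit is acceptable.

```python
def build_servers_allocation_set(breakdown_alloc_set, rabbit_names):
    """Build one "Servers" allocation set from a DirectiveBreakdown allocation set.

    Only AllocateSingleServer is handled in this sketch. Any other strategy
    raises instead of silently producing an empty "storage" list.
    """
    strategy = breakdown_alloc_set["allocationStrategy"]
    if strategy != "AllocateSingleServer":
        raise ValueError(f"unhandled allocationStrategy: {strategy!r}")
    if not rabbit_names:
        raise ValueError("no rabbit available for a single-server allocation")
    return {
        "allocationSize": breakdown_alloc_set["minimumCapacity"],
        "label": breakdown_alloc_set["label"],
        # Assumption: one allocation on one rabbit satisfies this strategy.
        "storage": [{"name": rabbit_names[0], "allocationCount": 1}],
    }


# The "mgt" allocation set from the DirectiveBreakdown in this report.
mgt = {
    "allocationStrategy": "AllocateSingleServer",
    "label": "mgt",
    "minimumCapacity": 5368709120,
}

# "rabbit-node-1" is a made-up name used only for illustration.
result = build_servers_allocation_set(mgt, ["rabbit-node-1"])
print(result)
```

Raising on an unrecognized strategy is a deliberate choice in this sketch, so that a missing case surfaces as an error rather than as an empty "storage" field.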

I haven't tested it, but this would probably also cause a problem for a Lustre file system that allocated an MGT as part of the job (i.e., ExternalMgs not set).