Azure/az-hop

Dynamic partitions

Opened this issue · 4 comments

In what area(s)?

/area administration
/area ansible
/area autoscaling
/area configuration
/area cyclecloud
/area documentation
/area image
/area job-scheduling
/area monitoring
/area ood
/area remote-visualization
/area user-management

Describe the feature

Do we expose the dynamic partitions that CycleCloud (CC) added in 8.4? I think it would be useful if we could allocate smaller nodes when the job is smaller, e.g. running a 4-CPU job on HB16 rather than HB120.

I'm not sure about the exact scenario. It adds a lot of complexity, and I'm not sure of the value it provides.

I think what Matt is saying here is:

For those VM series where Azure offers several sizes (e.g. NC24ads A100 v4, NC48ads A100 v4, NC96ads A100 v4), bundle those sizes in one partition and then, based on the number of CPUs/GPUs requested, have Slurm request the smallest one that fulfils the requirements of the job.

It does not really apply to the HB series, since the smaller sizes there just have a restricted number of CPUs at the same price, but it would also apply to e.g. the F series.
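To make the selection rule concrete, here is a minimal Python sketch of "pick the smallest size that fulfils the job". It is not how CycleCloud or Slurm implement dynamic partitions; `VmSize`, `NC_A100_V4` and `smallest_fit` are hypothetical names, and the CPU/GPU counts are illustrative figures that should be checked against the Azure VM size documentation.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class VmSize:
    """Hypothetical description of one VM size inside a partition."""
    name: str
    cpus: int
    gpus: int


# Illustrative sizes bundled into a single partition (assumed figures).
NC_A100_V4 = [
    VmSize("Standard_NC24ads_A100_v4", cpus=24, gpus=1),
    VmSize("Standard_NC48ads_A100_v4", cpus=48, gpus=2),
    VmSize("Standard_NC96ads_A100_v4", cpus=96, gpus=4),
]


def smallest_fit(sizes: Sequence[VmSize], cpus: int, gpus: int = 0) -> Optional[VmSize]:
    """Return the smallest size that satisfies the request, or None if nothing fits."""
    candidates = [s for s in sizes if s.cpus >= cpus and s.gpus >= gpus]
    # "Smallest" is ordered by CPU count first, then GPU count.
    return min(candidates, key=lambda s: (s.cpus, s.gpus), default=None)


if __name__ == "__main__":
    for request in [(4, 1), (30, 1), (8, 3), (128, 0)]:
        chosen = smallest_fit(NC_A100_V4, *request)
        print(request, "->", chosen.name if chosen else "no size fits")
```

Ordering by CPU count is a stand-in for ordering by price, which only works for series where the price scales with the size. That is exactly why the rule gains nothing for HB.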

Ah, I forgot that the HB series carries the same price across all sizes. Yes, for the scenarios where you only want part of a node I think this might be useful, although under heavy load I think the cost savings will shrink or disappear. It can still provide better isolation between jobs, though (e.g. one bad job can't fill up /tmp for the others anymore).