Azure/cyclecloud-scalelib

Incorrect VM count reported for HC/NC machines


In the CycleCloud 8.5 release, which should ship with an updated scalelib, I'm getting an incorrect available VM count in the azpbs buckets output (and likewise in azslurm buckets).

F2 series machines do not seem to be affected, but HC44 and NCv4 are, as can be seen here:
I have 440 HC cores available, which corresponds to 10 HC44rs VMs, but azpbs buckets shows only two available:
Screenshot 2024-01-10 132506

I have 480 NCv4 cores available, which corresponds to 5 NC96ads_A100_v4 VMs, but azpbs buckets shows only one available:
Screenshot 2024-01-10 134119
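
For reference, the expected counts above come from dividing the available core quota by the vCPU count of each SKU. The snippet below is only a sketch of that arithmetic, not scalelib's actual logic. The helper name `expected_vm_count` is hypothetical, and the per-VM core counts (44 for HC44rs, 96 for NC96ads_A100_v4) are taken from the SKU names and the figures in this report.

```python
def expected_vm_count(available_cores: int, cores_per_vm: int) -> int:
    """Return how many whole VMs fit into the available core quota.

    Hypothetical helper for illustration; it is not part of scalelib.
    """
    if cores_per_vm <= 0:
        raise ValueError("cores_per_vm must be positive")
    return available_cores // cores_per_vm


# vCPUs per VM, assumed from the SKU names used in this report.
CORES_PER_VM = {
    "Standard_HC44rs": 44,
    "Standard_NC96ads_A100_v4": 96,
}

# (SKU, available cores, count reported by azpbs buckets) from this report.
OBSERVATIONS = [
    ("Standard_HC44rs", 440, 2),
    ("Standard_NC96ads_A100_v4", 480, 1),
]

for sku, cores, reported in OBSERVATIONS:
    expected = expected_vm_count(cores, CORES_PER_VM[sku])
    print(f"{sku}: expected {expected} VMs, buckets reported {reported}")
```

Under these assumptions the quota allows 10 HC44rs VMs and 5 NC96ads_A100_v4 VMs, compared with the 2 and 1 reported.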

Reverting to a template with an older scalelib fixes the issue.
This issue is reproducible with both PBS and Slurm, and it affects job scheduling because the cluster can no longer scale beyond 2 nodes on these VM SKUs.
Is this a known issue?