Apple Silicon Nomad client issue
RupertBothma opened this issue · 3 comments
Hi,
It seems that with the latest versions of Docker Desktop (v4.26.1), Vagrant (2.4.0), and the Nomad version that hashiqube installs (v1.7.2, currently the latest), the Nomad client does not pick up any CPU capacity, no matter how the CPU setting is configured under Docker Desktop's advanced resource allocation. As a result, it will not allocate any CPU resources to jobs.
So no Nomad jobs can be allocated.
I believe you also have an Apple Silicon Mac and could possibly reproduce the issue as well.
This must be an issue with Nomad 1.7+: when specifying 1.6.1 instead, the CPU issue goes away. There must be some breaking change in Nomad 1.7+; hashicorp/nomad#18843 might be what is causing the issue under Vagrant.
Hi there @RupertBothma, thank you for opening this issue, and happy new year to you!
I can confirm this behaviour.
hashiqube0: ==> 2024-01-05T21:38:37Z: Monitoring evaluation "879a1d81"
hashiqube0: 2024-01-05T21:38:37Z: Evaluation triggered by job "traefik"
hashiqube0: 2024-01-05T21:38:37Z: Evaluation within deployment: "0c77596f"
hashiqube0: 2024-01-05T21:38:37Z: Evaluation status changed: "pending" -> "complete"
hashiqube0: ==> 2024-01-05T21:38:37Z: Evaluation "879a1d81" finished with status "complete" but failed to place all allocations:
hashiqube0: 2024-01-05T21:38:37Z: Task Group "traefik" (failed to place 1 allocation):
hashiqube0: * Resources exhausted on 1 nodes
hashiqube0: * Dimension "cpu" exhausted on 1 nodes
hashiqube0: 2024-01-05T21:38:37Z: Evaluation "06ff79af" waiting for additional capacity to place remainder
hashiqube0: ==> 2024-01-05T21:38:37Z: Monitoring deployment "0c77596f"
Nomad v1.7.2
I have reached out to my colleagues at HashiCorp about this. Thank you for reporting it, @RupertBothma!
I have heard back from my colleague at HashiCorp, who says:
CPU fingerprinting with Docker Desktop on Apple Silicon never really worked, because the CPU speed is not exposed anywhere, so it's impossible for Nomad to detect it.
If you run previous versions of Nomad, you will notice that the fingerprinted capacity is always 1000 MHz. This is a value we used to hardcode as a fallback (https://github.com/hashicorp/nomad/blob/release/1.6.x/client/fingerprint/cpu.go#L23), but we no longer do so in 1.7.x because it's just wrong.
The only option for now is for users to pass their own hardcoded value using client.cpu_total_compute (https://developer.hashicorp.com/nomad/docs/configuration/client#cpu_total_compute).
For now, I am setting the Nomad client.cpu_total_compute value to 8000 (MHz).
I tested this, and the jobs launched.
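For anyone hitting the same "Dimension "cpu" exhausted" error, here is a minimal sketch of the override in a Nomad agent's HCL configuration, based on the client.cpu_total_compute option linked above. The value is in MHz (the Nomad docs describe it as number of cores × core MHz). 8000 is simply the value used here, not a measured figure, and the location of hashiqube's Nomad config file is not shown in this thread, so adapt the path and value to your own setup.

```hcl
client {
  enabled = true

  # Docker Desktop on Apple Silicon does not expose the CPU speed, so Nomad 1.7.x
  # fingerprints no CPU capacity. Override the total compute manually (in MHz).
  cpu_total_compute = 8000
}
```

After restarting the agent, checking the node with `nomad node status -self` should confirm that the client now reports CPU capacity and that jobs can be placed.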