Failure modes
vdice opened this issue · 2 comments
vdice commented
Compiling failure modes as sporadically seen during instance startup.
Race? Hashistack starts up but claims no nodes available
Snippet from var/log/cloud-init-output.log
on host:
Running servers using DNS zone '44.206.220.69.sslip.io'
Starting consul...
Starting vault...
Waiting for vault...
Storing unseal token in ./data/vault/unseal
Starting nomad...
Waiting for nomad...
Starting traefik job...
==> 2022-06-10T16:34:40Z: Monitoring evaluation "97f8748f"
2022-06-10T16:34:40Z: Evaluation triggered by job "traefik"
==> 2022-06-10T16:34:41Z: Monitoring evaluation "97f8748f"
2022-06-10T16:34:41Z: Evaluation within deployment: "1f4993ac"
2022-06-10T16:34:41Z: Evaluation status changed: "pending" -> "complete"
==> 2022-06-10T16:34:41Z: Evaluation "97f8748f" finished with status "complete" but failed to place all allocations:
2022-06-10T16:34:41Z: Task Group "traefik" (failed to place 1 allocation):
* No nodes were eligible for evaluation
* No nodes are available in datacenter "dc1"
2022-06-10T16:34:41Z: Evaluation "3fa33e66" waiting for additional capacity to place remainder
==> 2022-06-10T16:34:41Z: Monitoring deployment "1f4993ac"
...
(Traefik doesn't schedule; script fails)
vdice commented
One thought from @adamreese was we might want to add a 'wait for Consul' check (a la the Nomad wait), just in case this is due to Consul not being ready in time. If/when this occurs, let's be sure to check the Consul log for any hints (log/consul.log)