fermyon/installer

Failure modes

vdice opened this issue · 2 comments

vdice commented

Compiling failure modes as sporadically seen during instance startup.

Race? Hashistack starts up but claims no nodes available

Snippet from var/log/cloud-init-output.log on host:

Running servers using DNS zone '44.206.220.69.sslip.io'
Starting consul...
Starting vault...
Waiting for vault...
Storing unseal token in ./data/vault/unseal
Starting nomad...
Waiting for nomad...
Starting traefik job...
==> 2022-06-10T16:34:40Z: Monitoring evaluation "97f8748f"
    2022-06-10T16:34:40Z: Evaluation triggered by job "traefik"
==> 2022-06-10T16:34:41Z: Monitoring evaluation "97f8748f"
    2022-06-10T16:34:41Z: Evaluation within deployment: "1f4993ac"
    2022-06-10T16:34:41Z: Evaluation status changed: "pending" -> "complete"
==> 2022-06-10T16:34:41Z: Evaluation "97f8748f" finished with status "complete" but failed to place all allocations:
    2022-06-10T16:34:41Z: Task Group "traefik" (failed to place 1 allocation):
      * No nodes were eligible for evaluation
      * No nodes are available in datacenter "dc1"
    2022-06-10T16:34:41Z: Evaluation "3fa33e66" waiting for additional capacity to place remainder
==> 2022-06-10T16:34:41Z: Monitoring deployment "1f4993ac"
...

(Traefik doesn't schedule; script fails)

vdice commented

One thought from @adamreese was we might want to add a 'wait for Consul' check (a la the Nomad wait), just in case this is due to Consul not being ready in time. If/when this occurs, let's be sure to check the Consul log for any hints (log/consul.log)

vdice commented

Closing for now per #39

We can re-open or re-file if encountered again.