fermyon/installer

Spin jobs not working

itowlson opened this issue · 10 comments

I can now bring up Nomad and Hippo, but when I do a spin deploy, the application (the Spin job) goes into Unhealthy.

The status information for a typical Spin job is:

ivan@hecate:~$ nomad status
ID                                    Type     Priority  Status   Submit Date
a0e306fa-46e1-40ac-b42f-0033e284e102  service  50        dead     2022-06-16T08:55:59+12:00
bindle                                service  50        running  2022-06-16T07:53:53+12:00
hippo                                 service  50        running  2022-06-16T07:54:23+12:00
traefik                               service  50        running  2022-06-16T07:53:27+12:00

ivan@hecate:~$ nomad status a0e306fa-46e1-40ac-b42f-0033e284e102
ID            = a0e306fa-46e1-40ac-b42f-0033e284e102
Name          = a0e306fa-46e1-40ac-b42f-0033e284e102
Submit Date   = 2022-06-16T08:55:59+12:00
Type          = service
Priority      = 50
Datacenters   = dc1
Namespace     = default
Status        = pending
Periodic      = false
Parameterized = false

Summary
Task Group                            Queued  Starting  Running  Failed  Complete  Lost  Unknown
a0e306fa-46e1-40ac-b42f-0033e284e102  0       0         0        2       0         0     0

Future Rescheduling Attempts
Task Group                            Eval ID   Eval Time
a0e306fa-46e1-40ac-b42f-0033e284e102  ddfbadc1  49s from now

Latest Deployment
ID          = fedcd3c5
Status      = running
Description = Deployment is running

Deployed
Task Group                            Desired  Placed  Healthy  Unhealthy  Progress Deadline
a0e306fa-46e1-40ac-b42f-0033e284e102  1        2       0        2          2022-06-16T09:05:59+12:00

Allocations
ID        Node ID   Task Group                            Version  Desired  Status  Created    Modified
bf3b0d08  f2e113e0  a0e306fa-46e1-40ac-b42f-0033e284e102  0        run      failed  41s ago    7s ago
6c48681e  f2e113e0  a0e306fa-46e1-40ac-b42f-0033e284e102  0        stop     failed  1m58s ago  41s ago

ivan@hecate:~$ nomad status bf3b0d08
ID                     = bf3b0d08-28e0-bfd8-441d-823f3b613e74
Eval ID                = 37a1748d
Name                   = a0e306fa-46e1-40ac-b42f-0033e284e102.a0e306fa-46e1-40ac-b42f-0033e284e102[0]
Node ID                = f2e113e0
Node Name              = hecate
Job ID                 = a0e306fa-46e1-40ac-b42f-0033e284e102
Job Version            = 0
Client Status          = failed
Client Description     = Failed tasks
Desired Status         = run
Desired Description    = <none>
Created                = 59s ago
Modified               = 25s ago
Deployment ID          = fedcd3c5
Deployment Health      = unhealthy
Reschedule Eligibility = 31s from now

Allocation Addresses
Label  Dynamic  Address
*http  yes      127.0.0.1:23045

Task "spin" is "dead"
Task Resources
CPU      Memory   Disk     Addresses
100 MHz  300 MiB  300 MiB

Task Events:
Started At     = N/A
Finished At    = 2022-06-15T20:57:45Z
Total Restarts = 0
Last Restart   = N/A

Recent Events:
Time                       Type             Description
2022-06-16T08:57:47+12:00  Killing          Sent interrupt. Waiting 5s before force killing
2022-06-16T08:57:45+12:00  Alloc Unhealthy  Unhealthy because of failed task
2022-06-16T08:57:45+12:00  Not Restarting   Error was unrecoverable
2022-06-16T08:57:45+12:00  Driver Failure   failed to launch command with executor: rpc error: code = Unknown desc = file spin not found under path /home/ivan/github/fermyon-installer/local/data/nomad/alloc/bf3b0d08-28e0-bfd8-441d-823f3b613e74/spin
2022-06-16T08:57:15+12:00  Task Setup       Building Task Directory
2022-06-16T08:57:15+12:00  Received         Task received by client

vdice commented

Not sure if this will be helpful at all, but may at least be informative if the behavior is different: It looks like Hippo supports specifying a particular spin binary path.

Could we try adding something like Spin__BinaryPath = "<path to spin on host>" to the Hippo job env, re-run start.sh and see if behavior changes?

@vdice Hmm, interesting! That fails with:

2022-06-16T10:24:43+12:00  Driver Failure   failed to launch command with executor:
rpc error: code = Unknown desc = file /home/ivan/github/spin/target/debug/spin
not found under path /home/ivan/github/fermyon-installer/local/data/nomad/alloc/8a8d6670-ba70-c433-f00f-92d1d84fc4d2/spin

vdice commented

Thanks @itowlson.

It appears that configuration is meant to represent a relative path in the allocation (cc @bacongobbler to check my understanding) and so perhaps not helpful here.

My only other idea is to see if we can try overriding the Nomad:Driver value. Not sure if it is resolving to exec or raw_exec for you (OperatingSystem.IsLinux()). Does setting Nomad__Driver = "raw_exec" help?

I am not sure how to test that given that Hippo is downloaded rather than taken from a local copy - is there something I can set in the installer to force it?

> System.OperatingSystem.IsLinux();;
val it: bool = true

I'm not sure if exec or raw_exec makes a big difference. I tried these two jobs:

job "spin-raw-exec" {
  datacenters = ["dc1"]
  type        = "batch"

  group "spin-raw-exec" {
    task "spin-raw-exec" {
      driver = "raw_exec"
      config {
        command = "spin"
        args    = []
      }
    }
  }
}


job "spin-exec" {
  datacenters = ["dc1"]
  type        = "batch"

  group "spin-exec" {
    task "spin-exec" {
      driver = "exec"
      config {
        command = "spin"
        args    = []
      }
    }
  }
}

and both failed, although with different statuses: exec gave me the "file spin not found" message, raw_exec gave me "Terminated: exit code 2".

Oh! spin without arguments looks like it might return exit code 2. Maybe raw_exec worked and I just looked for logs in the wrong place!

@vdice YES! raw_exec works for my demo case - I just got confused by the output. But it looks like Hippo is sending me exec. Is there a way to override the Hippo setting so I can test this with a real Spin app?

vdice commented

🎉 Excellent! I wonder if raw_exec is a prereq for WSL -- and if so, if we can conditionalize things so that the installer just works for this case (or, actually, maybe the conditional in Hippo is a better fit 🤔).

Anyways, to the task at hand. Yes, it should be an env setting on the hippo job similar to the spin binary path we tried above.

Try adding Nomad__Driver = "raw_exec" to the Hippo job env.
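
Roughly something like the fragment below. This is only a sketch: the job, group, and task names are assumptions about how the installer's Hippo job file is laid out, and everything else in the task (driver, config, other env entries) is left out and should stay as it already is. The only line that matters is the Nomad__Driver one.

```hcl
# Fragment only, not a complete job spec.
# The "hippo" job/group/task names are assumed for illustration.
job "hippo" {
  group "hippo" {
    task "hippo" {
      env {
        # Keep the existing Hippo env settings here, then add:
        # tell Hippo to schedule Spin apps with raw_exec instead of exec.
        Nomad__Driver = "raw_exec"
      }
    }
  }
}
```

After editing the job file, re-run start.sh so the Hippo job is resubmitted with the new env, then try spin deploy again.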

🎉 IT WORKS 🎉