Spin jobs not working
itowlson opened this issue · 10 comments
I can now bring up Nomad and Hippo, but when I do a spin deploy, the application (the Spin job) goes Unhealthy.
The status information for a typical Spin job is:
ivan@hecate:~$ nomad status
ID                                    Type     Priority  Status   Submit Date
a0e306fa-46e1-40ac-b42f-0033e284e102  service  50        dead     2022-06-16T08:55:59+12:00
bindle                                service  50        running  2022-06-16T07:53:53+12:00
hippo                                 service  50        running  2022-06-16T07:54:23+12:00
traefik                               service  50        running  2022-06-16T07:53:27+12:00
ivan@hecate:~$ nomad status a0e306fa-46e1-40ac-b42f-0033e284e102
ID = a0e306fa-46e1-40ac-b42f-0033e284e102
Name = a0e306fa-46e1-40ac-b42f-0033e284e102
Submit Date = 2022-06-16T08:55:59+12:00
Type = service
Priority = 50
Datacenters = dc1
Namespace = default
Status = pending
Periodic = false
Parameterized = false
Summary
Task Group                            Queued  Starting  Running  Failed  Complete  Lost  Unknown
a0e306fa-46e1-40ac-b42f-0033e284e102  0       0         0        2       0         0     0
Future Rescheduling Attempts
Task Group                            Eval ID   Eval Time
a0e306fa-46e1-40ac-b42f-0033e284e102  ddfbadc1  49s from now
Latest Deployment
ID = fedcd3c5
Status = running
Description = Deployment is running
Deployed
Task Group                            Desired  Placed  Healthy  Unhealthy  Progress Deadline
a0e306fa-46e1-40ac-b42f-0033e284e102  1        2       0        2          2022-06-16T09:05:59+12:00
Allocations
ID        Node ID   Task Group                            Version  Desired  Status  Created    Modified
bf3b0d08  f2e113e0  a0e306fa-46e1-40ac-b42f-0033e284e102  0        run      failed  41s ago    7s ago
6c48681e  f2e113e0  a0e306fa-46e1-40ac-b42f-0033e284e102  0        stop     failed  1m58s ago  41s ago
ivan@hecate:~$ nomad status bf3b0d08
ID = bf3b0d08-28e0-bfd8-441d-823f3b613e74
Eval ID = 37a1748d
Name = a0e306fa-46e1-40ac-b42f-0033e284e102.a0e306fa-46e1-40ac-b42f-0033e284e102[0]
Node ID = f2e113e0
Node Name = hecate
Job ID = a0e306fa-46e1-40ac-b42f-0033e284e102
Job Version = 0
Client Status = failed
Client Description = Failed tasks
Desired Status = run
Desired Description = <none>
Created = 59s ago
Modified = 25s ago
Deployment ID = fedcd3c5
Deployment Health = unhealthy
Reschedule Eligibility = 31s from now
Allocation Addresses
Label  Dynamic  Address
*http  yes      127.0.0.1:23045
Task "spin" is "dead"
Task Resources
CPU      Memory   Disk     Addresses
100 MHz  300 MiB  300 MiB
Task Events:
Started At = N/A
Finished At = 2022-06-15T20:57:45Z
Total Restarts = 0
Last Restart = N/A
Recent Events:
Time                       Type             Description
2022-06-16T08:57:47+12:00  Killing          Sent interrupt. Waiting 5s before force killing
2022-06-16T08:57:45+12:00  Alloc Unhealthy  Unhealthy because of failed task
2022-06-16T08:57:45+12:00  Not Restarting   Error was unrecoverable
2022-06-16T08:57:45+12:00  Driver Failure   failed to launch command with executor: rpc error: code = Unknown desc = file spin not found under path /home/ivan/github/fermyon-installer/local/data/nomad/alloc/bf3b0d08-28e0-bfd8-441d-823f3b613e74/spin
2022-06-16T08:57:15+12:00  Task Setup       Building Task Directory
2022-06-16T08:57:15+12:00  Received         Task received by client
Not sure if this will be helpful at all, but it may at least be informative if the behavior is different: it looks like Hippo supports specifying a particular spin binary path. Could we try adding something like Spin__BinaryPath = "<path to spin on host>" to the Hippo job env, re-running start.sh, and seeing if the behavior changes?
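For reference, a minimal sketch of where such a setting would sit in the Hippo job file. Hippo is an ASP.NET Core app, and ASP.NET Core's environment-variable configuration provider treats a double underscore as the ":" section separator, so Spin__BinaryPath is read as the Spin:BinaryPath setting. The task name "hippo" is an assumption, and the path is left as the placeholder above:

```hcl
# Fragment of the Hippo job file (task name is an assumption)
task "hippo" {
  # ... existing driver, config and resources unchanged ...

  env {
    # "__" becomes ":" in ASP.NET Core config, i.e. Spin:BinaryPath
    Spin__BinaryPath = "<path to spin on host>"
  }
}
```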
@vdice Hmm, interesting! That fails with:
2022-06-16T10:24:43+12:00  Driver Failure   failed to launch command with executor: rpc error: code = Unknown desc = file /home/ivan/github/spin/target/debug/spin not found under path /home/ivan/github/fermyon-installer/local/data/nomad/alloc/8a8d6670-ba70-c433-f00f-92d1d84fc4d2/spin
Thanks @itowlson.
It appears that setting is meant to be a path relative to the allocation directory (cc @bacongobbler to check my understanding), so it is perhaps not helpful here.
My only other idea is to try overriding the Nomad:Driver value. I'm not sure whether it is resolving to exec or raw_exec for you (Hippo picks it based on OperatingSystem.IsLinux()). Does setting Nomad__Driver = "raw_exec" help?
I am not sure how to test that given that Hippo is downloaded rather than taken from a local copy - is there something I can set in the installer to force it?
> System.OperatingSystem.IsLinux();;
val it: bool = true
I'm not sure if exec or raw_exec makes a big difference. I tried these two jobs:
job "spin-raw-exec" {
  datacenters = ["dc1"]
  type        = "batch"
  group "spin-raw-exec" {
    task "spin-raw-exec" {
      driver = "raw_exec"
      config {
        command = "spin"
        args    = []
      }
    }
  }
}

job "spin-exec" {
  datacenters = ["dc1"]
  type        = "batch"
  group "spin-exec" {
    task "spin-exec" {
      driver = "exec"
      config {
        command = "spin"
        args    = []
      }
    }
  }
}
and both failed, although with different statuses: exec gave me the "file spin not found" message, while raw_exec gave me "Terminated: exit code 2".
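One prerequisite worth keeping in mind for the raw_exec variant: Nomad ships with the raw_exec driver disabled, so each client has to opt in through its agent configuration. The stanza is sketched below. Whether the installer's start.sh already sets it is not confirmed here, although the raw_exec job getting far enough to exit with code 2 suggests it is enabled in this setup:

```hcl
# Nomad client agent configuration (not a job file)
plugin "raw_exec" {
  config {
    # raw_exec runs tasks unisolated, as the Nomad agent's user
    enabled = true
  }
}
```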
Oh! spin without arguments looks like it might return exit code 2. Maybe raw_exec worked and I just looked for logs in the wrong place!
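To make a test like this unambiguous, a variant of the raw_exec job can give spin an argument so that a successful launch also exits cleanly. This is a sketch assuming the installed spin accepts --version; the job name is made up for illustration:

```hcl
job "spin-raw-exec-version" {
  datacenters = ["dc1"]
  type        = "batch"

  group "spin-raw-exec-version" {
    task "spin-raw-exec-version" {
      driver = "raw_exec"

      config {
        command = "spin"
        # An argument avoids the no-subcommand usage error (exit code 2),
        # so a zero exit means spin was found and ran.
        args = ["--version"]
      }
    }
  }
}
```

The version string should then show up in nomad alloc logs <alloc-id> for the resulting allocation.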
@vdice YES! raw_exec works for my demo case; I just got confused by the output. But it looks like Hippo is sending me exec. Is there a way to override the Hippo setting so I can test this with a real Spin app?
Excellent! I wonder if raw_exec is a prerequisite for WSL, and if so, whether we can conditionalize things so that the installer just works for this case (or, actually, maybe the conditional in Hippo is a better fit 🤔).
Anyways, to the task at hand. Yes, it should be an env setting on the Hippo job, similar to the spin binary path we tried above. Try adding Nomad__Driver = "raw_exec" to the Hippo job env.
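In the Hippo job file that amounts to something like the fragment below. Only the env entry comes from this thread; the task name and the surrounding settings are assumptions:

```hcl
# Fragment of the Hippo job file (task name is an assumption)
task "hippo" {
  # ... existing driver, config and resources unchanged ...

  env {
    # Read by Hippo as Nomad:Driver, overriding the OS-based default
    # so the Spin jobs Hippo submits use raw_exec instead of exec.
    Nomad__Driver = "raw_exec"
  }
}
```

After editing, re-running start.sh should resubmit Hippo with the new environment.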
IT WORKS