KubeVirt VM starts, does nothing, then stops
According to the runner-set, the launcher-runner starts. Watching it boot through KubeVirt VNC, I can see that it automatically logs in as root, waits a few seconds, and then shuts down again.
The runner never goes online in the GitHub runner list, and the workflow job never starts.
These are the output logs of the compute container from the KubeVirt launcher-runner pod:
{"component":"virt-launcher","level":"info","msg":"DomainLifecycle event Domain event=\"resumed\" detail=\"unpaused\" with event id 4 reason 0 received","pos":"client.go:470","timestamp":"2024-10-31T15:38:44.742760Z"}
{"component":"virt-launcher","level":"info","msg":"DomainLifecycle event Domain event=\"started\" detail=\"booted\" with event id 2 reason 0 received","pos":"client.go:470","timestamp":"2024-10-31T15:38:44.745683Z"}
{"component":"virt-launcher","kind":"","level":"info","msg":"Domain started.","name":"runner","namespace":"kvrunner","pos":"manager.go:1250","timestamp":"2024-10-31T15:38:44.748068Z","uid":"0db541c5-6908-43e0-8465-4a7f446e750a"}
{"component":"virt-launcher","kind":"","level":"info","msg":"Synced vmi","name":"runner","namespace":"kvrunner","pos":"server.go:208","timestamp":"2024-10-31T15:38:44.751023Z","uid":"0db541c5-6908-43e0-8465-4a7f446e750a"}
{"component":"virt-launcher","level":"info","msg":"kubevirt domain status: Running(1):Unknown(1)","pos":"client.go:297","timestamp":"2024-10-31T15:38:44.751189Z"}
{"component":"virt-launcher","level":"info","msg":"Domain name event: kvrunner_runner","pos":"client.go:424","timestamp":"2024-10-31T15:38:44.752677Z"}
{"component":"virt-launcher","level":"info","msg":"kubevirt domain status: Running(1):Unknown(1)","pos":"client.go:297","timestamp":"2024-10-31T15:38:44.755771Z"}
{"component":"virt-launcher","level":"info","msg":"Domain name event: kvrunner_runner","pos":"client.go:424","timestamp":"2024-10-31T15:38:44.757298Z"}
{"component":"virt-launcher","level":"info","msg":"Found PID for kvrunner_runner: 79","pos":"monitor.go:170","timestamp":"2024-10-31T15:38:45.101022Z"}
{"component":"virt-launcher","kind":"","level":"info","msg":"Synced vmi","name":"runner","namespace":"kvrunner","pos":"server.go:208","timestamp":"2024-10-31T15:38:45.326137Z","uid":"0db541c5-6908-43e0-8465-4a7f446e750a"}
{"component":"virt-launcher","kind":"","level":"info","msg":"Synced vmi","name":"runner","namespace":"kvrunner","pos":"server.go:208","timestamp":"2024-10-31T15:38:45.347497Z","uid":"0db541c5-6908-43e0-8465-4a7f446e750a"}
{"component":"virt-launcher","level":"info","msg":"DomainLifecycle event Domain event=\"shutdown\" detail=\"unknown\" with event id 6 reason 1 received","pos":"client.go:470","timestamp":"2024-10-31T15:39:10.614443Z"}
{"component":"virt-launcher","level":"info","msg":"kubevirt domain status: ShuttingDown(4):Unknown(0)","pos":"client.go:297","timestamp":"2024-10-31T15:39:10.616872Z"}
{"component":"virt-launcher","level":"info","msg":"Domain name event: kvrunner_runner","pos":"client.go:424","timestamp":"2024-10-31T15:39:10.618667Z"}
{"component":"virt-launcher","kind":"","level":"info","msg":"Synced vmi","name":"runner","namespace":"kvrunner","pos":"server.go:208","timestamp":"2024-10-31T15:39:10.623713Z","uid":"0db541c5-6908-43e0-8465-4a7f446e750a"}
{"component":"virt-launcher-monitor","level":"info","msg":"Reaped pid 79 with status 0","pos":"virt-launcher-monitor.go:198","timestamp":"2024-10-31T15:39:10.775544Z"}
{"component":"virt-launcher","level":"info","msg":"DomainLifecycle event Domain event=\"stopped\" detail=\"shutdown\" with event id 5 reason 0 received","pos":"client.go:470","timestamp":"2024-10-31T15:39:10.842640Z"}
{"component":"virt-launcher","level":"info","msg":"kubevirt domain status: Shutoff(5):Shutdown(1)","pos":"client.go:297","timestamp":"2024-10-31T15:39:10.844993Z"}
{"component":"virt-launcher","level":"info","msg":"Domain name event: kvrunner_runner","pos":"client.go:424","timestamp":"2024-10-31T15:39:10.846500Z"}
{"component":"virt-launcher","kind":"","level":"info","msg":"Domain undefined.","name":"runner","namespace":"kvrunner","pos":"manager.go:1874","timestamp":"2024-10-31T15:39:10.883760Z","uid":"0db541c5-6908-43e0-8465-4a7f446e750a"}
{"component":"virt-launcher","level":"info","msg":"DomainLifecycle event Domain event=\"undefined\" detail=\"removed\" with event id 1 reason 0 received","pos":"client.go:470","timestamp":"2024-10-31T15:39:10.883852Z"}
{"component":"virt-launcher","kind":"","level":"info","msg":"Signaled vmi deletion","name":"runner","namespace":"kvrunner","pos":"server.go:363","timestamp":"2024-10-31T15:39:10.883850Z","uid":"0db541c5-6908-43e0-8465-4a7f446e750a"}
{"component":"virt-launcher","level":"info","msg":"Domain name event: ","pos":"client.go:424","timestamp":"2024-10-31T15:39:10.884890Z"}
{"component":"virt-launcher","level":"info","msg":"Received signal terminated","pos":"virt-launcher.go:473","timestamp":"2024-10-31T15:39:10.974539Z"}
{"component":"virt-launcher","kind":"VirtualMachineInstance","level":"info","msg":"Signaled graceful shutdown","name":"runner","namespace":"kvrunner","pos":"virt-launcher.go:443","timestamp":"2024-10-31T15:39:10.974734Z","uid":"0db541c5-6908-43e0-8465-4a7f446e750a"}
{"component":"virt-launcher","level":"info","msg":"Process kvrunner_runner and pid 79 is gone!","pos":"monitor.go:179","timestamp":"2024-10-31T15:39:11.101598Z"}
{"component":"virt-launcher","level":"info","msg":"Waiting on final notifications to be sent to virt-handler.","pos":"virt-launcher.go:281","timestamp":"2024-10-31T15:39:11.101657Z"}
{"component":"virt-launcher","level":"info","msg":"Final Delete notification sent","pos":"virt-launcher.go:296","timestamp":"2024-10-31T15:39:11.101676Z"}
{"component":"virt-launcher","level":"info","msg":"stopping cmd server","pos":"server.go:608","timestamp":"2024-10-31T15:39:11.101749Z"}
{"component":"virt-launcher","level":"info","msg":"cmd server stopped","pos":"server.go:617","timestamp":"2024-10-31T15:39:11.101924Z"}
{"component":"virt-launcher","level":"info","msg":"Exiting...","pos":"virt-launcher.go:512","timestamp":"2024-10-31T15:39:11.101975Z"}
{"component":"virt-launcher-monitor","level":"info","msg":"Reaped pid 12 with status 0","pos":"virt-launcher-monitor.go:198","timestamp":"2024-10-31T15:39:11.108537Z"}
The logs seem to indicate a shutdown event (event id 6, reason 1 received).
The VM template uses the image ghcr.io/zhaofengli/sample-vm-container-disk:latest.
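As a sketch (pod name again a placeholder), the lifecycle events can be isolated by piping the same logs through jq:

$ kubectl logs -n kvrunner -c compute virt-launcher-runner-xxxxx | jq -c 'select(.msg | test("DomainLifecycle"))'

That surfaces the resumed/started pair at 15:38:44 and the shutdown event at 15:39:10 shown above.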
Everything else seems to be doing what it should:
- The listener detects a job that needs a runner.
- The listener creates the runner-set.
- The runner-set creates the launcher-runner, but the agent never goes online with GitHub; the VM just seems to start, log in as root, and then do nothing.
Is the sample-vm-container-disk somehow incomplete?
I'm not sure what I'm missing.
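For what it's worth, the ARC side of the hand-off can also be inspected directly; the commands below are a sketch assuming ARC's scale-set EphemeralRunner CRD and my kvrunner namespace (<runner-name> is a placeholder):

$ kubectl get ephemeralrunners -n kvrunner
$ kubectl describe ephemeralrunner -n kvrunner <runner-name>

The describe output should show whether ARC ever saw the runner register with GitHub.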
@zhaofengli fyi
Same here; these are some of the log messages:
$ kubectl logs -n test-arc-repo-runners -c guest-console-log virt-launcher-runner-tgmlp -f
[ 1.546059] sgx: There are zero EPC sections.
<<< NixOS Stage 1 >>>
loading module virtio_balloon...
loading module virtio_console...
loading module virtio_rng...
loading module dm_mod...
running udev...
Starting systemd-udevd version 254.3
kbd_mode: KDSKBMODE: Inappropriate ioctl for device
starting device mapper and LVM...
checking /dev/disk/by-label/nixos...
fsck (busybox 1.36.1)
[fsck.ext4 (1) -- /mnt-root/] fsck.ext4 -a /dev/disk/by-label/nixos
nixos: clean, 114161/509040 files, 435699/2034432 blocks
mounting /dev/disk/by-label/nixos on /...
<<< NixOS Stage 2 >>>
running activation script...
setting up /etc...
starting systemd...
Welcome to NixOS 23.11 (Tapir)!
[ OK ] Created slice Slice /system/getty.
[ OK ] Created slice Slice /system/modprobe.
[ OK ] Created slice Slice /system/serial-getty.
[ OK ] Created slice User and Session Slice.
...
<<< Welcome to NixOS 23.11.20231117.c757e9b (x86_64) - ttyS0 >>>
Run 'nixos-help' for the NixOS manual.
runner login: root (automatic login)
[root@runner:~]# Stopping Session 1 of User root...
Stopping Session 2 of User root...
[ OK ] Removed slice Slice /system/modprobe.
[ OK ] Stopped target Multi-User System.
[ OK ] Stopped target Login Prompts.
[ OK ] Stopped target Containers.
[ OK ] Stopped target Host and Network Name Lookups.
[ OK ] Stopped target Timer Units.
...
Unmounting /run/keys...
Unmounting run-wrappers.mount...
Unmounting /runner-info...
[ OK ] Stopped Grow Root File System.
[ OK ] Stopped growpart.service.
[ OK ] Unmounted /run/keys.
[ OK ] Unmounted run-wrappers.mount.
[ OK ] Unmounted /runner-info.
[ OK ] Stopped target Preparation for Local File Systems.
[ OK ] Stopped target Swaps.
[ OK ] Reached target Unmount All Filesystems.
[ OK ] Stopped Remount Root and Kernel File Systems.
[ OK ] Stopped Create Static Device Nodes in /dev.
[ OK ] Stopped Create Static Device Nodes in /dev gracefully.
[ OK ] Reached target System Shutdown.
[ OK ] Reached target Late Shutdown Services.
[ OK ] Finished System Power Off.
[ OK ] Reached target System Power Off.
[ 20.854376] reboot: Power down
Apparently the StartPre service requires the legacy runner application, while runner-info only contains jitconfig information; the solution could be to use the just-in-time (JIT) syntax. I have used the following template to pass the tests:
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: vm-template
spec:
  runStrategy: Manual
  template:
    metadata:
      name: runner
    spec:
      architecture: amd64
      terminationGracePeriodSeconds: 30
      domain:
        devices:
          filesystems:
            # runner-info is shared into the guest via virtiofs
            - name: runner-info
              virtiofs: {}
          disks:
            - name: containerdisk
              disk:
                bus: virtio
            - name: cloudinitdisk
              disk:
                bus: virtio
          interfaces:
            - name: default
              masquerade: {}
        cpu:
          cores: 3
        resources:
          requests:
            memory: 14Gi
      networks:
        - name: default
          pod: {}
      volumes:
        - name: containerdisk
          containerDisk:
            image: quay.io/containerdisks/fedora:latest
        - name: cloudinitdisk
          cloudInitNoCloud:
            userData: |-
              #cloud-config
              users:
                - name: runner
                  homedir: /home/runner
                  sudo: ["ALL=(ALL) NOPASSWD:ALL"]
              mounts:
                - [ runner-info, /runner-info/, virtiofs, "rw,relatime,user=fedora" ]
              packages:
                - jq
              bootcmd:
                # bootcmd runs early on every boot; the network must already be up for curl
                - "sudo mkdir /opt/runner"
                - "curl -sL https://github.com/actions/runner/releases/download/v2.320.0/actions-runner-linux-x64-2.320.0.tar.gz | sudo tar -xz -C /opt/runner"
                - "sudo /opt/runner/bin/installdependencies.sh"
              runcmd:
                - "sudo chown -R runner: /opt/runner"
                # register and run once with the JIT config from the shared runner-info.json
                - "sudo runuser -l runner -c '/opt/runner/run.sh --jitconfig $(jq -r '.jitconfig' /runner-info/runner-info.json)'"
                - "sudo poweroff"