arighi/virtme-ng

Slow boot time

Opened this issue · 13 comments

Hello, this project speeds up my development work a lot. However, I haven't been able to achieve the same boot time as shown in the presentation.

$ time vng -m 16G -- uname -a
Linux virtme-ng 6.10.0-virtme 
#8 SMP PREEMPT_DYNAMIC Thu Aug  1 12:01:12 CST 2024 x86_64 GNU/Linux
vng -m 16G -- uname -a  21.04s user 4.41s system 260% cpu 9.755 total

I am using an Intel i9-12900H, with virtiofsd support and virtme-ng-init. Is there anything I could improve? Attached is the full boot log.

kern.log

Thanks.

Hello,

Could it be possible KVM is not being used? Do you launch a nested VM (launching vng from a VM)?

From the vng VM, can you check the output of this command please?

cat /sys/devices/system/clocksource/clocksource0/current_clocksource

Could it be possible KVM is not being used?

Wow, nice catch! Turns out I only gave non-root libvirt access, not kvm. This link has the command to fix it.
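For reference, a minimal sketch of how to check for this on the host (assuming a typical Debian-like setup where /dev/kvm is owned by the kvm group; the exact fix command is on the linked page):

```shell
# Check whether the current user can open /dev/kvm directly; without
# this, QEMU silently falls back to slow TCG emulation.
if [ -r /dev/kvm ] && [ -w /dev/kvm ]; then
    echo "KVM acceleration available"
else
    echo "no direct /dev/kvm access (try adding your user to the kvm group)"
fi
```

On such systems the usual fix is along the lines of `sudo usermod -aG kvm "$USER"` followed by logging out and back in, but check the linked page for the exact command.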

Do you launch a nested VM (launching vng from a VM)?

No, I am not launching a nested VM.

cat /sys/devices/system/clocksource/clocksource0/current_clocksource

hpet

Now it has a much faster boot time of 4.76s!

kuro@tora ~/Repositio/net (git)-[orphan] % time vng -- cat /sys/devices/system/clocksource/clocksource0/current_clocksource
kvm-clock
vng -- cat /sys/devices/system/clocksource/clocksource0/current_clocksource  4.76s user 1.34s system 25% cpu 24.378 total

However, I happened to notice that it takes much longer to close the VM than without kvm; does this sound more like a problem on my end or a virtme-ng problem?

kuro@virtme-ng ~/Repositio/net (git)-[orphan] %
[   73.091002] ACPI: PM: Preparing to enter system sleep state S5
[   73.091977] kvm: exiting hardware virtualization
[   73.093061] reboot: Power down

Could it be possible KVM is not being used?

Wow, nice catch! Turns out I only gave non-root libvirt access, not kvm. This link has the command to fix it.

👍

However, I happened to notice that it takes much longer to close the VM than without kvm; does this sound more like a problem on my end or a virtme-ng problem?

kuro@virtme-ng ~/Repositio/net (git)-[orphan] %
[   73.091002] ACPI: PM: Preparing to enter system sleep state S5
[   73.091977] kvm: exiting hardware virtualization
[   73.093061] reboot: Power down

It might be easier with the full log. Here, it looks like it took 0.002 sec; that seems quick enough to me ;)

Here is the full log. The terminal hangs after the last "reboot: Power down" line, so the log file itself cannot show how long it hangs. I timed it with a stopwatch, and it hangs for around 15 secs.

kern.log

Strange. Which version of virtme-ng are you using? The latest one?

On which GNU/Linux distribution and kernel?

Did you build your own kernel? How?

In the logs, we can see a big gap where something seems to be blocked:

[    0.760541] virtiofs virtio0: virtio_fs_setup_dax: No cache capability
[    0.799498] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[    0.801667] 00:00: ttyS0 at I/O 0x3f8 (irq = 56, base_baud = 115200) is a 16550A
[    8.066710] Non-volatile memory driver v1.3
[    8.066977] Linux agpgart interface v0.103
[    8.067225] ACPI: bus type drm_connector registered
[    8.073716] loop: module loaded
[    8.074190] i8042: PNP: No PS/2 controller found.
[    8.236195] rtc_cmos 00:01: registered as rtc0

Maybe the issue comes from there?

Also, we can see console=ttyS0: I'm surprised it is not console=hvc0.
(I don't think it is linked to that, but maybe try without the microvm arch: --disable-microvm ; and maybe without virtiofs: --force-9p)

It would also be helpful to add -a initcall_debug so we can see if there's a particular initcall that takes longer than expected.

Thanks for investigating with me!

Strange. Which version of virtme-ng are you using? The latest one?

The latest one, v1.27

On which GNU/Linux distribution and kernel?

Linux tora 6.1.0-23-rt-amd64 #1 SMP PREEMPT_RT Debian 6.1.99-1 (2024-07-15) x86_64 GNU/Linux

Did you build your own kernel? How?

I did not build my own kernel, this is from debian.

Maybe the issue comes from there?

With advice from arighi, this is the output with -a initcall_debug added. Sadly there isn't any output for the shutdown freeze part.

kern.log

maybe try without the microvm arch: --disable-microvm

hmm, disabling microvm made the gap worse, and also didn't make the shutdown freeze go away. Here is the gap part of the log.

$ time vng --disable-microvm -v -a initcall_debug -- echo "booted, now goodbye"
...
[    0.991674] calling  virtio_console_init+0x0/0xe0 @ 1
[    4.143360] probe of virtio1 returned 0 after 3151275 usecs
[    7.254900] probe of virtio2 returned 0 after 3108011 usecs
[   10.343543] probe of virtio3 returned 0 after 3085840 usecs
[   13.450783] probe of virtio4 returned 0 after 3104146 usecs
[   16.543581] probe of virtio5 returned 0 after 3090482 usecs
[   19.631632] probe of virtio6 returned 0 after 3084937 usecs
[   19.634603] initcall virtio_console_init+0x0/0xe0 returned 0 after 18642526 usecs
[   19.636123] calling  hpet_init+0x0/0xa0 @ 1
[   19.636525] probe of PNP0103:00 returned 19 after 24 usecs
...

maybe without virtiofs: --force-9p

Interesting, this has the same effect, prolonging virtio_console_init to 19 secs.

$ time vng --force-9p -v -a initcall_debug -- echo "booted, now goodbye"
[    0.825067] calling  virtio_console_init+0x0/0xe0 @ 1
[    3.936679] probe of virtio1 returned 0 after 3111186 usecs
[    7.048912] probe of virtio2 returned 0 after 3109086 usecs
[   10.164953] probe of virtio3 returned 0 after 3112608 usecs
[   13.276726] probe of virtio4 returned 0 after 3107867 usecs
[   16.373049] probe of virtio5 returned 0 after 3093183 usecs
[   19.456923] probe of virtio6 returned 0 after 3080970 usecs
[   19.460197] initcall virtio_console_init+0x0/0xe0 returned 0 after 18634711 usecs
[   19.462795] calling  hpet_init+0x0/0xa0 @ 1
[   19.464443] probe of PNP0103:00 returned 19 after 54 usecs
[   19.465939] initcall hpet_init+0x0/0xa0 returned 0 after 1835 usecs
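To make gaps like this easier to spot, the initcall_debug lines can be ranked by duration. A small sketch (the two sample lines below are placeholders standing in for the full kern.log):

```shell
# Print "usecs initcall-name" for every completed initcall, slowest
# first, from a kernel log captured with -a initcall_debug.
rank_initcalls() {
    awk '/initcall .* returned/ {
        for (i = 1; i <= NF; i++)
            if ($i == "after") print $(i + 1), $(i - 3)
    }' "$@" | sort -rn
}

# Example with two sample lines standing in for a full log:
rank_initcalls <<'EOF'
[    0.991674] initcall pcibus_class_init+0x0/0x20 returned 0 after 12 usecs
[   19.634603] initcall virtio_console_init+0x0/0xe0 returned 0 after 18642526 usecs
EOF
```

Piping a real boot log through this (e.g. `rank_initcalls kern.log | head`) should put virtio_console_init right at the top.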

Thank you for the output.

I did not build my own kernel, this is from debian.

Sorry, I meant the one you are running in the VM: how did you build it? Just with vng --build and no additional kernel config?

Do you have the same delays when you load the same kernel as your system? → vng -r.

I don't think it could be due to something missing in the host kernel or in QEmu (I think newer versions are available in the backports repo), but I don't exactly know what's causing the delays.

Just to be sure, do you have the same delay without -v (so as not to add the console= boot parameter), or without running a command (to use console=hvc0 instead of ttyS0), i.e. without -- <cmd>?

how did you build it? Just with vng --build and no additional kernel config?

That is correct, just vng -b. The kernel is the main branch from https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git

vng -r

Well, it's even worse. It got stuck; as of this writing, it has not finished loading.

kern-stuck.log

do you have the same delay without -v

Yes, the delay exists w/ or w/o -v, and w/ or w/o -- <cmd>

I do have iommu enabled in my grub config, intel_iommu=on iommu=pt. Not sure if this is related. Edit: disabling them did not help :(

how did you build it? Just with vng --build and no additional kernel config?

That is correct, just vng -b. The kernel is the main branch from https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git

I'm also on top of this tree (netdev/net -- the one from DaveM should be a symlink to this one), and I don't have such issues.

I'm using virtme from a Docker container, with a more recent version of QEmu. If you want to try it, that should be easy:

cd <kernel source code>
docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" -v "${PWD}/.home:/root:rw" --rm \
  -it --privileged --pull always mptcp/mptcp-upstream-virtme-docker:latest \
  manual-normal

https://github.com/multipath-tcp/mptcp-upstream-virtme-docker

It will generate files under a new .virtme directory, easy to clean after. You might have to clean your env first: make mrproper.

vng -r

Well, it's even worse. It got stuck; as of this writing, it has not finished loading.

kern-stuck.log

do you have the same delay without -v

Yes, the delay exists w/ or w/o -v, and w/ or w/o -- <cmd>

I do have iommu enabled in my grub config, intel_iommu=on iommu=pt. Not sure if this is related. Edit: disabling them did not help :(

Sorry, I'm not sure what's wrong exactly. Maybe @arighi has other ideas? :)

I would suggest running vng with a debugger (vng (...) --debug on one side, and vng (...) --gdb on the other side) and checking where and why it is blocking, but it's hard to give a lot of help remotely.

@kuroa-me does it make any difference for the shutdown part with --no-virtme-ng-init?

Edit: can you try also -a "reboot=t"?

@kuroa-me any news on this? thanks!

Sorry for responding so late...

Sadly, neither --no-virtme-ng-init nor -a "reboot=t" helps.

Meanwhile, I have tried tracing kvm to see if there are any clues. However, due to my limited knowledge of kvm, I could not find any. Attached is a kvm trace taken while trying to shut down vng. Maybe you can spot something weird going on? Thank you.

trace-cmd.txt.zip

Thanks again for all the support!

But feel free to close this issue if you want, since at this point I feel this is more of a kvm issue than a vng one.