zbm-dev/zfsbootmenu

Install on Odroid failing silently

techiebod opened this issue · 9 comments

I've installed ZFSBootMenu on an Odroid (https://www.odroid.co.uk/H3-Plus). The boot menu comes up fine and I can see it and interact with it (e.g. use the recovery or chroot shells), but trying to boot a kernel displays a message about tearing down USB, and then the machine reboots.

The menu screen looks the same as on my other machines, showing the correct root dataset and the right kernel. I've gone into the recovery shell and can see that the zpool status is fine, and when I boot from a USB drive everything on the pool works (I can write to it, etc.).

I'm using Ubuntu 23.04 with a 6.2.0-20 kernel. The kernel should support ZFS (and even if it didn't, I'd have expected it to at least boot further!).

How do I go about debugging this? I've tried increasing the loglevel on the kernel command line (Ctrl+E), but I'm at a loss as to what else I can do!

I have a suspicion that the kernel doesn't support kexec. The most straightforward test would be to drop to an emergency shell with Ctrl+R, run mount_zfs pool/path/to/your/root/filesystem, change to the boot subdirectory of the path that it reports, and load the kernel with

kexec -a -l ./your-kernel-here --initrd=./your-initramfs-here --cmdline="root=zfs:pool/path/to/your/root/filesystem other args here"

and finally try to jump into the kernel with kexec -e -i. Watch the output for any reported errors that might hint at the cause.
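The two kexec steps above can be wrapped in a small script so the exact commands are visible before anything runs. This is a minimal sketch, not part of ZFSBootMenu: it assumes you have already run mount_zfs and pass the path it reports as MNT, and every default (ROOT_FS, MNT, KERNEL, INITRD, EXTRA_ARGS, RUN) is a hypothetical placeholder. It uses --command-line= for the kernel arguments, which is the kexec-tools option name. It only prints the commands unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch of the manual kexec test. Dry run by default: it only prints the
# commands. Set RUN=1 to execute them.
# Run mount_zfs first and pass the path it reports as MNT.
# Every default below is a hypothetical placeholder; substitute your own values.
set -eu

ROOT_FS="${ROOT_FS:-pool/path/to/your/root/filesystem}"
MNT="${MNT-/path/reported/by/mount_zfs}"   # MNT= (empty) means "use /boot"
KERNEL="${KERNEL:-vmlinuz}"
INITRD="${INITRD:-initrd.img}"
EXTRA_ARGS="${EXTRA_ARGS:-loglevel=7}"
RUN="${RUN:-0}"

run() {
  # Print the command (dry run) or execute it (RUN=1).
  if [ "$RUN" = 1 ]; then
    "$@"
  else
    printf '+ %s\n' "$*"
  fi
}

BOOT="$MNT/boot"

# Stage the target kernel. -a (--kexec-syscall-auto) lets kexec use
# kexec_file_load when available and fall back to kexec_load otherwise.
run kexec -a -l "$BOOT/$KERNEL" --initrd="$BOOT/$INITRD" \
  --command-line="root=zfs:$ROOT_FS $EXTRA_ARGS"

# Jump into the staged kernel. -i (--no-checks) skips the memory
# integrity checks for a faster handoff.
run kexec -e -i
```

For example, RUN=1 MNT=/path/that/mount_zfs/printed KERNEL=vmlinuz-6.2.0-20-generic INITRD=initrd.img-6.2.0-20-generic sh kexec-test.sh (the file name and kernel version are illustrative). On an already booted system with kexec-tools installed, the same sketch can be run as root with MNT= (empty) so that it reads /boot directly.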

Assuming you meant "--command-line=..." rather than --cmdline, the load works fine, but the execute doesn't: it does the same thing as before and simply reboots the machine :(

I've tried again with a USB stick rather than the mmcblk device. I created it on another machine and built it the same way as I did for that one. Again, ZBM loads fine, imports the pool and finds the kernel. However, when I try to boot (I assume this is done with kexec, as with zkexec in the recovery shell), it prints the USB teardown message and then returns to the "BIOS" screen after a few seconds. Is there anything else I can try? Without any error messages, I'm struggling to think of new things to try :(

By the way, the build is very simple and follows the ZBM Ubuntu instructions. It's a two-partition system: a 1GB FAT partition (mkfs.vfat -F32), since I'm only on a 32GB system and don't want to use the full 2GB if possible, and then a single-device/single-vdev zpool created using the advised parameters. I then go through the debootstrap process, chroot in (I've actually been using systemd-nspawn of late, but I've also tried a standard chroot in case that was the issue), and install the kernel and a few other utilities. This feels like an issue specific to this hardware, but I don't know what to look for.

I upgraded the BIOS from 1.12 to 1.15 in case that would help, but I can't see any change in the symptoms.

Unfortunately, I think this is probably a hardware incompatibility. It might be worth trying kexec from within your boot environment (or a live environment) to see if the result is the same.

OK, I'm up for testing that, but can you give me some instructions on how? Also, what in the hardware obstructs this? Maybe there's a setting I can change?

I've tried effectively the same commands as above (installed kexec-tools, then ran the kexec commands above without the z) on a booted instance of Ubuntu 23.04 with a 6.2.0-27 kernel and got exactly the same result: kexec -e is effectively an abrupt reboot to the BIOS loading screen.

Yeah, this sounds like some hardware that can't handle the kernel handoff and just resets the whole system. It might be possible to poke around with nodes in /sys to try to detach things before the kexec (like we do with XHCI teardown), but you'll just be feeling around in the dark. Another option might be to try blacklisting any kernel module you find in the output of lsmod that isn't absolutely critical for ZFSBootMenu functionality, but again, this will be a tedious trial-and-error effort that is not guaranteed to succeed.
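The module-blacklisting trial and error can at least be made systematic. The sketch below is a hedged illustration, not a ZFSBootMenu feature: it reads /proc/modules (the file lsmod reads), drops a hypothetical keep-list of modules assumed necessary to reach and import the pool, and prints one module_blacklist= kernel parameter per remaining module so each can be tried on its own. The parameter would go on the command line of the kernel that runs ZFSBootMenu, since that kernel performs the kexec handoff; how you set that depends on how ZBM is launched. Built-in drivers never appear in /proc/modules and cannot be excluded this way.

```shell
#!/bin/sh
# Sketch: list loaded modules that could be tried, one at a time, with the
# kernel's module_blacklist= parameter. Read-only: it only prints.
set -eu

# Hypothetical keep-list: modules assumed necessary for ZFSBootMenu to reach
# and import the pool on this machine. Adjust it for your hardware.
KEEP="zfs spl zavl znvpair zunicode zcommon icp zlua zzstd nvme nvme_core mmc_block sdhci sdhci_pci"

# Print each module name (first field) from a /proc/modules-style file
# that is not on the keep-list.
candidates() {
  modules_file="$1"
  while read -r name _rest; do
    case " $KEEP " in
      *" $name "*) ;;              # keep-listed: never suggest it
      *) printf '%s\n' "$name" ;;
    esac
  done < "$modules_file"
}

MODULES_FILE="${MODULES_FILE:-/proc/modules}"
if [ -r "$MODULES_FILE" ]; then
  # One parameter per line, so each module can be tested in isolation.
  candidates "$MODULES_FILE" | while read -r mod; do
    printf 'module_blacklist=%s\n' "$mod"
  done
fi
```

Testing one module per boot is slow, but it keeps each result attributable to a single change; once a working set is found, the names can be combined into one comma-separated module_blacklist= value.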

If you do work out a viable solution to this issue, please feel free to note it here. We can then either adapt the solution to be a permanent part of ZFSBootMenu, or at the least officially document it. Barring that, there's nothing else actionable here for us.