zbm-dev/zfsbootmenu

cannot import zroot: no such pool available

geekifan opened this issue · 10 comments

ZFSBootMenu build source

Release EFI

ZFSBootMenu version

2.3.0

Boot environment distribution

Debian 12

Problem description

image
ZFSBootMenu cannot import zroot. It does recognize zroot/ROOT/debian when I press the [ESCAPE] key at boot.

Steps to reproduce

Follow the instructions on the documentation website, with zfsutils-linux/bookworm-backports, zfs-dkms/bookworm-backports and zfs-initramfs/bookworm-backports installed during the installation.

The screenshot you've shown is from the initramfs in your Debian boot environment failing to import the pool, not from ZFSBootMenu. There's a failed device (/dev/sda) that should be investigated as the underlying cause of the pool import failure.

Thanks for your quick reply! I'm not familiar with the boot process, and I'm not sure why Debian cannot recognize the disk /dev/sda. It is a brand-new disk on a brand-new machine. I know it is not your responsibility to help with this, but I would really appreciate any suggestions. :P

I'd recommend looking through the full dmesg output when you're in the busybox shell. You might be able to find something that stands out there. It's possible that this is related to kexec'ing into a new kernel - but this would be the first time we've seen it at the drive controller level.
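As a rough sketch of that search, the full log can be narrowed to lines that mention the failed disk or a generic failure keyword. The function name filter_storage_log and its default pattern are illustrative assumptions, not part of ZFSBootMenu or Debian; pass a different pattern to match a specific driver.

```shell
#!/bin/sh
# Hypothetical helper: keep only kernel log lines matching a pattern.
# The default pattern is an assumption chosen for a failing /dev/sda;
# pass another extended regex as $1 to look for a specific driver.
filter_storage_log() {
        grep -i -E "${1:-sda|scsi|fail|error|timeout}"
}

# Usage from the initramfs shell:
#   dmesg | filter_storage_log
#   dmesg | filter_storage_log 'megaraid'
```

Reading the matches in order, starting from the first failure, is usually more informative than the last lines on screen.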

image
This is a screenshot of dmesg in the initramfs. The boot disk is connected to an LSI MegaRAID card. I also tried installing the non-free firmware on Debian and regenerating the initramfs, but that didn't help.

In my experience, MegaRAID cards are very finicky. I suspect it's not able to gracefully handle the kexec process. Do you have any options to use an onboard AHCI controller port?

Sadly :(, I cannot change the disk topology right now. That is to say, I cannot move the disk to an onboard SATA port. Does that mean I cannot use ZFSBootMenu on this machine?

It means that it might take a bit more effort to make it work. If you're interested in trying a few things, I can write up a few steps. Is the LSI exposing a single disk, or a RAID volume?

Thanks a lot! I am willing to try. The LSI is currently exposing a single disk in JBOD mode, which is the "passthrough" mode of this RAID controller.

By the way, I suspect the LSI controller fails to reinitialize (but I'm not sure). Maybe I can add a new teardown script that unbinds this RAID controller from the megaraid driver, like what ZBM does for USB controllers?

UPDATE 1: I tried to rebind the controller in teardown.d, but it doesn't work. The disk is still not recognized even after a manual rebind.
UPDATE 2: I used the script below to reset the controller, and it works. But the reset takes about 2 minutes (so weird) and reports the error "write error: Inappropriate ioctl for device".

#!/bin/sh
# Unbind each controller from the megaraid_sas driver, then reset it
# through sysfs.
SYS_MEGARAID=/sys/bus/pci/drivers/megaraid_sas

# shellcheck disable=SC2231
for DEVPATH in ${SYS_MEGARAID}/????:??:??.?; do
        # Skip the unexpanded glob when no controller is bound
        [ -L "${DEVPATH}" ] || continue
        DEVICE="${DEVPATH#"${SYS_MEGARAID}"/}"
        echo "Tearing down Megaraid controller ${DEVICE}..."
        echo "${DEVICE}" > "${SYS_MEGARAID}/unbind"
        echo "Resetting Megaraid controller ${DEVICE}..."
        echo "1" > "/sys/bus/pci/devices/${DEVICE}/reset"
done
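
Before running a script like this on another machine, it may help to check which devices it would touch. The following is a minimal dry-run sketch that only prints the PCI addresses bound to the driver and changes nothing; the function name list_bound_devices and its optional directory argument are illustrative assumptions.

```shell
#!/bin/sh
# Dry run: print the PCI addresses bound to a driver without unbinding
# or resetting anything. The function name and the optional directory
# argument are hypothetical, added here for illustration only.
list_bound_devices() {
        drvdir="${1:-/sys/bus/pci/drivers/megaraid_sas}"
        for devpath in "${drvdir}"/????:??:??.?; do
                # Bound devices appear as symlinks named by PCI address
                [ -L "${devpath}" ] || continue
                echo "${devpath##*/}"
        done
}

# Usage:
#   list_bound_devices
```

If this prints nothing, no device is bound to megaraid_sas and the reset loop would be a no-op.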

I don't think there's anything that can be done about the 2 minutes it takes to reset the device; that's just megaraid_sas being a joy to work with.

I've added the script you created to the contrib directory in b7124bb. Thanks!