Dracut fails to boot with Clevis 20
MrRoy opened this issue · 12 comments
When I rebuild my initramfs with Clevis 20, my system is unable to boot. Strangely, Dracut is able to unlock my LUKS partition but then fails to boot:
Unlocked /dev/nvme0n1p2 (UUID=...) successfully
/lib/dracut-lib.sh: line 147: CMDLINE_PROC: unbound variable
/lib/dracut-lib.sh: line 198: _newoption: unbound variable
dracut Warning: Signal caught!
/lib/dracut-lib.sh: line 147: CMDLINE_PROC: unbound variable
/lib/dracut-lib.sh: line 198: _newoption: unbound variable
/lib/dracut-lib.sh: line 147: CMDLINE_PROC: unbound variable
/lib/dracut-lib.sh: line 198: _newoption: unbound variable
/lib/dracut-lib.sh: line 913: DRACUT_SYSTEMD: unbound variable
[ 4.847995] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100
[ 4.847995] CPU: 3 PID: 1 Comm: init Not tainted 6.6.16 #1
[ 4.847991] Hardware name: LENOVO 21D4CT01WW/21D4CT01ww, BIOS N3GET66W (1.66 ) 02/02/2024
[ 4.848009] Call Trace:
[ 4.848020] <TASK>
[ 4.848030] dump_stack_lvl+0x32/0x50
[ 4.848048] panic+0x172/0x310
[ 4.848067] do_exit+0x85f/0x9a0
[ 4.848080] ? srso_alias_return_thunk+0x39/0x90
[ 4.848095] ? __count_memcg_events+0x39/0x90
[ 4.848111] do_group_exit+0x28/0x80
[ 4.848123] __x64_sys_exit_group+0xf/0x10
If I recreate my initramfs using Clevis 19, using the same dracut version, parameters, kernel and cmdline, this does not happen and my system boots successfully. Likewise using dracut without clevis also boots successfully.
OS: Gentoo (OpenRC)
Kernel: 6.6.16
Dracut: 060 (commit 4980bad34775da715a2639b736cba5e65a8a2604)
N.B. on Clevis 19, I apply the patch from PR #347 in order to get Clevis to work without systemd
Would you please provide some steps so I can try reproducing the issue? Thanks in advance.
I faced exactly the same issue. It seems to be coming from dracut/modules.d/99base/dracut-lib.sh and triggered by afe91eb.
I initially faced this issue when trying to use ZFSBootMenu + Clevis v20 on Fedora. Then I tried applying afe91eb and cfefdde to v19 on Debian and got exactly the same issue.
I think set -eu is too strict and causes this error. I tried to change set -eu to set -e and it fixed the problem for me.
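To illustrate why set -u can break code that was written without it, here is a minimal sketch. The function legacy_lookup and the variable MAYBE_UNSET are hypothetical stand-ins, not taken from dracut-lib.sh; the sketch only assumes that the library reads variables which may never have been set.

```shell
#!/bin/sh
# Start from the shell's default, non-strict options.
set +eu

# Hypothetical stand-in for library code written without `set -u` in mind.
legacy_lookup() {
    # Reads a variable that the caller may never have set.
    [ -n "$MAYBE_UNSET" ] && echo "set" || echo "empty"
}

unset MAYBE_UNSET

# Without nounset: the unset variable expands to an empty string.
without_u=$(legacy_lookup)

# With nounset (in a subshell so this script's own options are untouched):
# the expansion is a fatal error and the subshell exits non-zero.
with_u_status=0
( set -u; legacy_lookup ) >/dev/null 2>&1 || with_u_status=$?

echo "without -u: $without_u"
echo "with -u: exit status $with_u_status"
```

Code that has to tolerate nounset usually writes such expansions as "${MAYBE_UNSET:-}", which is a change on the library side rather than in the hook.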
I will try to create some minimal setup to reproduce the issue and share it here.
Any news on this front, so I can try to investigate further?
We can probably relax that set -eu, but at first sight, it seems like something that should be addressed in dracut.
On my side I tried to patch clevis 20 to use only set -e but I still couldn't boot (different error though).
When I have some time, I will try to set up a minimal Alpine system to see if I can get you reproducible instructions @sergio-correia, or maybe even share a VM with the problem.
Sorry for the delay. This gist provides a minimal setup of ZFSBootMenu + Dracut + Clevis to reproduce the issue.
For simplicity, you can use the following steps:
- Install QEMU, Docker, OVMF.
- Build a Docker image with ZFSBootMenu build environment:
curl https://gist.githubusercontent.com/BohdanTkachenko/98e6c2aa8b923a73948a185af0d3accb/raw/Dockerfile \
| docker build . -f - -t zbm-fedora
- Use this Docker image to build an actual ZFSBootMenu EFI and run it in QEMU. The following command should work on Fedora, but you might need to adjust it for other distros:
bash <(curl -s https://gist.githubusercontent.com/BohdanTkachenko/98e6c2aa8b923a73948a185af0d3accb/raw/build-and-run.sh)
Just from checking dracut modules in /usr/lib/dracut/modules.d it looks like when you use set -e, you also have to use set +e afterwards (see 30convertfs/convertfs.sh). So if the clevis module calls set -eu, it should probably call set +eu at the end too. Or better, use a subshell to make this setting only temporary, as in 98syslog/rsyslogd-start.sh.
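The two containment approaches mentioned here can be sketched as follows. This is only an illustration: strict_work is a hypothetical placeholder for the module's real logic, and none of it is copied from the dracut or clevis sources.

```shell
#!/bin/sh
# Hypothetical hook body; strict_work stands in for the module's real logic.
strict_work() {
    value="${1:-default}"
    echo "processed $value"
}

# Option A: enable strict mode, then switch it off again before returning
# control to the sourcing script.
set -eu
result_a=$(strict_work one)
set +eu

# Option B: confine strict mode to a subshell; the parent's options never change.
result_b=$(
    set -eu
    strict_work two
)

# Either way, "$-" (the current option flags) should no longer contain e or u.
case $- in
    *e*|*u*) leaked=yes ;;
    *)       leaked=no ;;
esac
echo "$result_a / $result_b / strict mode leaked: $leaked"
```

Option A depends on the closing set +eu actually being reached, and it switches both options off even if the caller had them on beforehand. The subshell in Option B avoids both concerns, at the cost that variables assigned inside it are not visible to the caller.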
I think this might work. I will do some testing with the reproducer from @BohdanTkachenko (thanks, by the way!)
Dracut sources all hook files rather than executing them, so any changes made by the hooks are visible to all other Dracut scripts. To fix this, it should be sufficient to remove set -eu from the Dracut hook (i.e. clevis-hook.sh). The unlocking script (clevis-luks-unlocker) is then executed in a separate environment, so using set -eu there should be safe.
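The difference between sourcing and executing can be reproduced outside an initramfs with a small sketch. The file demo-hook.sh and the variable HOOK_RAN are made up for this demonstration and do not correspond to any real Dracut or Clevis file.

```shell
#!/bin/sh
# Start from a known non-strict state so the comparison below is meaningful.
set +eu

# Hypothetical hook file that enables strict mode, as described above.
tmpdir=$(mktemp -d)
hook="$tmpdir/demo-hook.sh"
printf '%s\n' 'set -eu' 'HOOK_RAN=1' > "$hook"

# Executed as a separate process: its options die with that process.
sh "$hook"
case $- in *u*) after_exec=leaked ;; *) after_exec=clean ;; esac

# Sourced: the hook's `set -eu` now applies to this shell as well.
. "$hook"
case $- in *u*) after_source=leaked ;; *) after_source=clean ;; esac

# Undo the leak so the rest of this script behaves normally.
set +eu
rm -rf "$tmpdir"

echo "after executing: $after_exec"
echo "after sourcing:  $after_source"
```

In this sketch the executed copy leaves the calling shell untouched, while the sourced copy turns on nounset for everything that runs afterwards, which matches the unbound variable errors in the boot log.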
Clevis v20 unlocking with Dracut without systemd completely ignores /etc/crypttab and other options supplied via host-only and kernel command-line, so I reworked the unlocking in #462 (work is done in commits 2 and 3). The unlocking now uses a pipe to send the password to cryptsetup. Feel free to try it.
Latest Debian 11 (bullseye), 12 (bookworm) and Fedora v39, v40 and v41 packages are available here https://github.com/oldium/clevis/releases/tag/v21_tpm1u2.
Latest Debian 11 (bullseye), 12 (bookworm) and Fedora v39, v40 and v41 packages are available here https://github.com/oldium/clevis/releases/tag/v21_tpm1u3.
This version also includes the latest PKCS#11 updates from master.