coreos/fedora-coreos-tracker

Trigger a bootupd update before landing latest 6.9 kernel update in Fedora CoreOS

travier opened this issue · 18 comments

Describe the bug

We have to make sure everyone gets their bootloader updated before we land the 6.9 kernel in FCOS.

See fedora-silverblue/issue-tracker#543

Reproduction steps

Update to 6.9 kernel.

Expected behavior

System boots with Secure Boot enabled

Actual behavior

It doesn't

System details

N/A

Butane or Ignition config

N/A

Additional information

No response

I tested installing Fedora Silverblue 39 and updating to the latest commit, which comes with the 6.9 kernel, and it failed to boot. It will likely fail for FCOS as well.

We might not see that in the tests as the bootloader there is always up-to-date as it's a fresh installation.

Maybe the upgrade tests will show it.

Note: I won't be there for the meeting today

Just to sanity-check, as expected I can confirm this also affects FCOS. Booting from an f38 image and rebasing to testing-devel (which already has kernel 6.9):

error: ../../grub-core/kern/efi/sb.c:182:bad shim signature.
error: ../../grub-core/loader/i386/efi/linux.c:258:you need to load the kernel
first.

Maybe the upgrade tests will show it.

The last few Secure Boot upgrade tests are currently failing, but on what seems to be an unrelated issue. It needs to be looked at. (Or maybe that is what it's failing on; the console logs appear truncated so it's hard to tell.)
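To make it easier to tell whether a failing upgrade test hit this specific problem, a console log could be scanned for the two GRUB messages quoted above. This is only a rough sketch: the function name `find_secure_boot_failures` is hypothetical, it is not part of the existing test harness, and a truncated log can still hide the failure.

```python
import re

# Patterns taken from the GRUB errors quoted earlier in this thread.
FAILURE_PATTERNS = [
    re.compile(r"bad shim signature"),
    re.compile(r"you need to load the kernel\s+first"),
]


def find_secure_boot_failures(console_log: str) -> list[str]:
    """Return the known failure patterns that appear in the console log text."""
    # Collapse all whitespace so messages wrapped across lines still match.
    flattened = " ".join(console_log.split())
    return [p.pattern for p in FAILURE_PATTERNS if p.search(flattened)]
```

An empty result only means the known messages were not found in the captured output, not that the boot succeeded.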

We did force a bootloader update recently-ish, but only on aarch64. And even then, it's not clear whether it addresses this (when did the fixed packages, e.g. shim/grub, enter Fedora 39?).

We did force a bootloader update recently-ish, but only on aarch64.

For reference, this is the PR where we did this: coreos/fedora-coreos-config#2308.

And it looks like systemd supports ConditionSecurity=uefi-secureboot. So we could revive that unit but restrict it to systems that have Secure Boot enabled, to lower its risk.
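For anyone who wants to check by hand which nodes such a condition would select, here is a minimal sketch that reads the Secure Boot state from efivarfs. It assumes the standard `SecureBoot` variable under the EFI global variable GUID, where efivarfs prepends 4 attribute bytes to the 1-byte value. The function names are made up for illustration, and this is not necessarily how systemd evaluates the condition.

```python
from pathlib import Path

# SecureBoot variable under the EFI global variable GUID, as exposed by efivarfs.
SECURE_BOOT_VAR = Path(
    "/sys/firmware/efi/efivars/SecureBoot-8be4df61-93ca-11d2-aa0d-00e098032b8c"
)


def parse_secure_boot(raw: bytes) -> bool:
    """efivarfs prefixes the value with 4 attribute bytes; byte 5 holds the state."""
    return len(raw) >= 5 and raw[4] == 1


def secure_boot_enabled(var_path: Path = SECURE_BOOT_VAR) -> bool:
    """Return True only if the variable exists and reports Secure Boot as on."""
    try:
        return parse_secure_boot(var_path.read_bytes())
    except OSError:
        # No such entry: BIOS boot, or the variable is not exposed.
        return False
```

On a node booted in BIOS mode the file does not exist, so the sketch reports `False`, which matches the intent of skipping the update there.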

This was discussed today in the community meeting and the following plan was decided:

We will also use the opportunity of this barrier release to fix the aleph issue mentioned above, as this needs fixing to be able to update the bootloader anyway.

See the meeting logs for more details: https://meetbot.fedoraproject.org/meeting-1_matrix_fedoraproject-org/2024-06-26/fedora-coreos-meeting.2024-06-26-16.30.log.html

Should we do some special sauce to detect RAID setups that we currently don't support in bootupd?

PR to pin kernel 6.8 in testing-devel: coreos/fedora-coreos-config#3041

Should we do some special sauce to detect RAID setups that we currently don't support in bootupd?

Looking at #1485 (comment), I am not sure how I can write a script that would reliably find the correct partition labels.

PR with the bootloader update (and aleph fix): coreos/fedora-coreos-config#3042

Should we do some special sauce to detect RAID setups that we currently don't support in bootupd?

Yes, good point. I think we should for completeness.

some special sauce to detect RAID setups that we currently don't support in bootupd?

@jlebon @travier is /dev/disk/by-label/esp-1 a reliable label on RAID setups?

That's a good question and I don't know the answer. I think we'll have to provision an FCOS system with various RAID setups and look at the device configurations.

Added more info re. RAID in coreos/fedora-coreos-config#3042 (comment).

Yes, those labels are reliable. The only RAID1 we can try to support is the one we set up ourselves via the mirror Butane sugar. Those labels are defined there: https://github.com/coreos/butane/blob/d26d80317825a24f482d9c6cca2fa80181e0082f/config/fcos/v1_3/translate.go#L165
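As a rough illustration of how a script could rely on those labels, the sketch below lists the `esp-N` entries under `/dev/disk/by-label`. It assumes the labels follow the `esp-1`, `esp-2`, ... pattern mentioned above. The function names are hypothetical, and this is not the detection logic used in the actual PR.

```python
import os
import re

# Assumed label pattern for ESPs created by the Butane mirror sugar.
ESP_LABEL = re.compile(r"^esp-(\d+)$")


def select_esp_labels(names):
    """Keep only esp-N labels, ordered by their numeric suffix."""
    numbered = []
    for name in names:
        match = ESP_LABEL.match(name)
        if match:
            numbered.append((int(match.group(1)), name))
    return [name for _, name in sorted(numbered)]


def mirrored_esp_labels(by_label_dir="/dev/disk/by-label"):
    """List esp-N labels on this system; empty if the directory is missing."""
    try:
        return select_esp_labels(os.listdir(by_label_dir))
    except OSError:
        return []
```

Finding any `esp-N` label would suggest a mirrored layout set up through Butane, which a bootloader update script could then skip or handle separately.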

The fix for this went into testing stream release 40.20240701.1.0. Please try out the new release and report issues.

With the revert done and the fix landed in testing, I think we can close this one now.

The fix for this went into stable stream release 40.20240701.3.0.