coreos/rpm-ostree

Support multiple initrd compression formats (non-gzip compression formats currently cause kernel crashes in worst case scenario)

Closed this issue · 13 comments

Host system details

rpm-ostree status
State: idle
Deployments:
● auto-sig:cs9/aarch64/qemu-developer
                  Version: 9 (2023-11-02T10:41:30Z)
                   Commit: ad000b8c336295604353805222eee32a8cbd9dd6123807f490dea4114458e339

Expected

# rpm-ostree compose postprocess

with lz4 configured via:

/usr/lib/dracut/dracut.conf.d

of passed via parameter:

--lz4

--compress -l -2

System boots.

** Actual

# rpm-ostree compose postprocess

with lz4 configured via:

/usr/lib/dracut/dracut.conf.d

of passed via parameter:

--lz4

--compress -l -2

Kernel either displays decoding error on decompression or panics.

You can also see the decoding error via userspace lz4 tool.

Steps forward

I scratched my head for a day or two on this issue. My interpretation is that it boils down to the appending of the:

dracut-random.cpio.gz

data towards the end of rpmostree_run_dracut.

We could do this in dracut, or compress all the cpio data ourselves in rpm-ostree (if we compress ourselves we don't have to dump dracut-random.cpio.gz, dracut-random.cpio.zstd, dracut-random.cpio.lz4, etc. just shell out and pipe to compression tool maybe?).

Encountered in Red Hat/CentOS Stream Automotive distributions as we changed the initramfs compression algorithm to lz4 (it boots faster from our testing) and were encountering non-booting systems, etc.

Not really related to the issue at all, but we started the effort to support more compression formats here:

#3745

The most appropriate fix would seem to be to have dracut handle compression of initrd and the additional random data.

Did we ever teach dracut how to do this as suggested in #1950 @jlebon @cgwalters ? Or is there a dracut ticket opened?

Issue as a one-liner:

$ sudo /bin/bash -c "dracut -f $PWD/initramfs.img --lz4; cat ./src/libpriv/dracut-random.cpio.gz >> $PWD/initramfs.img; lz4 --test $PWD/initramfs.img;"
Stream followed by undecodable data at position 21716923
/root/initramfs.img  : decoded 42743296 bytes

Working version (with gzip):

$ sudo /bin/bash -c "dracut -f $PWD/initramfs.img --gzip; cat ./src/libpriv/dracut-random.cpio.gz >> $PWD/initramfs.img; gzip --test $PWD/initramfs.img;"

dracut-random.cpio.gz should go away, see dracutdevs/dracut#2331

I have a question @cgwalters , does the dracut-random.cpio.gz file provide value in it's current form? Could we just remove that from being appended to the initrd for now? It's not random anyway in it's current form...

I have a question @cgwalters , does the dracut-random.cpio.gz file provide value in it's current form? Could we just remove that from being appended to the initrd for now?

We added it for a reason:

It's not random anyway in it's current form...

The goal isn't that the data there is random...the goal is that it provides the /dev/{u,}random devices in the generated initramfs reliably, even if we're building the initramfs in a container that doesn't have permission to create those devices at build time.

In the short term, we can do two things here:

  • Detect the case where we do have the capability to make the devices and don't append the extra cpio blob
  • Detect the compression format used by dracut and match it

In the short term, we can do two things here:

  • Detect the case where we do have the capability to make the devices and don't append the extra cpio blob
  • Detect the compression format used by dracut and match it

What we could do is:

  1. In the rust code:

Read the first 6 bytes of the initrd, if it is not "gzip" (like in the example shell script from lsinitrd below), set a compressor variable as "lz4 -c", "zstd -c", etc.

  1. Shell out:

gunzip -c dracut-random.cpio.gz | whatever_is_in_compressor_variable

Example of detection from lsinitrd:

if [[ $SKIP ]]; then
    bin="$($SKIP "$image" | { read -r -N 6 bin && echo "$bin"; })"
else
    read -r -N 6 bin < "$image"
fi
case $bin in
    $'\x1f\x8b'*)
        CAT="zcat --"
        ;;
    BZh*)
        CAT="bzcat --"
        ;;
    $'\x71\xc7'* | 070701)
        CAT="cat --"
        ;;
    $'\x02\x21'*)
        CAT="lz4 -d -c"
        ;;
    $'\x89'LZO$'\0'*)
        CAT="lzop -d -c"
        ;;
    $'\x28\xB5\x2F\xFD'*)
        CAT="zstd -d -c"
        ;;
    *)
        if echo "test" | xz | xzcat --single-stream > /dev/null 2>&1; then
            CAT="xzcat --single-stream --"
        else
            CAT="xzcat --"
        fi
        ;;
esac

What do you think? I think we might as well match it, if we have to detect the compression format, the easier part is matching it.

Can you provide the exact error message from the kernel?

I'll paste it here tomorrow, the exact one, off the top of my head it might have been "decoding failed"... That was the good case, getting the kernel log, in that case it would just ignore the unrecognisable gzip data and boot anyway.... In the worse case, you wouldn't even see that log, the kernel would freak out and panic tagging @masneyb for awareness... This is easy to reproduce by the way, just use a lz4 compressed initrd with ostree... This code assumes all initrd's are compressed with gzip, which is actually quite a dated compression algorithm, not sure why we are using that in RHEL or Fedora as default for initrd, seems like we are just using it because it's what's always been used.

In the healthier case, this is the kernel log:

Initramfs unpacking failed: Decoding failed

This is not working, tested or complete:

https://github.com/coreos/rpm-ostree/pull/4702/files

but this is what I'm thinking... It makes sense to shell out here as it's too much maintenance to integrate all the compression/decompression formats in rpm-ostree directly... Easier to just let the command line compressors/decompressors do their job.

I did #4704 which will allow you to opt-out of this for now.

Now,

Detect the case where we do have the capability to make the devices and don't append the extra cpio blob

would be better but needs some grunt work to ensure we're detecting things correctly.

The kernel should already support a series of CPIOs in different formats. The reproducer in #4683 (comment) only says that the user-space lz4 tool complains, but that's different code from what's in the kernel. For example, the latter needs to support arbitrary padding after the end of a compressed stream. (See the grammar at https://www.kernel.org/doc/Documentation/early-userspace/buffer-format.txt).

I thought at first this was a misalignment issue (see https://github.com/coreos/coreos-assembler/blob/eaaa1d7c5f235d365017c4b1d76a98833f9f411a/src/cmd-buildextend-live#L120-L133), though I don't think that applies here since we're appending a compressed CPIO.

Searching for that error message lead me to https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2c484419efc09e7234c667aa72698cb79ba8d8ed which is quite informative. The patch fixes trailing padding handling for the kernel lz4 decompressor by checking if the next lz4 frame where the magic number is expected is all zeros. But for that logic to work, it needs at least 4 bytes of padding.

So... I think all we need is to add some padding between the two?

...time passes...

OK, this should work: #4705