coreos/rpm-ostree

Automotive Stream Distribution builds failing

Closed this issue · 22 comments

Describe the bug

Automotive Stream Distribution fails to build since recent changes to ostree/rpm-ostree:

From @juanje:

works: rpm-ostree-2024.3-1
doesn't work: rpm-ostree-2024.4-2

well, the one that works is: ostree-2024.4-3
And the one in the image that doesn't work is: ostree-2024.5-2

Reproduction steps

git clone https://gitlab.com/CentOS/automotive/sample-images.git
sudo sample-images/auto-image-builder.sh cs9-qemu-minimal-ostree.x86_64.qcow2

Expected behavior

cs9-qemu-minimal-ostree.x86_64.qcow2 artefact should build without issue

Actual behavior

We see the following error (linked issue: ostreedev/ostree#3217):

error: Postprocessing and committing: Finalizing rootfs: During kernel processing: renaming boot: unlinkat(boot): Directory not empty
Traceback (most recent call last):
  File "/run/osbuild/bin/org.osbuild.ostree.commit", line 124, in <module>
    r = main(args["inputs"],
  File "/run/osbuild/bin/org.osbuild.ostree.commit", line 111, in main
    subprocess.run(argv,
  File "/usr/lib64/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['rpm-ostree', 'compose', 'commit', '--repo=/run/osbuild/tree/repo', '--add-metadata-string=version=1', '--add-metadata-string=rpmostree.inputhash=a79640dcc351adb1198eb96a38843103ac243bf2f9d55f8f5d055e681742c8b8', '--write-composejson-to=/run/osbuild/tree/compose.json', '/tmp/tmpb1dfd_rl.json', '/run/osbuild/tree/tmpfsmmc_tr']' returned non-zero exit status 1.

System details

rpm-ostree-2024.4-2

Additional information

No response

It would be nice if we could add:

git clone https://gitlab.com/CentOS/automotive/sample-images.git
sudo sample-images/auto-image-builder.sh cs9-qemu-minimal-ostree.x86_64.qcow2 # run in x86 environment
sudo sample-images/auto-image-builder.sh cs9-ridesx4-minimal-ostree.aarch64.aboot.simg # run in aarch64 environment

to either the ostree or rpm-ostree upstream CI, or just make it part of the CentOS Stream 9 release process. We get breakages from time to time because we do some things differently from other CentOS Stream 9 based OSes.

The cs9-ridesx4-minimal-ostree.aarch64.aboot.simg image is of greatest value because it has the most differences, but the x86 one is useful because it's easier to find x86 machines.
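
A CI job that covers both architectures would need to pick the image target that matches the runner. The following is only a minimal sketch of that selection, using the two targets quoted above; the function name `autosd_build_command`, the `TARGETS` table and the `checkout` parameter are hypothetical and not part of sample-images or any existing CI.

```python
import platform
import shlex

# Image targets taken from the commands quoted above, keyed by machine type.
TARGETS = {
    "x86_64": "cs9-qemu-minimal-ostree.x86_64.qcow2",
    "aarch64": "cs9-ridesx4-minimal-ostree.aarch64.aboot.simg",
}


def autosd_build_command(machine, checkout="sample-images"):
    """Return the argv for building the AutoSD image matching `machine`.

    `checkout` is assumed to be the path of a sample-images git clone.
    """
    if machine not in TARGETS:
        raise ValueError(f"no AutoSD target known for machine type {machine!r}")
    return ["sudo", f"{checkout}/auto-image-builder.sh", TARGETS[machine]]


if __name__ == "__main__":
    machine = platform.machine()
    if machine in TARGETS:
        # Only print the command; actually running it needs root and a checkout.
        print(shlex.join(autosd_build_command(machine)))
    else:
        print(f"unsupported machine type: {machine}")
```

The sketch only builds and prints the command line, so a CI wrapper can decide separately how to run it and how to report a failure.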

sample-images just needs /dev/kvm, right? That sounds easy enough to automate via Prow/Jenkins.

Onto the problem. One thing I do notice is:

warning: boot-location: "new" is deprecated, use boot-location: modules

And yeah...definitely want to flip on boot-location: modules here. But we should still work with the old version.

I'm a bit confused as I don't think there were relevant changes in rpm-ostree here - there definitely were changes on the build side but I don't see anything obvious.

I don't fully remember what the logic in rename_if_exists is trying to do here. I think this is saying we somehow have content in both /boot and /usr/lib/ostree-boot.

Unfortunately, sample-images needs full root by default. But if you feed it a small VM, it can do everything using qemu, and /dev/kvm should then be enough.
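
Whether a given CI runner can host the qemu-based build could be checked up front. This is a minimal sketch, assuming that read/write access to /dev/kvm is the only requirement, as suggested above; the helper name `kvm_usable` is hypothetical.

```python
import os


def kvm_usable(path="/dev/kvm"):
    """Return True if the KVM device node exists and is readable and writable."""
    return os.path.exists(path) and os.access(path, os.R_OK | os.W_OK)


if __name__ == "__main__":
    print("KVM available" if kvm_usable() else "KVM not available")
```

A CI job could run such a check first and skip or fail early, rather than failing partway through an image build.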

Getting a different problem if I change to boot-location: modules

Gonna bite that bullet and switch to boot-location: modules

Another osbuild failure we are seeing:

cs9-qemu-minimal-ostree.aarch64.qcow21710863931.txt

dracut: Could not find 'strip'. Not stripping the initramfs.
dracut: *** Store current command line parameters ***
dracut: *** Creating image file '/tmp/initramfs.img' ***
dracut: *** Creating initramfs image file '/tmp/initramfs.img' done ***
error: Postprocessing and committing: Finalizing rootfs: Hardlinking rpmdb to base location: Hardlinking /usr/share/rpm to /usr/lib/sysimage/rpm-ostree-base-db: Analyzing /usr/share/rpm/ content: File exists (os error 17)
Traceback (most recent call last):
  File "/run/osbuild/bin/org.osbuild.ostree.commit", line 127, in <module>
    r = main(args["inputs"],
  File "/run/osbuild/bin/org.osbuild.ostree.commit", line 114, in main
    subprocess.run(argv,
  File "/usr/lib64/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['rpm-ostree', 'compose', 'commit', '--repo=/run/osbuild/tree/repo', '--add-metadata-string=version=9', '--add-metadata-string=rpmostree.inputhash=b2819e9426c338ad7e076a8e95593cdc74af378d80d16d9de704f6c15d8a1cfd', '--write-composejson-to=/run/osbuild/tree/compose.json', '/tmp/tmpsd0tmbtw.json', '/run/osbuild/tree/tmp8z9wbl7i']' returned non-zero exit status 1.

⏱   Duration: 16s

Also, this seems to be a new path:

Removing RPM-generated 'usr/lib/ostree-boot/initramfs-5.14.0-428.380.el9iv.aarch64.img-38421f5ef7842bf75b35454bbfd723e61bb6ba759d788938fb51a0386e96bb72'

@Yarboa would you have cycles to do some CI work here?

$ git diff
diff --git a/.cci.jenkinsfile b/.cci.jenkinsfile
index 65507879..ad35b656 100644
--- a/.cci.jenkinsfile
+++ b/.cci.jenkinsfile
@@ -55,6 +55,10 @@ cosaPod(runAsUser: 0, memory: "${mem}Mi", cpu: "${nhosts}") {
        ${env.WORKSPACE}/ci/composepost-checks.sh
     """)
   }
+  stage("Build AutoSD") {
+    shwrap("""
+    """)
+  }
   stage("Install Deps") {
     shwrap("ci/install-test-deps.sh")
   }

We basically want to start building AutoSD images here with the rpm-ostree rpm from the given build.

Last known good commit:

3fc7c23

so this seems to be the first PR where it broke:

#4810

It seems like the movement of the:

g_print ("Adding rpm-ostree-0-integration.conf\n");

code triggered this.

⏱  Duration: 0s
org.osbuild.ostree.commit: c2502f9476207d488ad511a74241ab840181dff68a7ca5b6a9b849b9007d12bb {
  "ref": "cs9/x86_64/qemu-minimal",
  "os_version": "9",
  "selinux-label-version": 1
}
"/var/tmp" already exists and is not a directory.
warning: boot-location: "new" is deprecated, use boot-location: modules
New passwd entries: adm, bin, daemon, dbus, ftp, games, guest, halt, lp, mail, nobody, operator, shutdown, sync, systemd-coredump, tss
New group entries: adm, audio, bin, cdrom, daemon, dbus, dialout, disk, floppy, ftp, games, input, kmem, kvm, lock, lp, mail, man, mem, nobody, render, sys, systemd-coredump, systemd-journal, tape, tss, tty, users, utempter, utmp, video, wheel
Committing...done
Metadata Total: 6745
Metadata Written: 1982
Content Total: 11645
Content Written: 9542
Content Cache Hits: 0
Content Bytes Written: 463478661
cs9/x86_64/qemu-minimal => 0c08a0727427782fac2a6160a58d86053bb48bd996cb17280f0c4a1dcd1dec62

In a healthy flow, org.osbuild.ostree.commit looks like the above. But in an unhealthy flow, the SELinux recompilation, dracut run, etc. are re-executed, even though they were already run by the preptree stage.

@jlebon what do you think is the best fix here? Moving the:

g_print ("Adding rpm-ostree-0-integration.conf\n");

code back I'm pretty sure just fixes this, but I guess it was moved for a reason.

#4879 (comment)

jenkinsfile

@ericcurtin let me see if I understand:
do you suggest building the AutoSD image in Testing Farm and verifying that the build completes?

@Yarboa yes it would involve:

It's to achieve greater stability in our builds and catch these things earlier.

@ericcurtin Packit can build the rpm, and the test will install it into the AutoSD build and then run the build.
The test cannot run the generated image; is that acceptable?

@Yarboa the rpm-ostree rpm needs to be part of the osbuild of AutoSD before you even boot AutoSD:

sudo sample-images/auto-image-builder.sh

aka installing in a booted system won't be enough.

I got that, for sure; it is part of the build.
I had not taken auto-image-builder.sh into account.
There are two options here:

  1. Use a packit build of rpm-ostree
  2. Use rpmbuild

So in a containerized build, either install the rpm locally or enable the packit repo for rpm-ostree
before running the sample-images make call.

Note: for packit:
https://dashboard.packit.dev/results/copr-builds/1426399

So, I took a look at this, and indeed, the problematic commit is 3fc7c23, and it has two main issues:

First of all, it moves the generation of the tmpfiles.d dropin from the postprocess phase to the install phase. However, the way osbuild uses rpm-ostree, the install phase is not used. What osbuild does is use its own support for image creation, which installs the rpms and whatnot. It then runs rpm-ostree compose postprocess as part of the org.osbuild.preptree stage, and rpm-ostree compose commit as part of the org.osbuild.ostree.commit stage.

So, once the generation of the tmpfiles.d dropin was moved to install, the dropin never gets created when building ostree images using osbuild.

Secondly, when running rpm-ostree compose commit, it used to be the case that postprocess_final() noticed that the ostree integration dropin was there, so it could avoid triggering a second postprocess. But this was changed to look for usr/lib/passwd instead. However, in the osbuild case (at least for automotive) this file isn't created during postprocessing, so the postprocessing is triggered again.

This eventually ends with the failure:

error: Postprocessing and committing: Finalizing rootfs: Hardlinking rpmdb to base location: Hardlinking /usr/share/rpm to /usr/lib/sysimage/rpm-ostree-base-db: Analyzing /usr/share/rpm/ content: File exists (os error 17)

Which I guess is a general problem stemming from trying to post-process twice.
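
The double-postprocess failure described above can be illustrated with a toy model. This is only a sketch of the reasoning, not rpm-ostree or osbuild code: the rootfs is modelled as a set of paths, all function names are hypothetical, and `OTHER_FILE` is a placeholder for whichever file the newer check looks for.

```python
# Toy model only: none of these names are rpm-ostree or osbuild APIs.
DROPIN = "rpm-ostree-0-integration.conf"        # marker the older check looked for
OTHER_FILE = "file-the-newer-check-looks-for"   # placeholder, not a real path
RPMDB_BASE = "usr/lib/sysimage/rpm-ostree-base-db"


def postprocess(tree, files_created):
    """Simulate one postprocess pass over a rootfs modelled as a set of paths."""
    if RPMDB_BASE in tree:
        # A second pass collides with state left behind by the first one.
        raise FileExistsError(RPMDB_BASE)
    tree.add(RPMDB_BASE)
    tree.update(files_created)


def commit(tree, skip_marker, files_created):
    """Simulate commit: postprocess again unless the skip marker is present."""
    if skip_marker not in tree:
        postprocess(tree, files_created)
    return "committed"


def osbuild_flow(skip_marker, files_created):
    """Run the two stages in order; no rpm-ostree install phase ever runs."""
    tree = set()
    postprocess(tree, files_created)                  # preptree stage
    return commit(tree, skip_marker, files_created)   # ostree.commit stage


if __name__ == "__main__":
    # Before the change: postprocess writes the dropin and commit finds it.
    print(osbuild_flow(DROPIN, {DROPIN}))
    # After the change: nothing in this flow creates the file that is checked.
    try:
        osbuild_flow(OTHER_FILE, set())
    except FileExistsError as err:
        print("second postprocess failed:", err)
```

In the model, the flow only succeeds when the postprocess pass itself creates the file that the commit step checks for, which matches the observation that the older marker worked for osbuild and the newer one does not.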

@jlebon Is there some other way to solve #4810?

the problematic commit is 3fc7c23,

That can't be true... did you mean eee3bb1 ?

@cgwalters Yes, sorry.

Also this seems to affect regular osbuild users too: https://issues.redhat.com/browse/RHEL-29559

OK I put up a revert at #4881

BTW if I was in charge, one could just "git revert" the landing of the rpm-ostree build into c9s entirely and that would just work. Or really of course, any change into any package. Being able to do that is definitely part of an image-based centric mindset. But we can't do that because rpm...

I do think one cs9-based build plus an osbuild run in CI would prevent this from happening in future. Maybe that's AutoSD or something else.

It means another re-build of the code to make a cs9 rpm and a quick osbuild run... It's probably another 15 minutes added to the build, but I think it's worth it.

I think it's really neat for development that rpm-ostree and libostree stay close to upstream on CentOS Stream and I want that to continue (image-based things should be closer to upstream IMO). But one CI build might be nice.

ostree/rpm-ostree stability hasn't been great for AutoSD in 2024. It's not anyone's fault; it's a consequence of this area's success, with new changes coming in frequently.