coreos/rpm-ostree

Container image fails to build with 32-bit RPMs

Closed this issue · 15 comments

Host system details

I am on Fedora Workstation 38 trying to build Fedora Silverblue 38.

$ rpm-ostree --version
rpm-ostree:
 Version: '2023.7'
 Git: 06600cc4e215bed584ee71ff34bf5b78a181cab4
 Features:
  - rust
  - compose
  - container
  - fedora-integration

Actual behavior:

$ sudo rpm-ostree compose image --initialize --format=registry --cachedir=${WORKDIR}/cache fedora-silverblue-with-32-bit-packages.yaml docker.io/${DOCKER_HUB_USERNAME}/${DOCKER_HUB_CONTAINER}
...
error: Multiple installed 'NetworkManager-libnm' (NetworkManager-libnm-1:1.42.8-1.fc38.x86_64, NetworkManager-libnm-1:1.42.8-1.fc38.i686)
error: container-encapsulate failed: ExitStatus(unix_wait_status(256))

Expected behavior:

$ sudo rpm-ostree compose image --initialize --format=registry --cachedir=${WORKDIR}/cache fedora-silverblue-with-32-bit-packages.yaml docker.io/${DOCKER_HUB_USERNAME}/${DOCKER_HUB_CONTAINER}
...
fedora/38/x86_64/silverblue => 86a605f3d627d44bb6baef4a06dc6d464e356e14a48a310d364b1636b48c1a0f

Steps to reproduce it

I first encountered this issue while trying to install Steam from RPM Fusion. To simplify reproduction, the treefile below installs only the 32-bit NetworkManager-libnm package, which is a dependency of Steam.

git clone --branch f38 https://pagure.io/workstation-ostree-config.git
cd workstation-ostree-config
cat <<EOF > fedora-silverblue-with-32-bit-packages.yaml
---
include: fedora-silverblue.yaml
releasever: "38"
packages:
  - NetworkManager-libnm.i686
EOF

Building to a local tree repository (not a container image) works.

Commands:

export WORKDIR="/root/tmp"
sudo mkdir -p ${WORKDIR}/cache ${WORKDIR}/repo
sudo ostree --repo=${WORKDIR}/repo init --mode=archive-z2
sudo rpm-ostree compose tree --unified-core --cachedir=${WORKDIR}/cache --repo=${WORKDIR}/repo fedora-silverblue-with-32-bit-packages.yaml

Output:

Committing... done
Metadata Total: 19275
Metadata Written: 7315
Content Total: 19192
Content Written: 1761
Content Cache Hits: 86665
Content Bytes Written: 195441152
7315 metadata, 89691 content objects imported; 4.4 GB content written
fedora/38/x86_64/silverblue => 86a605f3d627d44bb6baef4a06dc6d464e356e14a48a310d364b1636b48c1a0f

Building a container image does NOT work.

Commands:

export WORKDIR="/root/tmp"
sudo mkdir -p ${WORKDIR}/cache ${WORKDIR}/repo
sudo ostree --repo=${WORKDIR}/repo init --mode=archive-z2
sudo rpm-ostree compose image --initialize --format=registry --cachedir=${WORKDIR}/cache fedora-silverblue-with-32-bit-packages.yaml docker.io/${DOCKER_HUB_USERNAME}/${DOCKER_HUB_CONTAINER}

Output:

Committing... done
Metadata Total: 19275
Metadata Written: 1539
Content Total: 19191
Content Written: 34
Content Cache Hits: 86666
Content Bytes Written: 159547174
1539 metadata, 5307 content objects imported; 0 bytes content written
Wrote commit: 349981df836e72c5c9d72f23a937fecba0a2ad86b10c986fbbd89625b9fd01c5
Reading packages... done
error: Multiple installed 'NetworkManager-libnm' (NetworkManager-libnm-1:1.42.8-1.fc38.x86_64, NetworkManager-libnm-1:1.42.8-1.fc38.i686)
error: container-encapsulate failed: ExitStatus(unix_wait_status(256))
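
For reference, the 256 in unix_wait_status(256) is a raw Unix wait status, not an exit code. On Linux the exit code is stored in bits 8 to 15 and the terminating signal in the low 7 bits, so 256 means the child exited normally with code 1. A minimal sketch of the decoding (decode_wait_status is a made-up helper name, not part of rpm-ostree):

```shell
# Decode a raw Unix wait status (Linux encoding) into exit code and signal.
# decode_wait_status is a hypothetical helper used only for illustration.
decode_wait_status() {
    status=$1
    # Exit code: bits 8-15. Terminating signal: low 7 bits.
    echo "exit_code=$(( (status >> 8) & 255 )) signal=$(( status & 127 ))"
}

decode_wait_status 256   # prints: exit_code=1 signal=0
```

In other words, the second error line only says that the container-encapsulate step exited with status 1; the first line carries the actual cause.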

Additional info:

A lot of 32-bit issues with rpm-ostree compose tree were addressed a few years ago. I wonder if the same fixes need to be applied to rpm-ostree compose image.

#3161

Would you like to work on the issue?

I am not familiar enough with the internals of rpm-ostree to be able to work on this myself.

This issue seems to occur specifically when packages with the same name but different architectures (x86_64 and i686, in this case) are installed together in a container image.

I am also experiencing this, when trying to encapsulate a CentOS 9 container with both 64-bit and 32-bit glibc packages installed:

error: Multiple installed 'glibc' (glibc-2.34-83.el9.12.x86_64, glibc-2.34-83.el9.12.i686)
error: container-encapsulate failed: ExitStatus(unix_wait_status(256))

This doesn't happen when running rpm-ostree compose tree, only rpm-ostree compose image.
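
To find out which packages on a given system are in this situation, one option is to list name and architecture for everything installed and pick out the names that appear with more than one architecture. A rough sketch, assuming one "NAME ARCH" pair per line on stdin; multilib_names is a hypothetical helper, and the rpm invocation is left commented out so the snippet runs anywhere:

```shell
# Read "NAME ARCH" lines on stdin and print every package name that is
# present with more than one architecture (e.g. glibc as x86_64 and i686).
# multilib_names is a hypothetical helper, not an rpm-ostree command.
multilib_names() {
    # sort -u drops exact repeats so the same name+arch is counted once.
    sort -u |
        awk '{ count[$1]++ } END { for (n in count) if (count[n] > 1) print n }' |
        sort
}

# On an RPM-based system the input could come from:
# rpm -qa --queryformat '%{NAME} %{ARCH}\n' | multilib_names
```

Any name this prints would be expected to trip the "Multiple installed" check during compose image, going by the two reports above.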

The error seems to be coming from here:

= g_strdup_printf ("Multiple installed '%s' (%s, %s)", name_c.c_str (),

I'd hazard a guess that it's being called from here:

let pkgmeta = q.package_meta(name)?;

Forgot to mention, I'm using rpm-ostree 2024.4 on CentOS Stream 9:

rpm-ostree:
 Version: '2024.4'
 Git: afd7ddfc32c44cac657e9cedf3ad90bacdf14bc3
 Features:
  - rust
  - compose
  - container

Hey @antheas, I see in #4953 you mentioned that you had patched your rpm-ostree build with a workaround for this problem of installing 32-bit applications. Was it the exact solution that @jordemort proposed, or something else? Any chance we can get this into a PR to fix rpm-ostree upstream?

I just commented out the check.

This part of the code has no way of knowing each package's architecture, so the check cannot be implemented properly here. As a result, it will always fail when two packages share a name but differ in architecture.

This check is only valid when there is a package in the list twice, which 1) can not happen (?) and 2) will error out anyway because of duplicate files. As such I would remove the check.

Perhaps there should be extra logic to error out properly when there are duplicate files. I also hit an issue with the Lutris dependencies, I think with unixODBC, which is pulled in by the 32-bit wine-core. The error there was unclear, and it took me well over an hour to find the package responsible. However, since I was only testing, I just dropped the 32-bit packages and carried on.

From c3787d4b13aed3a25aa358d98f027ddda6304f3a Mon Sep 17 00:00:00 2001
From: antheas <git@antheas.dev>
Date: Thu, 9 May 2024 21:56:29 +0200
Subject: [PATCH] skip multiple packages check to avoid 32bit packages breaking

---
 src/libpriv/rpmostree-refts.cxx | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/src/libpriv/rpmostree-refts.cxx b/src/libpriv/rpmostree-refts.cxx
index 46f63621..cf83bae2 100644
--- a/src/libpriv/rpmostree-refts.cxx
+++ b/src/libpriv/rpmostree-refts.cxx
@@ -165,16 +165,17 @@ RpmTs::package_meta (const rust::Str name) const
     {
       // TODO: Somehow we get two `libgcc-8.5.0-10.el8.x86_64` in current RHCOS, I don't
       // understand that.
-      if (retval != nullptr)
-        {
-          auto nevra = header_get_nevra (h);
-          g_autofree char *buf
-              = g_strdup_printf ("Multiple installed '%s' (%s, %s)", name_c.c_str (),
-                                 retval->nevra ().c_str (), nevra.c_str ());
-          throw std::runtime_error (buf);
-        }
+      // if (retval != nullptr)
+      //   {
+      //     auto nevra = header_get_nevra (h);
+      //     g_autofree char *buf
+      //         = g_strdup_printf ("Multiple installed '%s' (%s, %s)", name_c.c_str (),
+      //                            retval->nevra ().c_str (), nevra.c_str ());
+      //     throw std::runtime_error (buf);
+      //   }
 
       retval = std::make_unique<PackageMeta> (h);
+      break;
     }
   if (retval == nullptr)
     g_assert_not_reached ();
-- 
2.45.0

Thanks for the very informative insight, @antheas! Perhaps @cgwalters may also have some ideas here for the long-term solution. Maybe it is as simple as removing this check, if it is redundant.

Actually, I take that back: part of the error message is the NEVRA. So a potential solution is to refactor the check so that it ignores the version and compares only name and architecture. I just don't know if it's worth fixing instead of removing.

error: Multiple installed 'glibc' (glibc-2.34-83.el9.12.x86_64, glibc-2.34-83.el9.12.i686)

I was planning on doing a patch to throw the error only if the architecture is the same (by doing some ugliness with strrchr to pull it off the end of nevra) but I realized that wouldn't actually be correct, because what if instead of merely 2 duplicate packages, there were 3 or more? All of them would have to be compared against each other, which would significantly complicate the check.
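
For what it's worth, the "3 or more" case does not need pairwise comparison if the packages are grouped by name and architecture first: any group with more than one member is a same-architecture duplicate. A minimal sketch of that idea, written in shell rather than in rpm-ostree's C++; both function names are hypothetical, and the parsing assumes well-formed name-[epoch:]version-release.arch strings like the ones in the error messages above:

```shell
# Split a NEVRA into "name arch". Hypothetical helper for illustration.
nevra_name_arch() {
    nevra=$1
    arch=${nevra##*.}   # text after the last dot, like strrchr(nevra, '.')
    rest=${nevra%.*}    # drop ".arch"
    rest=${rest%-*}     # drop "-release"
    name=${rest%-*}     # drop "-[epoch:]version"
    printf '%s %s\n' "$name" "$arch"
}

# Read NEVRAs on stdin and print each "name arch" pair that occurs more
# than once, however many packages share the name.
same_arch_duplicates() {
    while IFS= read -r nevra; do
        nevra_name_arch "$nevra"
    done | sort | uniq -d
}
```

Fed the two glibc NEVRAs from the error above, same_arch_duplicates prints nothing, while two copies of libgcc-8.5.0-10.el8.x86_64 would be reported once as "libgcc x86_64".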

I think it would be OK to drop the check entirely. RPM itself has built-in protections against duplicate packages being installed, so I think that if RPM allowed it to happen, it's reasonable to assume that it's valid. It appears that the layer chunking is done on the basis of nevra, which includes architecture, so I don't think allowing a multilib setup is going to break anything on that end.

I intend to prepare a patch that removes the check, and if it works well, I will submit it as a pull request.

I went through all this and I can say it's a bit of a moot point.

I have a patchset for rpm-ostree that fixes the 32-bit bugs so that you can reprocess an image that contains 32-bit packages.

However, creating a commit with rpm-ostree will often not be possible, because multiple packages of the same library own the same /etc file, which causes rpm-ostree to panic. Normal dnf would just overwrite the file.

I decided to just strip rpm-ostree entirely and use ostree-rs-ext directly to repackage an oci image into an ostree commit here:
https://github.com/antheas/bazzite-upd

It works a lot better and saves a lot more space than rpm-ostree itself without any of these bugs. However, I'm dealing with a lot of permissions issues that happen because folder permissions and owners get stripped during the processing of the container.

I managed to get it to a point where it boots and works perfectly. However, when I moved it to a GitHub Ubuntu action it broke again (sddm panics), and I'm looking into fixing that.

The git history for that repo contains the patch set for rpm-ostree (which fixes rechunking, but not the panic on duplicate /etc files when committing treefiles).

@antheas Hm, do you have any more detail about what goes wrong with /etc? I ended up modifying things to pass the architecture into package_meta instead of removing the check, otherwise there was the chance that package_meta would return information about the wrong package. I also changed the compose-image.sh test to build an image with both 64-bit and 32-bit glibc in it. As you say, both packages appear to claim some of the same files in /etc, but rpm-ostree does not seem to choke on it with my patch.

#5014

Yes, unixODBC, when installed as both 32-bit and 64-bit, causes an error about /etc/odbc.ini, which is covered in issue #4653.

This probably happens because the file's hash differs between the two packages; otherwise OSTree would check it out only once, I suppose.

It happens before the commit is created. Once the commit exists, container-encapsulate (both as part of the image command and standalone) runs correctly.

@antheas I see. I can confirm that installing both 32 bit and 64 bit versions of unixODBC is still a problem with my patch:

error: Installing packages: Checkout unixODBC-2.3.9-4.el9.i686: Copy checkout of cc700d46f407c6c5ab2d5dde474366a928b7398277e61162e7f8ec06f469f07e to odbc.ini: linkat: File exists

I suppose this must only work when the files that the 32-bit and 64-bit versions of packages have in common are identical? i.e. it works fine with glibc.
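
As a side note on the "linkat: File exists" part of that message: at the filesystem level, creating a hard link at a path that already exists fails regardless of whether the two files have the same content. The sketch below shows only that behaviour with plain ln; it does not reproduce rpm-ostree's checkout logic, and the function and file names are made up for illustration:

```shell
# Hard-link two different source files onto the same destination name and
# report which attempts succeeded. demo_link_collision is hypothetical.
demo_link_collision() {
    dir=$(mktemp -d) || return 1
    : > "$dir/odbc.ini.x86_64"   # empty stand-in for the 64-bit package's file
    : > "$dir/odbc.ini.i686"     # empty stand-in for the 32-bit package's file
    if ln "$dir/odbc.ini.x86_64" "$dir/odbc.ini" 2>/dev/null; then
        first=ok
    else
        first=failed
    fi
    # The destination now exists, so this link attempt fails with EEXIST.
    if ln "$dir/odbc.ini.i686" "$dir/odbc.ini" 2>/dev/null; then
        second=ok
    else
        second=failed
    fi
    rm -rf "$dir"
    echo "first=$first second=$second"
}
```

So whether the second package's file is accepted has to be decided by whatever logic runs before the link call, which is presumably where glibc and unixODBC end up on different paths.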

Given that, I'm not sure if this issue should actually be closed when my patch is merged, or if the maintainers would prefer to keep it open, or if a new issue specifically about packages that have identically-named but different-in-content files between architectures should be opened.

I'm also not sure how that situation is supposed to work on non-ostree based systems; do you just end up with the version of the file from whatever architecture was installed most recently? It smells like trouble and nondeterminism. If the two architectures expect two different things to be installed in the same place, it seems like it would potentially be problematic to have them both installed anywhere, ostree or no ostree.

@antheas Would you happen to be familiar with any other packages that have that sort of issue, aside from unixODBC?

Unfortunately I don't know of any other packages, as wine-core is what brings in this dependency, and if it's removed, the problem is gone. There might be other affected packages that would only show up once unixODBC is out of the picture.

I don't think this will ever cause an error. The difference might just be permissions or an extra comment

As for it causing nondeterministic behavior, that's up to rpm-ostree's design.

I think /etc files are a special area. Most package managers error out on duplicate files but I've never had it happen on /etc.

So I assume you get the last architecture installed?

Well, I had a couple theories, but ended up shooting them both down.

  1. Theory 1: odbc.ini differs between the x86_64 and i686 versions - INCORRECT, the files do not differ; in fact, they are both 0-byte files. (odbcinst.ini is also identical between the two packages, although not 0 bytes)
  2. Theory 2: odbc.ini is not marked as a config file by the RPM - INCORRECT, it is:
[root@penfold tmp]# for rpm in unixODBC*.rpm ; do rpm -qcp "$rpm" ; done
warning: unixODBC-2.3.9-4.el9.i686.rpm: Header V4 RSA/SHA256 Signature, key ID 350d275d: NOKEY
/etc/odbc.ini
/etc/odbcinst.ini
warning: unixODBC-2.3.9-4.el9.x86_64.rpm: Header V4 RSA/SHA256 Signature, key ID 350d275d: NOKEY
/etc/odbc.ini
/etc/odbcinst.ini

So, kind of out of ideas about why glibc is fine but unixODBC is not.
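
One thing the two checks above do not cover is per-file metadata other than content and the config flag, such as mode, owner and group. Whether any of that matters to rpm-ostree's checkout is an assumption here, not something established in this thread, but it is cheap to rule out. A sketch that compares two per-file listings, where differing_attrs and the listing file names are hypothetical:

```shell
# Given two listings with one "PATH MODE USER GROUP DIGEST" line per file,
# print every path that appears in both listings with different attributes.
# differing_attrs is a hypothetical helper; paths with spaces are not handled.
differing_attrs() {
    awk 'NR == FNR { attrs[$1] = $0; next }
         ($1 in attrs) && attrs[$1] != $0 { print $1 }' "$1" "$2"
}

# The listings could be produced from the two RPMs with:
# fmt='[%{FILENAMES} %{FILEMODES:perms} %{FILEUSERNAME} %{FILEGROUPNAME} %{FILEDIGESTS}\n]'
# rpm -qp --queryformat "$fmt" unixODBC-2.3.9-4.el9.i686.rpm   > i686.txt
# rpm -qp --queryformat "$fmt" unixODBC-2.3.9-4.el9.x86_64.rpm > x86_64.txt
# differing_attrs i686.txt x86_64.txt
```

If /etc/odbc.ini shows up in the output, the two packages disagree on something other than content; if it does not, this theory can be shot down as well.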

One thing that I noticed is that the specfile for glibc calls out all of its %config files individually, whereas the specfile for unixODBC uses a glob of /etc/odbc*. A close reading of http://ftp.rpm.org/max-rpm/s1-rpm-inside-files-list-directives.html would suggest that this is not allowed:

There is a restriction to the %config directive, and that restriction is that no more than one filename may follow the %config.

I thought maybe rpm-ostree was not interpreting the glob, but I haven't yet found any part of rpm-ostree's code that treats files differently if they are %config or not, and I'm not sure that there actually is one. I'm also not sure if the glob in the specfile is actually still in play by the time you've got a binary RPM, or if everything is fully resolved to actual paths by then - the rpm binary seems to understand it.
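
On the last question, the rpm -qcp output above already lists concrete paths (/etc/odbc.ini and /etc/odbcinst.ini), which suggests the binary RPM stores a flag per file rather than the original glob. A way to look at those per-file flags directly is to query FILEFLAGS and filter for the config marker. A small sketch, assuming "PATH FLAGS" lines on stdin where FLAGS is what rpm's :fflags formatter prints (c for config); config_paths is a hypothetical helper:

```shell
# Read "PATH FLAGS" lines on stdin and print the paths whose flags include
# "c" (config). Files with no flags have only one field and are skipped.
# config_paths is a hypothetical helper; paths with spaces are not handled.
config_paths() {
    awk 'NF >= 2 && $2 ~ /c/ { print $1 }'
}

# Against a built package this could be fed from:
# rpm -qp --queryformat '[%{FILENAMES} %{FILEFLAGS:fflags}\n]' \
#     unixODBC-2.3.9-4.el9.x86_64.rpm | config_paths
```

If both paths come back flagged, the glob in the specfile is unlikely to be what distinguishes unixODBC from glibc by the time rpm-ostree sees the package.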