chainguard-dev/melange

Ownership not preserved in resulting tarball

ppmathis opened this issue · 3 comments

This is probably related to the fixed and closed #501. When using melange and apko together, I could not build a package with melange in which certain paths have custom ownership. As far as I understand, the issue lies in melange and apko is not to blame, but to be sure I have included a full reproducer.

My first attempt used the latest Podman Desktop release on macOS 14.5 (MacBook Pro M1), but there chown had no effect at all inside the melange pipeline. The commands exited successfully, yet the changes were not persisted even within the same runs step: chown followed by ls (as done in [1] and [2] in the reproducer) never showed any change. I suspect the file system used during builds does not work well with Podman, so I attributed this to differences from Docker, especially since I do not yet have a deep understanding of the melange build process.

My second attempt, after removing Podman completely from the system, was to use Docker Desktop instead. This looked much more promising: running ls after chown or chmod now properly shows the changes, both for [1] and [2] in the melange reproducer config.

Unfortunately, the changed ownership is not reflected in the tar archive that melange produces. I read through the previously linked issue and saw the adjusted tarball emitter for the data archive, so I assumed that tar --list --numeric-owner -tf packages/aarch64/ownership-1.0.0-r0.apk would show the proper UIDs/GIDs, with only the build user 1000:1000 remapped to UID/GID 0:0. Instead, I get this output:

$ tar --list --numeric-owner -tf packages/aarch64/ownership-1.0.0-r0.apk
-rwxrwxrwx  0 0      0         512 Jan  1  1970 .SIGN.RSA.melange.rsa.pub
-rw-r--r--  0 0      0         211 Jan  1  1970 .PKGINFO
drwxr-xr-x  0 501    20          0 Jan  1  1970 ownership
drwx------  0 501    20          0 Jan  1  1970 ownership/dir
-rwx------  0 501    20          0 Jan  1  1970 ownership/file
drwxr-xr-x  0 501    20          0 Jan  1  1970 var
drwxr-xr-x  0 501    20          0 Jan  1  1970 var/lib
drwxr-xr-x  0 501    20          0 Jan  1  1970 var/lib/db
drwxr-xr-x  0 501    20          0 Jan  1  1970 var/lib/db/sbom
-rw-r--r--  0 501    20       1135 Jan  1  1970 var/lib/db/sbom/ownership-1.0.0-r0.spdx.json

The UID/GID combination 501:20 matches my primary, active user account on the macOS host. If I install this package with apko to build a container image, all ownership information seems lost:

$ docker run -it --rm ownership:latest-arm64 ls -lan /ownership
total 12
drwxr-xr-x    3 0        0             4096 Jan  1  1970 .
drwxr-xr-x    1 0        0             4096 Jul 21 20:48 ..
drwx------    2 0        0             4096 Jan  1  1970 dir
-rwx------    1 0        0                0 Jan  1  1970 file

I also double-checked the tar archive from apko with dive and found the same: the files no longer carry any ownership information, although the permissions were kept.
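The ownership check above can also be scripted rather than read off a tar listing. The following is a minimal sketch, assuming Python's standard tarfile module; list_ownership and build_demo_archive are hypothetical helper names, and the demo runs on a small in-memory archive rather than a real .apk. Whether tarfile reads a multi-segment .apk in a single pass has not been verified here.

```python
import io
import tarfile


def list_ownership(fileobj):
    """Return (name, uid, gid, mode) for every entry in a tar stream."""
    entries = []
    # "r:*" lets tarfile detect the compression on its own.
    with tarfile.open(fileobj=fileobj, mode="r:*") as archive:
        for member in archive:
            entries.append(
                (member.name, member.uid, member.gid, member.mode & 0o7777)
            )
    return entries


def build_demo_archive():
    """Build an in-memory tar.gz with one root-owned and one nonroot entry."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, owner in (("ownership", 0), ("ownership/file", 65532)):
            info = tarfile.TarInfo(name)
            info.uid = info.gid = owner
            info.mode = 0o700
            archive.addfile(info)
    buffer.seek(0)
    return buffer


for name, uid, gid, mode in list_ownership(build_demo_archive()):
    print(f"{name}: {uid}:{gid} mode={mode:o}")
```

Comparing the numeric IDs directly avoids any name lookup against the host's passwd database, which is the same reason --numeric-owner is used in the tar command above.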

Based on my understanding, I would consider this a bug, as it does not match the behaviour I expected. While I can use paths in apko, that seems like bad practice: these paths belong to the package itself, and when, for example, combining multiple packages into a service bundle in apko, I do not want to repeat package-specific path configs across multiple files.

Version Info

melange

GitVersion:    0.11.2
GitCommit:     brew
GitTreeState:  clean
BuildDate:     2024-07-19T01:42:34Z
GoVersion:     go1.22.5
Compiler:      gc
Platform:      darwin/arm64

apko

GitVersion:    0.14.7
GitCommit:     brew
GitTreeState:  clean
BuildDate:     2024-05-31T16:54:55Z
GoVersion:     go1.22.3
Compiler:      gc
Platform:      darwin/arm64

Docker

Client:
 Version:           27.0.3
 API version:       1.46
 Go version:        go1.21.11
 Git commit:        7d4bcd8
 Built:             Fri Jun 28 23:59:41 2024
 OS/Arch:           darwin/arm64
 Context:           desktop-linux

Server: Docker Desktop 4.32.0 (157355)
 Engine:
  Version:          27.0.3
  API version:      1.46 (minimum version 1.24)
  Go version:       go1.21.11
  Git commit:       662f78c
  Built:            Sat Jun 29 00:02:44 2024
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.7.18
  GitCommit:        ae71819c4f5e67bb4d5ae76a6b735f29cc25774e
 runc:
  Version:          1.7.18
  GitCommit:        v1.1.13-0-g58aa920
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Reproducer

Steps

  1. melange keygen
  2. melange build --signing-key melange.rsa
  3. apko build -k melange.rsa.pub apko.yaml ownership ownership.tar
  4. docker load < ownership.tar
  5. docker run -it --rm ownership:latest-arm64 ls -lan /ownership
  6. Optionally inspect with further tools, e.g. dive --source docker-archive ownership.tar

While the permission changes made with chmod are respected, the ownership changes are lost: both /ownership/file and /ownership/dir show up as owned by root.

melange.yaml

package:
  name: ownership
  version: 1.0.0
  epoch: 0
  target-architecture:
    - aarch64

environment:
  accounts:
    users:
      - username: nonroot
        uid: 65532
        gid: 65532
    groups:
      - groupname: nonroot
        gid: 65532
  contents:
    repositories:
      - https://packages.wolfi.dev/os
    keyring:
      - https://packages.wolfi.dev/os/wolfi-signing.rsa.pub
    packages:
      - wolfi-baselayout
      - busybox

pipeline:
  - runs: |
      readonly TARGET="${{targets.destdir}}"
      mkdir -p "${TARGET}/ownership/dir"
      touch "${TARGET}/ownership/file"
      chmod 700 "${TARGET}/ownership/dir" "${TARGET}/ownership/file"
      chown nonroot:nonroot "${TARGET}/ownership/file" "${TARGET}/ownership/dir"

      # [1] This ls command properly shows the changed ownership and permissions
      ls -lan "${TARGET}/ownership"
  - runs: |
      # [2] This ls command properly shows the changed ownership and permissions
      ls -lan "${{targets.destdir}}/ownership"

apko.yaml

archs:
  - aarch64

contents:
  repositories:
    - https://packages.wolfi.dev/os
    - '@local packages'
  keyring:
    - https://packages.wolfi.dev/os/wolfi-signing.rsa.pub
  packages:
    - wolfi-baselayout
    - busybox
    - ownership@local

accounts:
  run-as: nonroot
  users:
    - username: nonroot
      uid: 65532
      gid: 65532
  groups:
    - groupname: nonroot
      gid: 65532

I was curious about the root cause and spent some time analysing this myself. My conclusion is that the root cause lies more on the apko side, but there are multiple reasons why this fails today. I started by looking at what happens when melange builds a tarball:

  • While running melange build, a temporary directory is mounted to /home/build, and this is where the first issue occurs. To avoid permission problems on the host, Docker Desktop does not preserve the actual UID/GID on disk. Files instead default to the host user account, and the real ownership and permissions are stored in an xattr named com.docker.grpcfuse.ownership, which contains a JSON value like {"UID":123,"GID":123,"mode":770}.
  • This xattr is completely ignored when melange builds the tarball, as writeTar only checks the actual file system ownership and permissions, so my host account 501:20 leaks into the tarball.
  • The memFS implementation used by apko ignores the UID/GID values from tar headers completely; calling memFS.Chown() would be necessary to preserve ownership.
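The leak described in the second bullet can be reproduced in miniature. This is a sketch in Python rather than melange's Go code, and header_from_stat is a hypothetical name: it builds a tar header purely from lstat(), which is the behaviour attributed to writeTar above, so whatever owner the mounted file system reports ends up in the header and any ownership xattr is never consulted.

```python
import io
import os
import tarfile
import tempfile


def header_from_stat(path, arcname):
    """Build a tar header straight from lstat(), ignoring any xattrs."""
    # gettarinfo() needs a writable archive object; use a throwaway sink.
    with tarfile.open(fileobj=io.BytesIO(), mode="w") as archive:
        return archive.gettarinfo(path, arcname=arcname)


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "file")
    with open(path, "w"):
        pass
    header = header_from_stat(path, "ownership/file")
    # On a gRPC FUSE mount this would be the host account (501:20 in the
    # report above), not the owner that chown set inside the build.
    print(header.name, header.uid, header.gid)
```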

I was able to get everything working correctly on apko's side by:

  • Implementing a new tarball.Option which enables the processing of xattr-based permissions for gRPC FUSE
  • Extending Context.writeTar to check for this option and, if enabled, read the xattr for every entry being processed and set the UID/GID/Mode solely based on this xattr. If this xattr is missing or parsing fails, the UID/GID get reset to 0, so the host user does not leak through.
  • Extending memFS.WriteHeader to call memFS.Chown() for both directories and files if either the UID or the GID in the header of the entry being processed is non-zero
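The xattr handling in the second bullet can be sketched as follows. This is an illustration in Python, not the Go code from the fork: apply_xattr_ownership is a hypothetical name, the raw xattr value is passed in as bytes instead of being read from disk (Python's os.getxattr is Linux-only), and the mode field is left untouched because the thread does not say how it is encoded.

```python
import json
import tarfile

# Name of the xattr as quoted in the thread; unused here because the raw
# value is supplied by the caller.
XATTR_NAME = "com.docker.grpcfuse.ownership"


def _is_valid_id(value):
    """Accept only non-negative integers (bool is an int subclass, so skip it)."""
    return isinstance(value, int) and not isinstance(value, bool) and value >= 0


def apply_xattr_ownership(header, raw_xattr):
    """Set uid/gid from the xattr value, or reset to 0:0 if unusable."""
    uid = gid = 0  # fallback, so the host account never leaks through
    if raw_xattr is not None:
        try:
            data = json.loads(raw_xattr)
            if _is_valid_id(data["UID"]) and _is_valid_id(data["GID"]):
                uid, gid = data["UID"], data["GID"]
        except (ValueError, KeyError, TypeError):
            pass  # malformed JSON or missing keys: keep the 0:0 fallback
    header.uid, header.gid = uid, gid
    return header


header = tarfile.TarInfo("ownership/file")
header.uid, header.gid = 501, 20  # what lstat() reported on the host mount
apply_xattr_ownership(header, b'{"UID":65532,"GID":65532,"mode":700}')
print(header.uid, header.gid)
```

Treating a missing and a malformed value the same way keeps the rule simple: an entry is either owned by what the xattr says or by root, never by the host account.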

You can find both of these changes in my forked apko branch fs-ownership with these two commits:

Once apko has been extended with the above two commits, melange itself can be adjusted to use the gRPC FUSE ownership option based on its environment, as seen on the fs-ownership branch in my melange fork:

I did not implement any heuristics / detection for this yet, as it's simply meant as a PoC, but I suppose it could be based on the runner environment - e.g. if Docker is used on macOS, then use gRPC FUSE based permissions. With these changes to both apko and melange I'm finally able to build both a clean APK tarball (preserving custom permissions, while not leaking host account UID/GID through) and a clean OCI image with proper permissions as well.
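The kind of heuristic suggested above could look roughly like this. It is a sketch only: use_grpcfuse_ownership is a hypothetical name, the runner strings are assumed labels, and nothing like this exists in the PoC. A real check would probably need to be more careful, for example by probing whether the xattr is actually present on the mounted files.

```python
import platform


def use_grpcfuse_ownership(runner, host_os=None):
    """Guess whether ownership should be read from the gRPC FUSE xattr.

    runner is an assumed label for the build runner; host_os defaults to
    the value reported by platform.system() ("Darwin" on macOS).
    """
    if host_os is None:
        host_os = platform.system()
    return runner == "docker" and host_os == "Darwin"


print(use_grpcfuse_ownership("docker", "Darwin"))
print(use_grpcfuse_ownership("docker", "Linux"))
```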

Now my question: Would changes like these be acceptable to the melange/apko team? Or has this not been implemented on purpose so far for some reason unknown to me?

I've also tested this out on a Linux system by now, to have the full picture in terms of ownership support. Here is a summarised breakdown of everything:

  • melange @ macOS: ❌ The resulting TAR archive contains invalid UID/GID and instead leaks the host account through, as the actual UID/GID is stored in an xattr named com.docker.grpcfuse.ownership. Would require heuristics to determine an environment like this, and then forcibly use only the xattr for UID/GID.
  • melange @ Linux: ✅ The resulting TAR archive contains valid ownership out of the box, thanks to #501 / #781 from @amouat / @epsilon-phase
  • apko: ❌ Never respects the UID/GID present in the TAR archive, and only copies all the files as 0:0 into the resulting container image, essentially making any efforts on melange's side for proper ownership useless.
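The apko side of this breakdown can be modelled with a toy in-memory file system. This is a sketch in Python, not apko's memFS: the MemFS class, its methods, and demo_archive are all hypothetical. Entries are created as 0:0, and the extra chown call on a non-zero UID or GID mirrors the change described earlier in the thread.

```python
import io
import tarfile


class MemFS:
    """Toy in-memory file system mapping path -> metadata."""

    def __init__(self):
        self.nodes = {}

    def chown(self, path, uid, gid):
        self.nodes[path]["uid"] = uid
        self.nodes[path]["gid"] = gid

    def write_header(self, header):
        # Entries start out owned by 0:0.
        self.nodes[header.name] = {
            "dir": header.isdir(),
            "mode": header.mode,
            "uid": 0,
            "gid": 0,
        }
        # Proposed change: honour the tar header when it names a
        # non-root owner.
        if header.uid != 0 or header.gid != 0:
            self.chown(header.name, header.uid, header.gid)


def extract(fileobj, fs):
    """Feed every directory and regular file header into the file system."""
    with tarfile.open(fileobj=fileobj, mode="r:*") as archive:
        for header in archive:
            if header.isdir() or header.isfile():
                fs.write_header(header)


def demo_archive():
    """Build a tar with a root-owned directory and a nonroot-owned file."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        directory = tarfile.TarInfo("ownership")
        directory.type = tarfile.DIRTYPE
        directory.mode = 0o755
        archive.addfile(directory)
        owned = tarfile.TarInfo("ownership/file")
        owned.mode = 0o700
        owned.uid = owned.gid = 65532
        archive.addfile(owned)
    buffer.seek(0)
    return buffer


fs = MemFS()
extract(demo_archive(), fs)
print(fs.nodes["ownership/file"])
```

Without the chown call in write_header, every entry would stay at 0:0, which matches the ls output from the container image in the original report.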

The question that remains on my side is why apko does not preserve ownership (implementation is trivial, as shown in my previous comment with ppmathis/apko@dd8c452) and instead forces 0:0. The melange bug on macOS makes sense, it's a special case, but why would apko ignore ownership?

> The question that remains on my side is why apko does not preserve ownership

If I had to guess, it's not deliberate, just an oversight. Since apko has the ability to set ownership explicitly, I suspect we've been doing this at the image level and not at the apk level.

Feel free to PR that change to apko. It would be good to have a test that checks this and ensures it's consistent with /lib/apk/db/installed as well.