containers/buildah

race: parallel builds: copying...committing...creating... layer not known

edsantiago opened this issue · 6 comments

This might be the same as containers/podman#23331. If it is, someone please close this or move it.

Setup:

$ for i in 1 2;do printf "FROM quay.io/libpod/testimage:20240123\nRUN echo hi from $i\n" >Containerfile$i;done

In window 1:

$ while :;do buildah build -t c1 --layers=true -f Containerfile1 || break;buildah rmi c1;done

In window 2:

$ while :;do buildah build --layers=false -t c2 -f Containerfile2 || break;buildah rmi c2;done

Within 30-60s, window 1 will barf:

STEP 1/2: FROM quay.io/libpod/testimage:20240123
STEP 2/2: RUN echo hi from 1
Error: checking if cached image exists from a previous build: getting top layer info: layer not known

or

STEP 1/2: FROM quay.io/libpod/testimage:20240123
STEP 2/2: RUN echo hi from 1
hi from 1
COMMIT c1
Error: committing container for step {Env:[PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin] Command:run Args:[echo hi from 1] Flags:[] Attrs:map[] Message:RUN echo hi from 1 Heredocs:[] Original:RUN echo hi from 1}: copying layers and metadata for container "a8d0253ccd5f337ca69e106657dc645e4926b20d9775621827b6ec118bcb35fa": committing the finished image: creating image "41778f8cf15b69d1fdb79d5bb744ba65eac877e27a21dd12af8700594d88585b": layer not known
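When collecting many failures it may help to sort them into the two variants above automatically. This is only a sketch: `classify_failure` is a made-up helper name, it assumes the build output was saved to a file, and the matched strings are copied from the two error messages above.

```shell
# Classify a saved build log by which "layer not known" path it hit.
# classify_failure is a hypothetical helper, not part of buildah or podman.
# $1 = path to a file holding the captured build output
classify_failure() {
    local log=$1
    if ! grep -q 'layer not known' "$log"; then
        echo other                  # not this bug
    elif grep -q 'checking if cached image exists from a previous build' "$log"; then
        echo cache-lookup           # first variant: failed during the cache check
    elif grep -q 'committing the finished image' "$log"; then
        echo commit                 # second variant: failed during COMMIT
    else
        echo unknown-layer-error    # "layer not known" from some other code path
    fi
}
```

For example, `buildah build ... 2>&1 | tee build.log` followed by `classify_failure build.log`.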

The rmi seems important; I can't get it to fail (at least not within my patience tolerance of ~10m) if I omit rmi from either loop.

Testing with podman fails MUCH faster than buildah, for reasons I don't understand, and also fails sometimes in window 2. Buildah only fails in window 1.
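For anyone who wants to try this from a single shell, or switch between buildah and podman, the two loops can be wrapped in a function. This is a rough sketch, not a tested reproducer: `TOOL` and `build_loop` are names invented here, and it assumes the Containerfiles from the setup step exist in the current directory.

```shell
# Sketch: the two window loops as one parametrized function.
# TOOL and build_loop are illustrative names, not part of buildah or podman.
TOOL=${TOOL:-buildah}    # set TOOL=podman to test with podman instead

# Build and remove image c<N> in a loop until a build fails.
# $1 = Containerfile index (1 or 2), $2 = value passed to --layers (true or false)
build_loop() {
    local i=$1 layers=$2
    while :; do
        "$TOOL" build -t "c$i" --layers="$layers" -f "Containerfile$i" || break
        "$TOOL" rmi "c$i"
    done
    echo "loop $i: build failed"
}

# Usage, equivalent to the two windows above:
#   build_loop 1 true &
#   build_loop 2 false &
#   wait    # as with two windows, the surviving loop keeps running until interrupted
```

Running the loops as background jobs interleaves their output, so redirecting each loop to its own log file is probably worthwhile.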

This is blocking parallelization of podman test 070-build and I bet this is one of the uncategorized weirdnesses I've seen in #5552 but didn't follow up on.

Issue persists:

<+0042s> # # podman build -t b-t156-muinxj0h /tmp/CI_dBI1/podman_bats.20lh4r/build-test
<+477ms> # STEP 1/3: FROM quay.io/libpod/testimage:20240123
         # STEP 2/3: COPY ./ /tmp/test/
         # Error: checking if cached image exists from a previous build: getting top layer info: layer not known
<+005ms> # [ rc=125 (** EXPECTED 0 **) ]

This is from Podman PR containers/podman#23275, with current buildah (v1.37.1-0.20240828183349-69259725a0df) vendored.

These are two builds with --layers=true, which means they're reading each other's work as cache candidates. That is not a case #5686 was concerned with.
