race: parallel builds: copying...committing...creating... layer not known
edsantiago opened this issue · 6 comments
This might be the same as containers/podman#23331. If it is, someone please close or move this one.
Setup:
$ for i in 1 2;do printf "FROM quay.io/libpod/testimage:20240123\nRUN echo hi from $i\n" >Containerfile$i;done
In window 1:
$ while :;do buildah build -t c1 --layers=true -f Containerfile1 || break;buildah rmi c1;done
In window 2:
$ while :;do buildah build --layers=false -t c2 -f Containerfile2 || break;buildah rmi c2;done
Within 30-60s, window 1 will barf:
STEP 1/2: FROM quay.io/libpod/testimage:20240123
STEP 2/2: RUN echo hi from 1
Error: checking if cached image exists from a previous build: getting top layer info: layer not known
or
STEP 1/2: FROM quay.io/libpod/testimage:20240123
STEP 2/2: RUN echo hi from 1
hi from 1
COMMIT c1
Error: committing container for step {Env:[PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin] Command:run Args:[echo hi from 1] Flags:[] Attrs:map[] Message:RUN echo hi from 1 Heredocs:[] Original:RUN echo hi from 1}: copying layers and metadata for container "a8d0253ccd5f337ca69e106657dc645e4926b20d9775621827b6ec118bcb35fa": committing the finished image: creating image "41778f8cf15b69d1fdb79d5bb744ba65eac877e27a21dd12af8700594d88585b": layer not known
The rmi seems important; I can't get it to fail (at least not within my patience tolerance of ~10m) if I omit rmi from either loop.
Testing with podman fails MUCH faster than buildah, for reasons I don't understand, and also fails sometimes in window 2. Buildah only fails in window 1.
This is blocking parallelization of podman test 070-build and I bet this is one of the uncategorized weirdnesses I've seen in #5552 but didn't follow up on.
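For anyone who would rather not juggle two windows, here is a rough sketch of the same reproducer driven from a single shell. It only reuses the commands shown above; the function names and the BUILDER variable (set it to podman to test podman instead) are my own inventions for illustration, not part of buildah or podman. It assumes the failure shows up in the window 1 loop, as it does with buildah, so that loop runs in the foreground and the window 2 loop runs in the background.

```shell
# Sketch only: helper names and BUILDER are hypothetical conveniences.
BUILDER="${BUILDER:-buildah}"

# Write the two Containerfiles exactly as in the Setup step.
write_containerfiles() {
    for i in 1 2; do
        printf 'FROM quay.io/libpod/testimage:20240123\nRUN echo hi from %s\n' "$i" > "Containerfile$i"
    done
}

# Build and remove one image repeatedly; stop at the first failed build.
# $1 = image tag, $2 = value for --layers, $3 = Containerfile
build_loop() {
    while "$BUILDER" build -t "$1" "--layers=$2" -f "$3"; do
        "$BUILDER" rmi "$1"
    done
    echo "build of $1 failed" >&2
}

# Run the window 2 loop in the background and the window 1 loop in the
# foreground; once window 1 fails, stop the background loop.
race() {
    write_containerfiles
    build_loop c2 false Containerfile2 &
    bg=$!
    build_loop c1 true Containerfile1
    kill "$bg" 2>/dev/null
    wait "$bg" 2>/dev/null
}
```

Source the file in an empty scratch directory and run `race`, or `BUILDER=podman race`. With podman the background loop may be the one that fails first; in that case the foreground loop simply keeps going until it fails too or you interrupt it.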
Issue persists:
<+0042s> # # podman build -t b-t156-muinxj0h /tmp/CI_dBI1/podman_bats.20lh4r/build-test
<+477ms> # STEP 1/3: FROM quay.io/libpod/testimage:20240123
# STEP 2/3: COPY ./ /tmp/test/
# Error: checking if cached image exists from a previous build: getting top layer info: layer not known
<+005ms> # [ rc=125 (** EXPECTED 0 **) ]
Podman PR containers/podman#23275 with current buildah (v1.37.1-0.20240828183349-69259725a0df) vendored.
This is two builds with --layers=true, which means they're reading each other's work as cache candidates, which is not something #5686 was concerned with.