git-for-windows/git-sdk-64

Investigate new `ci-artifacts` flakiness

dscho opened this issue · 8 comments

As of https://github.com/git-for-windows/git-sdk-64/actions/runs/7895871817/job/21548925651, it seems that there is a flaky problem. The symptom looks like this:

    [...]
    LINK scalar.exe
    strip  headless-git.exe git-daemon.exe git-http-backend.exe git-imap-send.exe git-sh-i18n--envsubst.exe git-shell.exe git-http-fetch.exe git-http-push.exe git-remote-http.exe git-remote-https.exe git-remote-ftp.exe git-remote-ftps.exe git.exe
    D:\a\git-sdk-64\git-sdk-64\minimal-sdk\mingw64\bin\strip.exe: unable to copy file 'git.exe'; reason: Permission denied
    make: *** [Makefile:2376: strip] Error 1
    make: *** Waiting for unfinished jobs....
    make: Leaving directory '/d/a/git-sdk-64/git'
    Error: Process completed with exit code 2.

This problem usually goes away after re-running a couple of times (once I had to re-run 3 times to make it succeed).
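Since a plain re-run tends to clear the failure, one stop-gap would be to retry the failing step automatically. The following is only a minimal sketch under assumptions, not what the workflow actually does: `run_with_retries`, its parameters, and the choice to wrap `make strip` are all hypothetical, and a retry only masks the symptom rather than explaining the `Permission denied`.

```python
import subprocess
import time


def run_with_retries(cmd, attempts=3, delay=2.0, runner=subprocess.run):
    """Run cmd until it exits with 0 or `attempts` tries are used up.

    Returns the exit code of the last try. `runner` is injectable so the
    helper can be exercised without spawning real processes.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    returncode = 1
    for attempt in range(1, attempts + 1):
        returncode = runner(cmd).returncode
        if returncode == 0:
            return 0
        if attempt < attempts:
            # Give whatever holds the file (a scanner, say) time to let go.
            time.sleep(delay)
    return returncode


if __name__ == "__main__":
    # Hypothetical usage: retry the Makefile's `strip` target a few times.
    raise SystemExit(run_with_retries(["make", "strip"]))
```

The pause between tries is the relevant design choice here: if another process briefly holds `git.exe` open, an immediate retry would likely hit the same lock.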

The lucky thing is that the strip Makefile rule is apparently not used in git/git's own CI, therefore things don't fail there (which would be disastrous). So we do not need to drop everything and fix this Right Now, but it needs to be fixed.

Now, the commit corresponding to the first build that exhibited the problem is 863c871. Contrary to what I first thought, that commit did not update the MSYS2 runtime. That update came in the next commit.

Comparing the first failing job with the corresponding job of the previous build, I see in the Set up job step that the runner version changed, from v2.312.0 to v2.313.0. But I don't see any obvious culprit in that version's release notes.

Also in the Set up job step, I see a difference in the runner image (but not in the Windows version), but the corresponding diff does not shed any light on the issue either.

It is possible, of course, that the previous build succeeded on first attempt due to flakiness rather than by virtue of being non-flaky. More investigation is needed here.

The latest three runs worked without any need for re-runs. May have been an overzealous Defender... I'll give it another week, and if there are no other instances of this flake, I'll close this ticket.

The problem is back.

The error happened today, too. Here are the latest ci-artifacts runs (starting with the first one where I did not try to re-run to turn the build green):

[Screenshot: list of the latest ci-artifacts runs]

After 7 consecutive successful runs, it happened again.

After 8 consecutive successful runs, it happened again.

After only one successful run, there was another failing one.
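To put the streaks reported above (7, 8 and 1 successful runs, each followed by a failure) into perspective, here is a rough back-of-the-envelope sketch. It assumes every run fails independently with the same probability, which may well not hold, and the function names are made up for illustration.

```python
def failure_rate(success_streaks):
    """Estimate the per-run failure rate from streaks of successful runs.

    Assumes each streak is ended by exactly one failing run.
    """
    failures = len(success_streaks)
    runs = sum(success_streaks) + failures
    return failures / runs


def prob_all_green(p_fail, n):
    """Probability that n independent runs in a row all succeed."""
    return (1 - p_fail) ** n


streaks = [7, 8, 1]  # successful runs before each failure, as reported above
p = failure_rate(streaks)
print(f"estimated failure rate: {p:.3f}")                    # 0.158
print(f"chance of 3 green runs: {prob_all_green(p, 3):.3f}")  # 0.597
```

Under these assumptions, three green runs in a row would be expected more often than not even while the flake is still present, so a short clean streak says little about whether the problem is gone.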