Restarting failed builds on Azure spawns 2 extra runs

Question

carterbox opened this issue a year ago · 4 comments

Conda-forge documentation

I could not solve my problem using the conda-forge documentation.

Installed packages

N/A

Environment info

N/A

Issue

In the libmagma-feedstock, builds on the main branch often time out because they take 5.5 ± 0.5 hours to complete. The build time is quite variable; I don't know why, and that's not the purpose of this issue.

The issue is that when I return and click the "Rerun failed jobs" button in the GitHub UI, two extra runs are spawned. For example, after merging the recent update to version 2.7.2, the post-merge commit had two failed build variants, so I clicked the "Rerun failed jobs" button.

https://dev.azure.com/conda-forge/feedstock-builds/_build?definitionId=18893&_a=summary

However, the result is that runs 20231012.2 and 20231012.3 are also spawned (which is not necessary?). These latter two runs re-run all of the build variants, whereas run 20231012.1 only re-runs the two failed variants (which is what I requested).
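To make the pattern above easier to spot across feedstocks, here is a minimal sketch that flags the extra runs from a list of build numbers. It assumes the numbers follow the "YYYYMMDD.N" shape seen above and that the lowest revision on a given date is the intended run; the function names `group_runs_by_date` and `extra_runs` are hypothetical helpers, not part of any Azure or conda-forge tooling.

```python
from collections import defaultdict


def group_runs_by_date(build_numbers):
    """Group build numbers of the form "YYYYMMDD.N" by their date part."""
    runs = defaultdict(list)
    for number in build_numbers:
        date, _, revision = number.partition(".")
        runs[date].append(int(revision))
    # Sort revisions so the lowest one comes first for each date.
    return {date: sorted(revs) for date, revs in runs.items()}


def extra_runs(build_numbers):
    """Return every build number beyond the first run for each date.

    Assumption: the lowest revision on a date is the run that was
    actually requested; anything after it is treated as "extra".
    """
    grouped = group_runs_by_date(build_numbers)
    return [
        f"{date}.{rev}"
        for date, revs in grouped.items()
        for rev in revs[1:]
    ]


# Build numbers reported in this issue.
observed = ["20231012.1", "20231012.2", "20231012.3"]
print(extra_runs(observed))  # ['20231012.2', '20231012.3']
```

The build numbers would still have to be copied by hand from the Azure summary page linked above; the sketch only does the grouping.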

Answer 1 · 2023-10-12T22:41:22.000Z

I've noticed this too. You want to wait until all the jobs have run to completion, whether they fail or pass.

Answer 2 · 2023-10-12T23:11:27.000Z

I think I did wait until all the jobs were completed? I'll keep that in mind for next time.

Answer 3 · 2023-10-12T23:14:09.000Z

Yes. This was after all jobs had completed. 2 timed out and the rest passed.

https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=802513&view=logs&j=4f922444-fdfe-5dcf-b824-02f86439ef14&s=96ac2280-8cb4-5df5-99de-dd2da759617d

Answer 4 · 2023-10-12T23:57:05.000Z

It doesn't just affect libmagma (though I've observed this on another PR there very recently). Here are two more recent examples, the most extreme of which spawned >20(!) jobs.

petsc
pandas

In the past, my understanding was that this was due to people clicking the "re-run" button in the GitHub UI several times, or before the CI run on Azure had fully completed, but it seems to happen under textbook conditions as well (i.e. wait for AP to finish, then click "re-run" once).