GH Actions (macOS): Run a job for "make build-local" first, cache image for job "make build"
mkoeppe opened this issue · 79 comments
We revise the GH Actions workflows to use a 2-stage build:
In the first stage, run make build-local, and store SAGE_LOCAL as a build artifact. In the second stage, download the build artifact and run more building and testing.
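As a rough illustration of the two stages described above, a single matrix job could branch on a `stage` value. This is only a sketch under assumptions: the job name, artifact name, `artifacts/` directory, and action versions are hypothetical and not taken from the actual `tox.yml`, and it assumes the source tree is already configured with `SAGE_LOCAL` at `./local`.

```yaml
# Hypothetical sketch, not the workflow from this ticket.
jobs:
  local-macos:
    runs-on: macos-latest
    strategy:
      fail-fast: false
      matrix:
        stage: ["1", "2"]
    steps:
      - uses: actions/checkout@v4

      # Stage 2 only: fetch and unpack the SAGE_LOCAL built by stage 1.
      # This step fails if stage 1 has not uploaded the artifact yet.
      - name: Download SAGE_LOCAL artifact
        if: matrix.stage == '2'
        uses: actions/download-artifact@v4
        with:
          name: sage-local-macos      # assumed artifact name
          path: artifacts
      - name: Unpack SAGE_LOCAL
        if: matrix.stage == '2'
        run: tar -xf artifacts/sage-local.tar

      - name: Build
        run: |
          if [ "${{ matrix.stage }}" = "1" ]; then
            make build-local
          else
            make build
          fi

      # Stage 1 only: pack SAGE_LOCAL inside the workspace (not /tmp) and upload it.
      - name: Pack SAGE_LOCAL
        if: matrix.stage == '1'
        run: |
          mkdir -p artifacts
          tar -cf artifacts/sage-local.tar local
      - name: Upload SAGE_LOCAL artifact
        if: matrix.stage == '1'
        uses: actions/upload-artifact@v4
        with:
          name: sage-local-macos
          path: artifacts/sage-local.tar
```

Nothing in this sketch orders the two matrix entries; stage 2 only succeeds if it happens to start after stage 1 has finished uploading.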
(+) On top of the artifact containing the full SAGE_LOCAL, we can test several ways to build the Python parts
- Sage distribution, classic
- Sage distribution with configure --enable-editable
- Sage distribution with configure --enable-system-site-packages (#29665)
- the configurations from pkgs/sagemath-standard/tox.ini
- build/install of modularized distributions such as pkgs/sage-categories (#29865), pkgs/sagemath-standard-no-symbolics (#32601)
(+) Tests of optional and experimental packages can be streamlined, as we avoid rebuilding their dependencies that are standard packages.
(+) Splitting the job into two would also help with the configurations for which we scrape at the 6 hour time limit
(-) Unfortunately, because "needs" cannot depend on "matrix", the jobs for building/testing Python packages would not start before all jobs building SAGE_LOCAL for all platforms are completed
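The limitation in the last point can be seen in a minimal sketch with two matrix jobs linked by `needs`. The job names, matrix values, and `echo` steps are placeholders chosen for illustration; only the `needs` and `strategy.matrix` keys are standard GitHub Actions syntax.

```yaml
# Hypothetical sketch: names and matrix values are illustrative.
jobs:
  stage1:
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        os: [macos-latest, ubuntu-latest]
    steps:
      - run: echo "build SAGE_LOCAL on ${{ matrix.os }}"

  stage2:
    # `needs` refers to the job as a whole, not to one matrix entry,
    # so every stage2 entry waits for all stage1 entries to finish.
    needs: stage1
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        os: [macos-latest, ubuntu-latest]
        variant: [classic, editable]
    steps:
      - run: echo "build Python parts (${{ matrix.variant }}) on ${{ matrix.os }}"
```

Here the `macos-latest` entries of `stage2` cannot start until the `ubuntu-latest` entry of `stage1` is done, and by default a failure in any `stage1` entry causes all of `stage2` to be skipped.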
In this ticket, we only change all existing macos workflows to a 2-stage workflow, integrating the separate workflows for optional and experimental packages.
As of this ticket, we rely on the bottleneck of the available parallel jobs on GH Actions to ensure that the 2nd stages of a configuration are run after the 1st stage of that configuration. Experience with this workflow will show whether this suffices.
We also update the macOS/Xcode versions according to what's available on GH Actions and switch the homebrew builds to faster homebrew-usrlocal variants, which can use bottles for all available packages.
Depends on #32113
Depends on #32947
CC: @tobiasdiez @kliem @orlitzky @isuruf
Component: porting
Author: Matthias Koeppe
Branch/Commit: edb4364
Reviewer: Dima Pasechnik
Issue created by migration from https://trac.sagemath.org/ticket/32703
Last 10 new commits:
2101b8c | build/pkgs/python3/spkg-build.in: Make sure that python finds sqlite3 when determining which extension modules to build |
3bbc5d8 | build/pkgs/python3/spkg-build.in: Set rpath |
124b605 | Merge #32698 |
bca2141 | pkgs/sagemath-standard/tox.ini: Use SAGE_VENV or venv symlink to find wheels |
bdbab3a | build/make/Makefile.in: Remove SPKG-tox, SPKG-sdist targets |
63e47ff | Merge #31535 |
1c40d1c | .github/workflows/extract-sage-local.sh: Make script usable on non-cygwin |
dee3f41 | github/workflows/tox.yml: Multi-stage local-macos |
bd9fad1 | .github/workflows/: Remove workflows for testing |
d08452c | Update systems |
Branch pushed to git repo; I updated commit sha1. New commits:
0799c5f | .github/workflows/tox.yml: Fixup for macOS tar |
Branch pushed to git repo; I updated commit sha1. New commits:
e031622 | .github/workflows/tox.yml: Fix up upload path |
Branch pushed to git repo; I updated commit sha1. New commits:
a40d706 | .github/workflows/tox.yml: Do not use /tmp for artifacts to avoid https://github.com/actions/upload-artifact/issues/92 |
working well now
Branch pushed to git repo; I updated commit sha1. New commits:
4dcd768 | .github/workflows/tox.yml: mkdir |
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
828463d | github/workflows/tox.yml: Multi-stage local-macos |
If I understand the changes (and the output) correctly, then currently stage 1 and 2 run in parallel. For example, some stage 2 runs (e.g. https://github.com/mkoeppe/sage/runs/3935018795?check_suite_focus=true) fail since they try to download a non-existing artifact. Maybe something like https://github.com/marketplace/actions/wait-for-check helps.
Replying to @tobiasdiez:
If I understand the changes (and the output) correctly, then currently stage 1 and 2 run in parallel.
Yes, in theory. But in practice we only get 5 parallel jobs on macOS and we launch many more jobs...
For example, some stage 2 runs (e.g. https://github.com/mkoeppe/sage/runs/3935018795?check_suite_focus=true) fail since they try to download a non-existing artifact.
Well, they failed because the previous stage failed.
Replying to @tobiasdiez:
Maybe something like https://github.com/marketplace/actions/wait-for-check helps.
Thanks for the pointer, we can look into using something like this
Testing with integrated builds for the optional packages now at https://github.com/mkoeppe/sage/actions/runs/1360303527
Replying to @mkoeppe:
Replying to @tobiasdiez:
If I understand the changes (and the output) correctly, then currently stage 1 and 2 run in parallel.
Yes, in theory. But in practice we only get 5 parallel jobs on macOS and we launch many more jobs...
Okay, but isn't this a very fragile design? Imagine that you have exactly 5 matrix cases. So all of the stage 1 tasks start at the same time. Now if the last case exits slightly before the other 4, then the stage 2 for the other 4 may be invoked and now fail as their stage 1 is not yet finished. Also this would limit everything to macOS, as on Linux you have way more runners.
Replying to @tobiasdiez:
on linux you have way more runners.
we also have way more jobs...
Then I would suggest using two jobs that wait using "needs" and accept the disadvantages that go with it.
Unfortunately, because "needs" cannot depend on "matrix", the jobs for building/testing Python packages would not start before all jobs building SAGE_LOCAL for all platforms are completed
I think this is also a more flexible design as the stage two matrix will probably have more options in the future (e.g. editable install, with/without prebuilt wheels from pypi, pipenv).
In fact, for maximal flexibility, do you think it makes sense to split it further into 3 stages:
- Build all non-python packages (min/standard/max)
- Build all python packages (standard, editable, prebuilt wheels, pipenv)
- Run doctests
Branch pushed to git repo; I updated commit sha1. New commits:
313dc82 | .github/workflows/tox.yml [local-macos]: Run optional packages here |
65f3155 | .github/workflows/: Remove workflows for testing |
3aeedb2 | Update systems |
63d0752 | .github/workflows/tox.yml: Pass SAGE_LOCAL to extract-sage-local.sh |
9404238 | .github/workflows/extract-sage-local.sh: Use short option with touch so it works on macOS; handle var/lib/sage/venv* too |
Branch pushed to git repo; I updated commit sha1. New commits:
d09e275 | Fixup |
Yes, in stage 2 we can have all of these variants, next to the runs for the optional packages added in 313dc82
Whether the doctest needs a separate stage remains to be seen; I think having it as part of the stage 2 builds will easily fit into the 6 hour limit in all variants.
Optional packages working well now - https://github.com/mkoeppe/sage/actions/runs/1364372711
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
4ef2baa | build/make/Makefile.in: Add targets SPKG-sdist, SPKG-tox, SPKG-tox-% for script packages |
f621e31 | tox.ini: Add local-conda-environment-src |
32f2755 | pkgs/sagemath-standard/tox.ini: Use SAGE_VENV or venv symlink to find wheels |
ea69f60 | build/make/Makefile.in: Remove SPKG-tox, SPKG-sdist targets |
2a9ab2a | .github/workflows/extract-sage-local.sh: Make script usable on non-cygwin |
e1410de | github/workflows/tox.yml: Multi-stage local-macos |
1180c80 | .github/workflows/tox.yml [local-macos]: Run optional packages here |
b28e85d | .github/workflows/tox.yml: Pass SAGE_LOCAL to extract-sage-local.sh |
9940993 | .github/workflows/extract-sage-local.sh: Use short option with touch so it works on macOS; handle var/lib/sage/venv* too |
Branch pushed to git repo; I updated commit sha1. New commits:
553dea4 | .github/workflows/tox.yml [local-macos]: Run experimental packages here |
Branch pushed to git repo; I updated commit sha1. New commits:
924d4e6 | .github/workflows/tox.yml [local-macos]: Add tox_packages_factor=maximal |
Description changed:
---
+++
@@ -9,6 +9,10 @@
(-) Unfortunately, because [because "needs" cannot depend on "matrix"](https://github.community/t/needs-based-on-matrix/132400), the jobs for building/testing Python packages would not start before all jobs building `SAGE_LOCAL` for all platforms are completed
+In this ticket, we change all existing `macos` workflows to a 2-stage workflow, integrating the separate workflows for optional and experimental packages.
+
+As of this ticket, we rely on the bottleneck of the available parallel jobs on GH Actions to ensure that the 2nd stages of a configuration are run after the 1st stage of that configuration. Experience with this workflow will show whether this suffices.
+
References
- https://evilmartians.com/chronicles/build-images-on-github-actions-with-docker-layer-caching
Author: Matthias Koeppe
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
ef7e478 | .github/workflows/extract-sage-local.sh: Make script usable on non-cygwin |
d943c03 | github/workflows/tox.yml: Multi-stage local-macos |
5e89943 | .github/workflows/tox.yml [local-macos]: Run optional packages here |
d9d9823 | .github/workflows/tox.yml: Pass SAGE_LOCAL to extract-sage-local.sh |
659d2a8 | .github/workflows/extract-sage-local.sh: Use short option with touch so it works on macOS; handle var/lib/sage/venv* too |
f4bd501 | .github/workflows/tox.yml [local-macos]: Run experimental packages here |
7b15b44 | .github/workflows/tox.yml [local-macos]: Add tox_packages_factor=maximal |
Description changed:
---
+++
@@ -1,15 +1,20 @@
-(+) On top of the built image containing the full SAGE_LOCAL, we can test several ways to build the Python parts
+We revise the GH Actions workflows to use a 2-stage build:
+In the first stage, run `make build-local`, and store `SAGE_LOCAL` as a build artifact. In the second stage, download the build artifact and run more building and testing.
+
+(+) On top of the artifact containing the full SAGE_LOCAL, we can test several ways to build the Python parts
- Sage distribution, classic
- Sage distribution with `configure --enable-editable`
- Sage distribution with `configure --enable-system-site-packages` (#29665)
- the configurations from pkgs/**sagemath-standard**/tox.ini
- build/install of modularized distributions such as pkgs/**sage-categories** (#29865), pkgs/**sagemath-standard-no-symbolics** (#32601)
+(+) Tests of optional and experimental packages can be streamlined, as we avoid rebuilding their dependencies that are standard packages.
+
(+) Splitting the job into two would also help with the configurations for which we scrape at the 6 hour time limit
(-) Unfortunately, because [because "needs" cannot depend on "matrix"](https://github.community/t/needs-based-on-matrix/132400), the jobs for building/testing Python packages would not start before all jobs building `SAGE_LOCAL` for all platforms are completed
-In this ticket, we change all existing `macos` workflows to a 2-stage workflow, integrating the separate workflows for optional and experimental packages.
+In this ticket, we only change all existing `macos` workflows to a 2-stage workflow, integrating the separate workflows for optional and experimental packages.
As of this ticket, we rely on the bottleneck of the available parallel jobs on GH Actions to ensure that the 2nd stages of a configuration are run after the 1st stage of that configuration. Experience with this workflow will show whether this suffices.
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
497682f | .github/workflows/extract-sage-local.sh: Make script usable on non-cygwin |
afbd4fa | github/workflows/tox.yml: Multi-stage local-macos |
57884a6 | .github/workflows/tox.yml [local-macos]: Run optional packages here |
49276f2 | .github/workflows/tox.yml: Pass SAGE_LOCAL to extract-sage-local.sh |
c3e2467 | .github/workflows/extract-sage-local.sh: Use short option with touch so it works on macOS; handle var/lib/sage/venv* too |
55c83f5 | .github/workflows/tox.yml [local-macos]: Run experimental packages here |
10de67a | .github/workflows/tox.yml [local-macos]: Add tox_packages_factor=maximal |
Branch pushed to git repo; I updated commit sha1. New commits:
8617b6f | .github/workflows/tox.yml: Reduce/streamline macos environments |
Branch pushed to git repo; I updated commit sha1. New commits:
2dee3f0 | .github/workflows/tox.yml (macos): Remove max-parallel |
Looks like homebrew's gcc package no longer installs an unversioned gfortran binary
https://github.com/mkoeppe/sage/runs/4572338375?check_suite_focus=true
Also conda install fails early
https://github.com/mkoeppe/sage/runs/4572338435?check_suite_focus=true
Replying to @mkoeppe:
Also conda install fails early
https://github.com/mkoeppe/sage/runs/4572338435?check_suite_focus=true
(same as #32113 comment:17)
Branch pushed to git repo; I updated commit sha1. Last 10 new commits:
43d481b | bump to 0.1.0 |
f66821b | cysignals are needed |
ae37b4e | deprecate sage.interfaces.primecount, not just remove |
d3ae5f4 | primecount is on conda, too |
3a8c7fe | primesieve is on conda, too |
74b3845 | allow float inputs for prime_pi |
591be34 | Merge #32894 |
049a5f6 | tox.ini: Do not set environment variable CONDARC |
f3116a1 | tox.ini (conda): Force use of conda's python3 |
6418579 | Merge #32113 |
Branch pushed to git repo; I updated commit sha1. New commits:
933134c | build/pkgs/gfortran/distros/homebrew.txt: Use gfortran |
Branch pushed to git repo; I updated commit sha1. New commits:
ab4fb99 | tox.ini: Add ubuntu-jammy, debian-bookworm, linuxmint-20.3, fedora-36 |
e51225a | .github/workflows/tox*.yml: Remove debian-jessie, add ubuntu-jammy, debian-bookworm, linuxmint-20.3, fedora-36 |
ac47e7d | .github/workflows/tox-gcc_spkg.yml: Remove |
1aa9328 | Merge #32947 |
e99a09e | sed -i.bak 's/ubunty/ubuntu/g' .github/workflows/*.yml |
I know we already touched upon this point, but can you please expand on
Unfortunately, because "needs" cannot depend on "matrix", the jobs for building/testing Python packages would not start before all jobs building SAGE_LOCAL for all platforms are completed
That is, why do you prefer the "custom stage build" over having a "local-macos-stage2" job that depends via "needs" on "local-macos-stage1", where each of these jobs defines an appropriate matrix? In particular, I don't understand the "all platforms" part, as ubuntu and macos are in different jobs, right? It appears to me that with the current design also stage 1 builds are executed before any stage 2 is executed, so the only difference is that you can already have 4 or so stage 2 builds running while the last stage 1 build finishes. I guess for the overall build time this shouldn't make much of a difference anyway, as these 4 runners would be busy with other workflows then (or are free to execute builds of other stuff in the sagemath org).
Replying to @tobiasdiez:
That is, why do you prefer the "custom stage build" over having a "local-macos-stage2" job that depends via "needs" on "local-macos-stage1", where each of these jobs defines an appropriate matrix?
To avoid copy-paste
Replying to @tobiasdiez:
I don't understand the "all platforms" part, as ubuntu and macos are in different jobs, right?
Yes, that's right, within each job; and in this ticket, only for macos.
Replying to @mkoeppe:
Replying to @tobiasdiez:
That is, why do you prefer the "custom stage build" over having a "local-macos-stage2" job that depends via "needs" on "local-macos-stage1", where each of these jobs defines an appropriate matrix?
To avoid copy-paste
Github recently made it possible to reuse workflows: https://docs.github.com/en/actions/learn-github-actions/reusing-workflows
Thus you could extract the current local-macos job into a new workflow and pass the right "tox" command as an argument (determined by the matrix + step).
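A minimal sketch of that suggestion, assuming a hypothetical reusable workflow file `.github/workflows/macos-stage.yml` with a single string input named `command`. The file name, input name, job names, and the `make` commands passed in are all assumptions for illustration; a real refactoring would pass the appropriate tox invocation and handle the artifact transfer between stages.

```yaml
# Hypothetical file: .github/workflows/macos-stage.yml
on:
  workflow_call:
    inputs:
      command:
        description: "Build command to run (assumed input name)"
        required: true
        type: string

jobs:
  run:
    runs-on: macos-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run the requested command
        run: ${{ inputs.command }}
```

The calling workflow can then define both stages without repeating the steps:

```yaml
# Hypothetical caller, e.g. inside .github/workflows/tox.yml
jobs:
  local-macos-stage1:
    uses: ./.github/workflows/macos-stage.yml
    with:
      command: make build-local

  local-macos-stage2:
    needs: local-macos-stage1
    uses: ./.github/workflows/macos-stage.yml
    with:
      command: make build
```

A job that calls a reusable workflow with `uses:` has no `runs-on` or `steps` of its own; those live in the called workflow.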
Thanks for the pointer! I've added this to #29060 as a welcome future refactoring, but this won't make it into Sage 9.5
Branch pushed to git repo; I updated commit sha1. New commits:
edb4364 | .github/workflows/tox.yml: Replace homebrew-macos-python3_xcode-standard by homebrew-macos-usrlocal-python3_xcode-standard |
Description changed:
---
+++
@@ -18,8 +18,6 @@
As of this ticket, we rely on the bottleneck of the available parallel jobs on GH Actions to ensure that the 2nd stages of a configuration are run after the 1st stage of that configuration. Experience with this workflow will show whether this suffices.
-References
-- https://evilmartians.com/chronicles/build-images-on-github-actions-with-docker-layer-caching
+We also update the macOS/Xcode versions according to what's available on GH Actions and switch the `homebrew` builds to faster `homebrew-usrlocal` variants, which can use bottles for all available packages.
Changed reviewer from https://github.com/mkoeppe/sage/actions/runs/1605145773 to Dima Pasechnik
OK, macOS runs on GH look good.
Thanks!