sagemath/sage

GH Actions (macOS): Run a job for "make build-local" first, cache image for job "make build"

mkoeppe opened this issue · 79 comments

We revise the GH Actions workflows to use a 2-stage build:
In the first stage, run make build-local, and store SAGE_LOCAL as a build artifact. In the second stage, download the build artifact and run more building and testing.
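As a rough illustration of this handoff (not the actual workflow, which uses GH Actions artifacts and `.github/workflows/extract-sage-local.sh`), the following Python sketch packs a directory tree into one archive and restores it elsewhere. The helper names `pack`, `unpack`, `demo` and the file `libexample.txt` are hypothetical and only serve the example.

```python
import tarfile
import tempfile
from pathlib import Path


def pack(local_dir: Path, archive: Path) -> None:
    """Stage 1: store the whole tree in a single compressed archive."""
    with tarfile.open(str(archive), "w:gz") as tar:
        tar.add(str(local_dir), arcname="local")


def unpack(archive: Path, dest: Path) -> Path:
    """Stage 2: restore the tree and return its new location."""
    dest.mkdir(parents=True, exist_ok=True)
    # Use the safer "data" extraction filter where this Python provides it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(str(archive), "r:gz") as tar:
        tar.extractall(str(dest), **kwargs)
    return dest / "local"


def demo() -> str:
    """Simulate both stages inside one temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        stage1 = root / "stage1" / "local"
        (stage1 / "lib").mkdir(parents=True)
        (stage1 / "lib" / "libexample.txt").write_text("built in stage 1")
        archive = root / "sage-local.tar.gz"
        pack(stage1, archive)
        restored = unpack(archive, root / "stage2")
        return (restored / "lib" / "libexample.txt").read_text()
```

Packing into a single archive before uploading keeps the handoff to one file; whether that matches the details of the real workflow is an assumption here.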

(+) On top of the artifact containing the full SAGE_LOCAL, we can test several ways to build the Python parts

  • Sage distribution, classic
  • Sage distribution with configure --enable-editable
  • Sage distribution with configure --enable-system-site-packages (#29665)
  • the configurations from pkgs/sagemath-standard/tox.ini
  • build/install of modularized distributions such as pkgs/sage-categories (#29865), pkgs/sagemath-standard-no-symbolics (#32601)

(+) Tests of optional and experimental packages can be streamlined, as we avoid rebuilding their dependencies that are standard packages.

(+) Splitting the job into two would also help with the configurations for which we scrape against the 6-hour time limit

(-) Unfortunately, because "needs" cannot depend on "matrix", the jobs for building/testing Python packages would not start before all jobs building SAGE_LOCAL for all platforms are completed
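To make this drawback concrete, here is a small Python sketch that compares when stage 2 could start under a job-level "needs" barrier with an idealized per-configuration dependency. The configuration names and build times are hypothetical, and the model assumes an unlimited number of runners.

```python
def stage2_start_times(stage1_minutes, per_configuration):
    """Return the earliest start time of stage 2 for each configuration."""
    if per_configuration:
        # Idealized: each stage 2 waits only for its own stage 1.
        return dict(stage1_minutes)
    # Job-level "needs": every stage 2 waits for the slowest stage 1.
    barrier = max(stage1_minutes.values())
    return {name: barrier for name in stage1_minutes}


# Hypothetical stage-1 build times in minutes, not measurements.
stage1 = {"config-a": 90, "config-b": 150, "config-c": 300}
with_needs = stage2_start_times(stage1, per_configuration=False)
ideal = stage2_start_times(stage1, per_configuration=True)
extra_wait = {name: with_needs[name] - ideal[name] for name in stage1}
```

In this toy example the fastest configuration waits an extra 210 minutes before its stage 2 can begin, while the slowest one loses nothing.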

In this ticket, we only change all existing macos workflows to a 2-stage workflow, integrating the separate workflows for optional and experimental packages.

As of this ticket, we rely on the bottleneck of the available parallel jobs on GH Actions to ensure that the 2nd stages of a configuration are run after the 1st stage of that configuration. Experience with this workflow will show whether this suffices.

We also update the macOS/Xcode versions according to what's available on GH Actions and switch the homebrew builds to faster homebrew-usrlocal variants, which can use bottles for all available packages.

Depends on #32113
Depends on #32947

CC: @tobiasdiez @kliem @orlitzky @isuruf

Component: porting

Author: Matthias Koeppe

Branch/Commit: edb4364

Reviewer: Dima Pasechnik

Issue created by migration from https://trac.sagemath.org/ticket/32703

Dependencies: #31535

Commit: d08452c

Last 10 new commits:

2101b8c build/pkgs/python3/spkg-build.in: Make sure that python finds sqlite3 when determining which extension modules to build
3bbc5d8 build/pkgs/python3/spkg-build.in: Set rpath
124b605 Merge #32698
bca2141 pkgs/sagemath-standard/tox.ini: Use SAGE_VENV or venv symlink to find wheels
bdbab3a build/make/Makefile.in: Remove SPKG-tox, SPKG-sdist targets
63e47ff Merge #31535
1c40d1c .github/workflows/extract-sage-local.sh: Make script usable on non-cygwin
dee3f41 github/workflows/tox.yml: Multi-stage local-macos
bd9fad1 .github/workflows/: Remove workflows for testing
d08452c Update systems

Branch pushed to git repo; I updated commit sha1. New commits:

0799c5f .github/workflows/tox.yml: Fixup for macOS tar

Changed commit from d08452c to 0799c5f

Branch pushed to git repo; I updated commit sha1. New commits:

e031622 .github/workflows/tox.yml: Fix up upload path

Changed commit from 0799c5f to e031622

Branch pushed to git repo; I updated commit sha1. New commits:

a40d706 .github/workflows/tox.yml: Do not use /tmp for artifacts to avoid https://github.com/actions/upload-artifact/issues/92

Changed commit from e031622 to a40d706

Branch pushed to git repo; I updated commit sha1. New commits:

dd2bc15 .github/workflows/tox.yml: Use env.GITHUB_WORKSPACE
2b46f02 .github/workflows/tox.yml: Use sage-local-artifact/ as path

Changed commit from a40d706 to 2b46f02

comment:11

working well now

Changed commit from 2b46f02 to 4dcd768

Branch pushed to git repo; I updated commit sha1. New commits:

4dcd768 .github/workflows/tox.yml: mkdir

Changed commit from 4dcd768 to 828463d

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

828463d github/workflows/tox.yml: Multi-stage local-macos

comment:14

If I understand the changes (and the output) correctly, then currently stage 1 and 2 run in parallel. For example, some stage 2 runs (e.g. https://github.com/mkoeppe/sage/runs/3935018795?check_suite_focus=true) fail since they try to download a non-existing artifact. Maybe something like https://github.com/marketplace/actions/wait-for-check helps.

comment:15

Replying to @tobiasdiez:

If I understand the changes (and the output) correctly, then currently stage 1 and 2 run in parallel.

Yes, in theory. But in practice we only get 5 parallel jobs on macOS and we launch many more jobs...

For example, some stage 2 runs (e.g. https://github.com/mkoeppe/sage/runs/3935018795?check_suite_focus=true) fail since they try to download a non-existing artifact.

Well, they failed because the previous stage failed.

comment:16

Replying to @tobiasdiez:

Maybe something like https://github.com/marketplace/actions/wait-for-check helps.

Thanks for the pointer, we can look into using something like this

comment:17

Testing with integrated builds for the optional packages now at https://github.com/mkoeppe/sage/actions/runs/1360303527

comment:18

Replying to @mkoeppe:

Replying to @tobiasdiez:

If I understand the changes (and the output) correctly, then currently stage 1 and 2 run in parallel.

Yes, in theory. But in practice we only get 5 parallel jobs on macOS and we launch many more jobs...

Okay, but isn't this a very fragile design? Imagine that you have exactly 5 matrix cases, so all of the stage 1 tasks start at the same time. Now if the last case exits slightly before the other 4, then the stage 2 for the other 4 may be invoked and fail, as their stage 1 is not yet finished. Also, this would limit everything to mac, as on linux you have way more runners.
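The scenario can be played through with a small Python simulation. It is only a sketch under simplifying assumptions that GH Actions does not guarantee: jobs are dispatched strictly in queue order (all stage 1 jobs first, then all stage 2 jobs), each to the earliest free runner, and a stage 2 job fails immediately if its own stage 1 has not finished when it starts. The function name `simulate` and all durations are hypothetical.

```python
import heapq


def simulate(stage1_durations, stage2_duration, runners):
    """Return the configurations whose stage 2 started before stage 1 ended."""
    n = len(stage1_durations)
    # Queue order: every stage-1 job first, then every stage-2 job.
    queue = [(1, c) for c in range(n)] + [(2, c) for c in range(n)]
    free_at = [0.0] * runners  # min-heap: time at which each runner is free
    heapq.heapify(free_at)
    stage1_done = {}
    failed = set()
    for stage, config in queue:
        start = heapq.heappop(free_at)  # earliest free runner takes the job
        if stage == 1:
            end = start + stage1_durations[config]
            stage1_done[config] = end
        elif stage1_done[config] > start:
            failed.add(config)  # artifact not uploaded yet
            end = start         # the job fails right away
        else:
            end = start + stage2_duration
        heapq.heappush(free_at, end)
    return failed


# Exactly 5 cases on 5 runners; the last stage 1 finishes slightly early.
early_exit = simulate([3, 3, 3, 3, 2], stage2_duration=1, runners=5)
# More cases than runners, with long stage 2 jobs.
crowded = simulate([3] * 8, stage2_duration=3, runners=5)
```

In the first call the stage 2 jobs of configurations 0 to 3 fail, as described above; in the second call nothing fails, but only because the chosen durations happen to line up.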

comment:19

Replying to @tobiasdiez:

Okay, but isn't this a very fragile design?

Yes

comment:20

Replying to @tobiasdiez:

on linux you have way more runners.

we also have way more jobs...

comment:21

Then I would suggest using two jobs that wait using "needs" and accept the disadvantages that go with it.

Unfortunately, because "needs" cannot depend on "matrix", the jobs for building/testing Python packages would not start before all jobs building SAGE_LOCAL for all platforms are completed

I think this is also a more flexible design as the stage two matrix will probably have more options in the future (e.g. editable install, with/without prebuilt wheels from pypi, pipenv).
In fact, for maximal flexibility do you think it makes sense to split it further into 3 stages:

  • Build all non-python packages (min/standard/max)
  • Build all python packages (standard, editable, prebuilt wheels, pipenv)
  • Run doctests

Changed commit from 828463d to 9404238

Branch pushed to git repo; I updated commit sha1. New commits:

313dc82 .github/workflows/tox.yml [local-macos]: Run optional packages here
65f3155 .github/workflows/: Remove workflows for testing
3aeedb2 Update systems
63d0752 .github/workflows/tox.yml: Pass SAGE_LOCAL to extract-sage-local.sh
9404238 .github/workflows/extract-sage-local.sh: Use short option with touch so it works on macOS; handle var/lib/sage/venv* too

Branch pushed to git repo; I updated commit sha1. New commits:

d09e275 Fixup

Changed commit from 9404238 to d09e275

comment:24

Yes, in stage 2 we can have all of these variants, next to the runs for the optional packages added in 313dc82

Whether the doctest needs a separate stage remains to be seen; I think having it as part of the stage 2 builds will easily fit into the 6 hour limit in all variants.

comment:25

Optional packages working well now - https://github.com/mkoeppe/sage/actions/runs/1364372711

Changed commit from d09e275 to 9940993

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

4ef2baa build/make/Makefile.in: Add targets SPKG-sdist, SPKG-tox, SPKG-tox-% for script packages
f621e31 tox.ini: Add local-conda-environment-src
32f2755 pkgs/sagemath-standard/tox.ini: Use SAGE_VENV or venv symlink to find wheels
ea69f60 build/make/Makefile.in: Remove SPKG-tox, SPKG-sdist targets
2a9ab2a .github/workflows/extract-sage-local.sh: Make script usable on non-cygwin
e1410de github/workflows/tox.yml: Multi-stage local-macos
1180c80 .github/workflows/tox.yml [local-macos]: Run optional packages here
b28e85d .github/workflows/tox.yml: Pass SAGE_LOCAL to extract-sage-local.sh
9940993 .github/workflows/extract-sage-local.sh: Use short option with touch so it works on macOS; handle var/lib/sage/venv* too

Changed commit from 9940993 to 553dea4

Branch pushed to git repo; I updated commit sha1. New commits:

553dea4 .github/workflows/tox.yml [local-macos]: Run experimental packages here

Branch pushed to git repo; I updated commit sha1. New commits:

924d4e6 .github/workflows/tox.yml [local-macos]: Add tox_packages_factor=maximal

Changed commit from 553dea4 to 924d4e6

Description changed:

--- 
+++ 
@@ -9,6 +9,10 @@
 
 (-) Unfortunately, because [because "needs" cannot depend on "matrix"](https://github.community/t/needs-based-on-matrix/132400), the jobs for building/testing Python packages would not start before all jobs building `SAGE_LOCAL` for all platforms are completed
 
+In this ticket, we change all existing `macos` workflows to a 2-stage workflow, integrating the separate workflows for optional and experimental packages.
+
+As of this ticket, we rely on the bottleneck of the available parallel jobs on GH Actions to ensure that the 2nd stages of a configuration are run after the 1st stage of that configuration. Experience with this workflow will show whether this suffices.
+
 References
 - https://evilmartians.com/chronicles/build-images-on-github-actions-with-docker-layer-caching
 

Author: Matthias Koeppe

Changed dependencies from #31535 to none

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

ef7e478 .github/workflows/extract-sage-local.sh: Make script usable on non-cygwin
d943c03 github/workflows/tox.yml: Multi-stage local-macos
5e89943 .github/workflows/tox.yml [local-macos]: Run optional packages here
d9d9823 .github/workflows/tox.yml: Pass SAGE_LOCAL to extract-sage-local.sh
659d2a8 .github/workflows/extract-sage-local.sh: Use short option with touch so it works on macOS; handle var/lib/sage/venv* too
f4bd501 .github/workflows/tox.yml [local-macos]: Run experimental packages here
7b15b44 .github/workflows/tox.yml [local-macos]: Add tox_packages_factor=maximal

Changed commit from 924d4e6 to 7b15b44

comment:32

Rebased away from #31535.

Description changed:

--- 
+++ 
@@ -1,15 +1,20 @@
-(+) On top of the built image containing the full SAGE_LOCAL, we can test several ways to build the Python parts
+We revise the GH Actions workflows to use a 2-stage build:
+In the first stage, run `make build-local`, and store `SAGE_LOCAL` as a build artifact. In the second stage, download the build artifact and run more building and testing.
+
+(+) On top of the artifact containing the full SAGE_LOCAL, we can test several ways to build the Python parts
 - Sage distribution, classic
 - Sage distribution with `configure --enable-editable`
 - Sage distribution with `configure --enable-system-site-packages` (#29665)
 - the configurations from pkgs/**sagemath-standard**/tox.ini
 - build/install of modularized distributions such as pkgs/**sage-categories** (#29865), pkgs/**sagemath-standard-no-symbolics** (#32601)
 
+(+) Tests of optional and experimental packages can be streamlined, as we avoid rebuilding their dependencies that are standard packages.
+
 (+) Splitting the job into two would also help with the configurations for which we scrape at the 6 hour time limit
 
 (-) Unfortunately, because [because "needs" cannot depend on "matrix"](https://github.community/t/needs-based-on-matrix/132400), the jobs for building/testing Python packages would not start before all jobs building `SAGE_LOCAL` for all platforms are completed
 
-In this ticket, we change all existing `macos` workflows to a 2-stage workflow, integrating the separate workflows for optional and experimental packages.
+In this ticket, we only change all existing `macos` workflows to a 2-stage workflow, integrating the separate workflows for optional and experimental packages.
 
 As of this ticket, we rely on the bottleneck of the available parallel jobs on GH Actions to ensure that the 2nd stages of a configuration are run after the 1st stage of that configuration. Experience with this workflow will show whether this suffices.
 

Changed commit from 7b15b44 to 10de67a

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

497682f .github/workflows/extract-sage-local.sh: Make script usable on non-cygwin
afbd4fa github/workflows/tox.yml: Multi-stage local-macos
57884a6 .github/workflows/tox.yml [local-macos]: Run optional packages here
49276f2 .github/workflows/tox.yml: Pass SAGE_LOCAL to extract-sage-local.sh
c3e2467 .github/workflows/extract-sage-local.sh: Use short option with touch so it works on macOS; handle var/lib/sage/venv* too
55c83f5 .github/workflows/tox.yml [local-macos]: Run experimental packages here
10de67a .github/workflows/tox.yml [local-macos]: Add tox_packages_factor=maximal

Changed commit from 10de67a to 8617b6f

Branch pushed to git repo; I updated commit sha1. New commits:

8617b6f .github/workflows/tox.yml: Reduce/streamline macos environments

Branch pushed to git repo; I updated commit sha1. New commits:

2dee3f0 .github/workflows/tox.yml (macos): Remove max-parallel

Changed commit from 8617b6f to 2dee3f0

comment:38

Looks like homebrew's gcc package no longer installs an unversioned gfortran binary

https://github.com/mkoeppe/sage/runs/4572338375?check_suite_focus=true

comment:40

Replying to @mkoeppe:

Also conda install fails early

https://github.com/mkoeppe/sage/runs/4572338435?check_suite_focus=true

(same as #32113 comment:17)

Dependencies: #32113

Changed commit from 2dee3f0 to 6418579

Branch pushed to git repo; I updated commit sha1. Last 10 new commits:

43d481b bump to 0.1.0
f66821b cysignals are needed
ae37b4e deprecate sage.interfaces.primecount, not just remove
d3ae5f4 primecount is on conda, too
3a8c7fe primesieve is on conda, too
74b3845 allow float inputs for prime_pi
591be34 Merge #32894
049a5f6 tox.ini: Do not set environment variable CONDARC
f3116a1 tox.ini (conda): Force use of conda's python3
6418579 Merge #32113

Changed commit from 6418579 to 933134c

Branch pushed to git repo; I updated commit sha1. New commits:

933134c build/pkgs/gfortran/distros/homebrew.txt: Use gfortran

Changed dependencies from #32113 to #32113, #32947

Branch pushed to git repo; I updated commit sha1. New commits:

ab4fb99 tox.ini: Add ubuntu-jammy, debian-bookworm, linuxmint-20.3, fedora-36
e51225a .github/workflows/tox*.yml: Remove debian-jessie, add ubuntu-jammy, debian-bookworm, linuxmint-20.3, fedora-36
ac47e7d .github/workflows/tox-gcc_spkg.yml: Remove
1aa9328 Merge #32947
e99a09e sed -i.bak 's/ubunty/ubuntu/g' .github/workflows/*.yml

Changed commit from 933134c to e99a09e

comment:48

I know we already touched upon this point, but can you please expand on

Unfortunately, because "needs" cannot depend on "matrix", the jobs for building/testing Python packages would not start before all jobs building SAGE_LOCAL for all platforms are completed

That is, why do you prefer the "custom stage build" over having a "local-macos-stage2" job that depends via "needs" on "local-macos-stage1", where each of these jobs defines an appropriate matrix? In particular, I don't understand the "all platforms" part, as ubuntu and macos are in different jobs, right? It appears to me that with the current design, the stage 1 builds are also executed before any stage 2 build is executed, so the only difference is that you can already have 4 or so stage 2 builds running while the last stage 1 build finishes. I guess for the overall build time this shouldn't make much of a difference anyway, as these 4 runners would be busy with other workflows then (or are free to execute builds of other stuff in the sagemath org).

comment:49

Replying to @tobiasdiez:

That is, why do you prefer the "custom stage build" over having a "local-macos-stage2" job that depends via "needs" on "local-macos-stage1", where each of these jobs defines an appropriate matrix?

To avoid copy-paste

comment:50

Replying to @tobiasdiez:

I don't understand the "all platforms" part, as ubuntu and macos are in different jobs, right?

Yes, that's right, within each job; and in this ticket, only for macos.

comment:51

Replying to @mkoeppe:

Replying to @tobiasdiez:

That is, why do you prefer the "custom stage build" over having a "local-macos-stage2" job that depends via "needs" on "local-macos-stage1", where each of these jobs defines an appropriate matrix?

To avoid copy-paste

Github recently made it possible to reuse workflows: https://docs.github.com/en/actions/learn-github-actions/reusing-workflows
Thus you could extract the current local-macos job into a new workflow and pass the right "tox" command as an argument (determined by the matrix + step).

comment:52

Thanks for the pointer!! I've added this to #29060 for welcome future refactoring - but this won't make it into Sage 9.5

Changed commit from e99a09e to edb4364

Branch pushed to git repo; I updated commit sha1. New commits:

edb4364 .github/workflows/tox.yml: Replace homebrew-macos-python3_xcode-standard by homebrew-macos-usrlocal-python3_xcode-standard

Description changed:

--- 
+++ 
@@ -18,8 +18,6 @@
 
 As of this ticket, we rely on the bottleneck of the available parallel jobs on GH Actions to ensure that the 2nd stages of a configuration are run after the 1st stage of that configuration. Experience with this workflow will show whether this suffices.
 
-References
-- https://evilmartians.com/chronicles/build-images-on-github-actions-with-docker-layer-caching
+We also update the macOS/Xcode versions according to what's available on GH Actions and switch the `homebrew` builds to faster `homebrew-usrlocal` variants, which can use bottles for all available packages.
 
 
-

Changed reviewer from https://github.com/mkoeppe/sage/actions/runs/1605145773 to Dima Pasechnik

comment:55

OK, macOS runs on GH look good.

comment:56

Thanks!