Build still non-portable despite SAGE_FAT_BINARY=yes because of numpy
mkoeppe opened this issue
Follow-up from #29537, #31521.
Observed on cygwin-standard but likely also affects the Docker images and the Sage binary distribution.
With the upgrade to numpy 1.20.x (#31008), the non-portability shows as an error message instead of a crash:
```
[sagelib-9.4.beta0] from numpy.core._multiarray_umath import (
[sagelib-9.4.beta0] RuntimeError: NumPy was built with baseline optimizations:
[sagelib-9.4.beta0] (SSE SSE2 SSE3 SSSE3 SSE41 POPCNT SSE42 AVX F16C FMA3 AVX2 AVX512F AVX512CD AVX512_SKX) but your machine doesn't support:
[sagelib-9.4.beta0] (AVX512F).
[sagelib-9.4.beta0] ************************************************************************
[sagelib-9.4.beta0] Error building the Sage library
[sagelib-9.4.beta0] ************************************************************************
[sagelib-9.4.beta0] Full log file: /cygdrive/d/a/sage/sage/logs/pkgs/sagelib-9.4.beta0.log
```
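The error compares the features NumPy was compiled to assume unconditionally against what the running CPU reports. A rough check of the same kind can be run on a target machine before shipping a binary. The sketch below assumes Linux (`/proc/cpuinfo`) and uses a hand-written, hypothetical mapping from NumPy's feature names to kernel flag names, covering only the x86 features printed in the log; `cpuinfo_flags` and `missing_baseline` are illustrative names, not part of NumPy or Sage.

```python
import os
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical mapping from NumPy's CPU feature names (as printed in the
# error) to the flag names Linux lists in /proc/cpuinfo.  AVX512_SKX is a
# feature group; here it is assumed to require AVX512VL, AVX512BW, AVX512DQ.
NUMPY_TO_CPUINFO: Dict[str, Tuple[str, ...]] = {
    "SSE": ("sse",),
    "SSE2": ("sse2",),
    "SSE3": ("pni",),  # the kernel reports SSE3 as "pni"
    "SSSE3": ("ssse3",),
    "SSE41": ("sse4_1",),
    "POPCNT": ("popcnt",),
    "SSE42": ("sse4_2",),
    "AVX": ("avx",),
    "F16C": ("f16c",),
    "FMA3": ("fma",),
    "AVX2": ("avx2",),
    "AVX512F": ("avx512f",),
    "AVX512CD": ("avx512cd",),
    "AVX512_SKX": ("avx512vl", "avx512bw", "avx512dq"),
}


def cpuinfo_flags(text: str) -> Set[str]:
    """Return the flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_baseline(baseline: Iterable[str], flags: Set[str]) -> List[str]:
    """Return the baseline features whose cpuinfo flags are not all present."""
    missing = []
    for feature in baseline:
        required = NUMPY_TO_CPUINFO[feature]  # KeyError for unmapped names
        if not all(flag in flags for flag in required):
            missing.append(feature)
    return missing


if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    # Baseline copied from the error message above.
    baseline = ("SSE SSE2 SSE3 SSSE3 SSE41 POPCNT SSE42 AVX F16C "
                "FMA3 AVX2 AVX512F AVX512CD AVX512_SKX").split()
    with open("/proc/cpuinfo") as f:
        print("missing:", missing_baseline(baseline, cpuinfo_flags(f.read())))
```

An empty result means the machine reports every baseline feature; any name in the list corresponds to what NumPy 1.20 would print after "your machine doesn't support".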
Reports with the 9.3 Linux binary:
- https://groups.google.com/g/sage-devel/c/Lj-wx4xm0N4/m/9IiDtu4_CgAJ
- https://groups.google.com/g/sage-support/c/KZFZBoI6xJk/m/aBY9ZIxoBwAJ
Depends on #32257
CC: @embray @kliem @dimpase @vbraun @kiwifb @slel
Component: build
Keywords: sdl
Author: Jonathan Kliem
Branch/Commit: 49e531d
Reviewer: Thierry Monteil
Issue created by migration from https://trac.sagemath.org/ticket/31565
What are the symptoms? I don't really do cygwin but I may do docker one day.
The symptom is that something you build on one machine does not run on another machine, aborting with SIGILL.
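Such a crash can be detected without taking down the calling process by importing the module in a child interpreter and inspecting its exit status: on POSIX, `subprocess` reports death by signal N as return code -N. A minimal sketch; the helper `probe_import` is hypothetical, not a Sage or NumPy API.

```python
import signal
import subprocess
import sys


def probe_import(module: str, python: str = sys.executable) -> str:
    """Import `module` in a fresh interpreter and classify the outcome.

    Returns "ok", the name of the signal that killed the child (e.g.
    "SIGILL" for an illegal instruction, "SIGSEGV" for a segfault), or
    "error: " followed by the last line of the child's stderr.
    """
    proc = subprocess.run(
        [python, "-c", f"import {module}"],
        capture_output=True,
        text=True,
    )
    if proc.returncode == 0:
        return "ok"
    if proc.returncode < 0:
        # POSIX: the child was terminated by signal number -returncode.
        try:
            return signal.Signals(-proc.returncode).name
        except ValueError:
            return f"signal {-proc.returncode}"
    # Ordinary failure, e.g. an exception raised while importing.
    lines = proc.stderr.strip().splitlines()
    return "error: " + (lines[-1] if lines else f"exit code {proc.returncode}")
```

For example, `probe_import("numpy")` on the target machine would return `"ok"`, a signal name such as `"SIGILL"`, or the last line of the error, such as NumPy 1.20's `RuntimeError` about baseline optimizations.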
Description changed:
---
+++
@@ -1,3 +1,4 @@
Follow-up from #29537, #31521.
-Observed on `cygwin-standard` but likely also affects the Docker images.
+Observed on `cygwin-standard` but likely also affects the Docker images and the Sage binary distribution.
+Moving to 9.4, as 9.3 has been released.
Description changed:
---
+++
@@ -2,3 +2,17 @@
Observed on `cygwin-standard` but likely also affects the Docker images and the Sage binary distribution.
+With the upgrade to numpy 1.20.x (#31008), the non-portability shows as an error message instead of a crash:
+
+```
+ [sagelib-9.4.beta0] from numpy.core._multiarray_umath import (
+ [sagelib-9.4.beta0] RuntimeError: NumPy was built with baseline optimizations:
+ [sagelib-9.4.beta0] (SSE SSE2 SSE3 SSSE3 SSE41 POPCNT SSE42 AVX F16C FMA3 AVX2 AVX512F AVX512CD AVX512_SKX) but your machine doesn't support:
+ [sagelib-9.4.beta0] (AVX512F).
+ [sagelib-9.4.beta0] ************************************************************************
+ [sagelib-9.4.beta0] Error building the Sage library
+ [sagelib-9.4.beta0] ************************************************************************
+ [sagelib-9.4.beta0] Full log file: /cygdrive/d/a/sage/sage/logs/pkgs/sagelib-9.4.beta0.log
+```
+
+Ah, that's right!
Description changed:
---
+++
@@ -15,4 +15,7 @@
[sagelib-9.4.beta0] Full log file: /cygdrive/d/a/sage/sage/logs/pkgs/sagelib-9.4.beta0.log
```
+Report with the 9.3 Linux binary: https://groups.google.com/g/sage-devel/c/Lj-wx4xm0N4/m/9IiDtu4_CgAJ
+
+Description changed:
---
+++
@@ -15,7 +15,9 @@
[sagelib-9.4.beta0] Full log file: /cygdrive/d/a/sage/sage/logs/pkgs/sagelib-9.4.beta0.log
```
-Report with the 9.3 Linux binary: https://groups.google.com/g/sage-devel/c/Lj-wx4xm0N4/m/9IiDtu4_CgAJ
+Report withs the 9.3 Linux binary:
+- https://groups.google.com/g/sage-devel/c/Lj-wx4xm0N4/m/9IiDtu4_CgAJ
+- https://groups.google.com/g/sage-support/c/KZFZBoI6xJk/m/aBY9ZIxoBwAJ
New commits:
b249c47: Revert "Revert "do not allow numpy intrinsics when building fat binary""
Author: Jonathan Kliem
Thanks for taking care of this.
The cygwin-standard build looked rather promising (no "baseline optimizations" message, no crash when just importing numpy), but I am again getting crashes when the doctests do any plotting: https://github.com/mkoeppe/sage/runs/2867645468
Changed reviewer from https://github.com/mkoeppe/sage/actions/runs/952966309 to Matthias Koeppe
The other runs at https://github.com/mkoeppe/sage/runs/2865933225?check_suite_focus=true look clean.
So I already consider this ticket an improvement. We'll have to chase the crash on Cygwin when switching CPUs between build stages in ... yet another ... follow-up ticket.
More numpy-related trouble, for example in https://groups.google.com/d/msgid/sage-devel/763f8650-9803-4bba-a0ec-46744204fc22n%40googlegroups.com (there are more reports like this):
```
[sagelib-9.4.beta2] File "/home/chapoton/sage/local/lib/python3.8/site-packages/numpy/core/overrides.py", line 7, in <module>
[sagelib-9.4.beta2] from numpy.core._multiarray_umath import (
[sagelib-9.4.beta2] RuntimeError: NumPy was built with baseline optimizations:
[sagelib-9.4.beta2] (SSE SSE2 SSE3 SSSE3 SSE41 POPCNT SSE42) but your machine doesn't support:
[sagelib-9.4.beta2] (POPCNT).
```
So somehow the CPU changes its state and refuses to confirm features that were detected as available during the NumPy build?!
Is it due to some manipulation of CPU flags during the sagelib build?
I've opened #32021 to deal with that "RuntimeError: NumPy was built with baseline optimizations" thing.
Actually, the definition of cpu-dispatch in NUMPY_FCONFIG isn't clean. However, only the value is passed, not the variable NUMPY_FCONFIG.
Actually, the place where it is set is correct. However:
The command arguments are available in build, build_clib, and build_ext. If build_clib or build_ext are not specified by the user, the arguments of build will be used instead, which also holds the default values.
So this does not work for bdist_wheel, which is what we use.
You can run `setup.py bdist_wheel build [...build-options...]`; see for example build/pkgs/jupyter_jsmol/spkg-install.in.
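The workaround amounts to how the `setup.py` command line is assembled. A minimal sketch, assuming a distutils-based NumPy build (as in the 1.20 series) whose `build` command accepts `--cpu-baseline`; in the thread the options come from `NUMPY_FCONFIG`, here they are a plain list. `numpy_wheel_command` and `FAT_BINARY_OPTIONS` are hypothetical names, and `--cpu-baseline=NONE` is an assumed fat-binary setting, not necessarily what the branch uses.

```python
import sys
from typing import List, Sequence


def numpy_wheel_command(build_options: Sequence[str] = ()) -> List[str]:
    """Return the argv for building a NumPy wheel with extra build options.

    bdist_wheel does not accept numpy.distutils options such as
    --cpu-baseline, so they are attached to an explicit "build" command
    placed after it.  distutils parses the options of every command on the
    line before running any of them, so the build that bdist_wheel runs
    internally already sees these options.
    """
    cmd = [sys.executable, "setup.py", "bdist_wheel"]
    if build_options:
        cmd += ["build", *build_options]
    return cmd


# Assumed fat-binary setting: compile no optional CPU features into the
# baseline; extra SIMD paths are then left to NumPy's run-time dispatch.
FAT_BINARY_OPTIONS = ["--cpu-baseline=NONE"]

print(" ".join(numpy_wheel_command(FAT_BINARY_OPTIONS)[1:]))
# prints: setup.py bdist_wheel build --cpu-baseline=NONE
```

Placing the options on `build` rather than on `bdist_wheel` matches the quoted rule that `build_clib` and `build_ext` fall back to the arguments of `build`.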
Thanks. This seems to do the trick.
Actually, this alone might already be enough for this ticket.
Changed branch from u/mkoeppe/build_still_non_portable_despite_sage_fat_binary_yes_because_of_numpy to public/31565
Unclear whether this is still needed now that #32021 is merged in 9.4.beta5
With #32021 and the pynac build failure fixed by #32257, we are back to being able to build and run the testsuite on Cygwin. https://github.com/mkoeppe/sage/runs/3126612760?check_suite_focus=true
Numerous SIGSEGVs whenever plotting is involved point to more trouble with numpy.
I'll try out the branch of the present ticket on top of #32257.
Changed reviewer from Matthias Koeppe to https://github.com/mkoeppe/sage/actions/runs/1054288431
That's now running at https://github.com/mkoeppe/sage/actions/runs/1054288431
Testing with #32080 merged at https://github.com/mkoeppe/sage/actions/runs/1056686451
Changed reviewer from https://github.com/mkoeppe/sage/actions/runs/1054288431 to none
No change with #32080 either: segfaults on every plot.
Next step would be to try to reproduce this in a local installation in Cygwin.
Reduced to critical - see https://groups.google.com/g/sage-release/c/91CGN0cra2k/m/1WwPZNshBQAJ
Changed keywords from none to sdl
Reviewer: Thierry Monteil
I confirm that this branch fixes the SSE2 bug for numpy on a qemu-emulated Pentium 3. Note, however, that the error appeared at run time, not at build time.
As this will allow me to rebuild 32-bit patchbots and a new SDL for the Debian bullseye release (which I have not done in a while), I am +1 for making this ticket a blocker and getting it merged in 9.4.
If nobody complains.
Changed branch from public/31565 to 49e531d