sagemath/sage

Build still non-portable despite SAGE_FAT_BINARY=yes because of numpy

mkoeppe opened this issue · 47 comments

Follow-up from #29537, #31521.

Observed on cygwin-standard but likely also affects the Docker images and the Sage binary distribution.

With the upgrade to numpy 1.20.x (#31008), the non-portability shows as an error message instead of a crash:

  [sagelib-9.4.beta0]       from numpy.core._multiarray_umath import (
  [sagelib-9.4.beta0]   RuntimeError: NumPy was built with baseline optimizations: 
  [sagelib-9.4.beta0]   (SSE SSE2 SSE3 SSSE3 SSE41 POPCNT SSE42 AVX F16C FMA3 AVX2 AVX512F AVX512CD AVX512_SKX) but your machine doesn't support:
  [sagelib-9.4.beta0]   (AVX512F).
  [sagelib-9.4.beta0]   ************************************************************************
  [sagelib-9.4.beta0]   Error building the Sage library
  [sagelib-9.4.beta0]   ************************************************************************
  [sagelib-9.4.beta0] Full log file: /cygdrive/d/a/sage/sage/logs/pkgs/sagelib-9.4.beta0.log

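Reports with the 9.3 Linux binary:
- https://groups.google.com/g/sage-devel/c/Lj-wx4xm0N4/m/9IiDtu4_CgAJ
- https://groups.google.com/g/sage-support/c/KZFZBoI6xJk/m/aBY9ZIxoBwAJ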

Depends on #32257

CC: @embray @kliem @dimpase @vbraun @kiwifb @slel

Component: build

Keywords: sdl

Author: Jonathan Kliem

Branch/Commit: 49e531d

Reviewer: Thierry Monteil

Issue created by migration from https://trac.sagemath.org/ticket/31565

comment:2

What are the symptoms? I don't really do cygwin but I may do docker one day.

comment:3

The symptom is that something you build on one machine does not run on another machine, aborting with SIGILL.

Description changed:

--- 
+++ 
@@ -1,3 +1,4 @@
 Follow-up from #29537, #31521.
 
-Observed on `cygwin-standard` but likely also affects the Docker images.
+Observed on `cygwin-standard` but likely also affects the Docker images and the Sage binary distribution.
+
comment:5

Moving to 9.4, as 9.3 has been released.

Description changed:

--- 
+++ 
@@ -2,3 +2,17 @@
 
 Observed on `cygwin-standard` but likely also affects the Docker images and the Sage binary distribution.
 
+With the upgrade to numpy 1.20.x (#31008), the non-portability shows as an error message instead of a crash:
+
+```
+  [sagelib-9.4.beta0]       from numpy.core._multiarray_umath import (
+  [sagelib-9.4.beta0]   RuntimeError: NumPy was built with baseline optimizations: 
+  [sagelib-9.4.beta0]   (SSE SSE2 SSE3 SSSE3 SSE41 POPCNT SSE42 AVX F16C FMA3 AVX2 AVX512F AVX512CD AVX512_SKX) but your machine doesn't support:
+  [sagelib-9.4.beta0]   (AVX512F).
+  [sagelib-9.4.beta0]   ************************************************************************
+  [sagelib-9.4.beta0]   Error building the Sage library
+  [sagelib-9.4.beta0]   ************************************************************************
+  [sagelib-9.4.beta0] Full log file: /cygdrive/d/a/sage/sage/logs/pkgs/sagelib-9.4.beta0.log
+```
+
+

Dependencies: #31008

kliem commented
comment:7

I guess we need to unwind #31521 now??

comment:8

Ah, that's right!

Description changed:

--- 
+++ 
@@ -15,4 +15,7 @@
   [sagelib-9.4.beta0] Full log file: /cygdrive/d/a/sage/sage/logs/pkgs/sagelib-9.4.beta0.log
 ```
 
+Report with the 9.3 Linux binary: https://groups.google.com/g/sage-devel/c/Lj-wx4xm0N4/m/9IiDtu4_CgAJ
 
+
+

Description changed:

--- 
+++ 
@@ -15,7 +15,9 @@
   [sagelib-9.4.beta0] Full log file: /cygdrive/d/a/sage/sage/logs/pkgs/sagelib-9.4.beta0.log
 ```
 
-Report with the 9.3 Linux binary: https://groups.google.com/g/sage-devel/c/Lj-wx4xm0N4/m/9IiDtu4_CgAJ
+Report withs the 9.3 Linux binary: 
+- https://groups.google.com/g/sage-devel/c/Lj-wx4xm0N4/m/9IiDtu4_CgAJ
+- https://groups.google.com/g/sage-support/c/KZFZBoI6xJk/m/aBY9ZIxoBwAJ
 
 
 

New commits:

b249c47 Revert "Revert "do not allow numpy intrinsics when building fat binary""

Author: Jonathan Kliem

Commit: b249c47

kliem commented
comment:14

Thanks for taking care of this.

comment:15

The cygwin-standard build looked rather promising (no "baseline optimizations" message, no crash when just importing numpy) but I am getting crashes again https://github.com/mkoeppe/sage/runs/2867645468 when the doctests do any plotting.

Changed reviewer from https://github.com/mkoeppe/sage/actions/runs/952966309 to Matthias Koeppe

comment:16

The other runs at https://github.com/mkoeppe/sage/runs/2865933225?check_suite_focus=true look clean.

So I consider this ticket already an improvement. We'll have to chase the crash on cygwin when switching CPUs between build stages in ... yet another ... follow-up ticket.

comment:17

More numpy-related trouble, as seen e.g. in https://groups.google.com/d/msgid/sage-devel/763f8650-9803-4bba-a0ec-46744204fc22n%40googlegroups.com (there are more reports like this):

[sagelib-9.4.beta2]   File "/home/chapoton/sage/local/lib/python3.8/site-packages/numpy/core/overrides.py", line 7, in <module>
[sagelib-9.4.beta2]     from numpy.core._multiarray_umath import (
[sagelib-9.4.beta2] RuntimeError: NumPy was built with baseline optimizations: 
[sagelib-9.4.beta2] (SSE SSE2 SSE3 SSSE3 SSE41 POPCNT SSE42) but your machine doesn't support:
[sagelib-9.4.beta2] (POPCNT).

So somehow the CPU changes its state and no longer reports features that were detected as present during the NumPy build?!
Is this due to some manipulation of CPU flags happening during the sagelib build?
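To check which of the listed baseline features a given machine really lacks, one can compare the list from the error message against the flags the kernel reports. This is only a minimal sketch, assuming a Linux-style `/proc/cpuinfo`; the helper names `cpu_flags` and `missing_features` are made up for illustration and are not part of NumPy or Sage.

```python
# Hypothetical helpers for illustration; not part of NumPy or Sage.
# Map NumPy feature names (as printed in the error message) to the
# flag names that Linux reports in /proc/cpuinfo.
NUMPY_TO_CPUINFO = {
    "SSE": ["sse"],
    "SSE2": ["sse2"],
    "SSE3": ["pni"],  # Linux reports SSE3 as "pni"
    "SSSE3": ["ssse3"],
    "SSE41": ["sse4_1"],
    "POPCNT": ["popcnt"],
    "SSE42": ["sse4_2"],
    "AVX": ["avx"],
    "F16C": ["f16c"],
    "FMA3": ["fma"],
    "AVX2": ["avx2"],
    "AVX512F": ["avx512f"],
    "AVX512CD": ["avx512cd"],
    # AVX512_SKX is a feature group in NumPy's naming.
    "AVX512_SKX": ["avx512vl", "avx512bw", "avx512dq"],
}


def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_features(baseline, cpuinfo_text):
    """List the baseline features whose cpuinfo flags are not all present."""
    flags = cpu_flags(cpuinfo_text)
    missing = []
    for name in baseline.split():
        # Unknown names fall back to their lowercase spelling (an assumption).
        needed = NUMPY_TO_CPUINFO.get(name, [name.lower()])
        if not all(flag in flags for flag in needed):
            missing.append(name)
    return missing
```

On a Linux machine this could be called as `missing_features("SSE SSE2 SSE3 SSSE3 SSE41 POPCNT SSE42", open("/proc/cpuinfo").read())`, once on the build machine and once on the machine that fails.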

comment:18

I've opened #32021 to deal with the "RuntimeError: NumPy was built with baseline optimizations" error.

kliem commented
comment:19

Actually, the definition of cpu-dispatch in NUMPY_FCONFIG isn't clean. However, only the value will be passed, not the variable NUMPY_FCONFIG itself.

kliem commented
comment:20

This doesn't work. See #32021 comment:10.

It's a build option, not a configure option.

kliem commented
comment:21

Actually, the place to do it is correct. However, the NumPy documentation says:

The command arguments are available in build, build_clib, and build_ext. if build_clib or build_ext are not specified by the user, the arguments of build will be used instead, which also holds the default values.

So this does not work for bdist_wheel, which is what we use.

comment:22

You can do setup.py bdist_wheel build [...build-options...], see for example build/pkgs/jupyter_jsmol/spkg-install.in
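The idea can be sketched as a small function that assembles the argument list. This is only an illustration under assumptions: `--cpu-baseline` is a build option of NumPy's distutils (NumPy 1.20 and later), but the value `none` is chosen here to match the notion of disabling the baseline for fat binaries, and the helper name `numpy_bdist_wheel_args` is hypothetical. It is not copied from the branch or from any spkg-install.in.

```python
def numpy_bdist_wheel_args(fat_binary):
    """Return setup.py arguments for building a NumPy wheel.

    Hypothetical helper for illustration only.
    """
    args = ["setup.py", "bdist_wheel"]
    if fat_binary:
        # Options placed after the "build" sub-command are parsed as
        # options of "build"; build_ext and build_clib fall back to them
        # when they are not given explicitly.
        # Assumption: "none" requests an empty baseline feature set.
        args += ["build", "--cpu-baseline=none"]
    return args
```

The resulting list could be handed to `subprocess.run([sys.executable, *numpy_bdist_wheel_args(True)], check=True)` from within the NumPy source directory.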

kliem commented
comment:23

Thanks. This seems to do the trick.

Actually this itself might even work for this ticket here.

kliem commented

Changed commit from b249c47 to 82ae485

kliem commented

New commits:

82ae485 disable baseline in case of SAGE_FAT_BINARY
comment:26

It is unclear whether this is still needed now that #32021 is merged in 9.4.beta5.

Changed dependencies from #31008 to #32257

comment:28

With #32021 and the pynac build failure fixed by #32257, we are back to being able to build and run the testsuite on Cygwin. https://github.com/mkoeppe/sage/runs/3126612760?check_suite_focus=true

Numerous SIGSEGVs whenever plotting is involved point to more trouble with numpy.

I'll try out the branch of the present ticket on top of #32257.

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

d4156f7 build/pkgs/singular/patches/0001-factory-canonicalform.h-Add-more-FACTORY_PUBLIC.patch: New
49e531d disable baseline in case of SAGE_FAT_BINARY

Changed commit from 82ae485 to 49e531d

Changed reviewer from Matthias Koeppe to https://github.com/mkoeppe/sage/actions/runs/1054288431

comment:31

Replying to @mkoeppe:

Numerous SIGSEGVs whenever plotting is involved point to more trouble with numpy.

I'll try out the branch of the present ticket on top of #32257.

Same issues as before.

comment:33

Also with #32080 no change. Segfaults on every plot.

Next step would be to try to reproduce this in a local installation in Cygwin.

Changed keywords from none to sdl

Reviewer: Thierry Monteil

comment:36

I confirm that this branch fixes the SSE2 bug for numpy on a QEMU-emulated Pentium 3. Note, however, that the error appeared at run time, not at build time.

As this will allow me to rebuild 32-bit patchbots and a new SDL for the bullseye release (which I have not done for a while), I am +1 for making this ticket a blocker and getting it merged in 9.4.

comment:37

If nobody complains.

Changed branch from public/31565 to 49e531d