sagemath/sage

homebrew: docbuild crashes, libtcl AtForkPrepare - from sage.misc.cython globals / multiprocessing

Closed this issue · 37 comments

(from #31335, reported in https://groups.google.com/g/sage-devel/c/9EMs9h2i_H4)

CC: @jhpalmieri @zlscherr @kiwifb @kliem

Component: build

Author: Matthias Koeppe, John Palmieri

Branch/Commit: b4ceee5

Reviewer: John Palmieri

Issue created by migration from https://trac.sagemath.org/ticket/31344

comment:1

Bisecting src/doc/en/reference/misc/index.rst (running ./sage -docbuild --keep-going all html) reveals that the crash is coming from sage.misc.cython.

comment:2

For some reason, this line: cblas_pc = pkgconfig.parse(get_cblas_pc_module_name())
seems to cause the trouble.

New commits:

80720d7 src/sage/misc/cython.py: Do not run pkgconfig at import time

Author: Matthias Koeppe

Commit: 80720d7
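
For readers unfamiliar with the pattern named in the commit title, here is a minimal sketch of deferring an expensive lookup from import time to first use. It is not the actual change to sage.misc.cython; query_pkgconfig is a hypothetical stand-in for pkgconfig.parse, and cblas_pc and cblas_libraries are illustrative names.

```python
from functools import lru_cache

# Hypothetical stand-in for an expensive external query such as
# pkgconfig.parse(); it only records that it was called.
calls = []

def query_pkgconfig(module_name):
    calls.append(module_name)
    return {"libraries": [module_name], "include_dirs": []}

# Import-time style (what the commit title says to avoid):
#     cblas_pc = query_pkgconfig("cblas")   # would run on every import
#
# Deferred style: nothing runs until the first caller asks for the value,
# and lru_cache makes later calls reuse the first result.
@lru_cache(maxsize=None)
def cblas_pc():
    return query_pkgconfig("cblas")

def cblas_libraries():
    return cblas_pc()["libraries"]
```

Importing a module written this way performs no query at all; the first call to cblas_libraries() triggers exactly one.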

comment:5

Easiest to test on #31335, which merges this branch

comment:6

With #31335, I still see a failure during docbuilding when using homebrew's Python with Big Sur. The failure now appears when building thematic_tutorials instead of the reference manual.

------------------------------------------------------------------------
0   signals.cpython-39-darwin.so        0x00000001047e2542 print_backtrace + 66
1   signals.cpython-39-darwin.so        0x00000001047e6167 sigdie + 39
2   signals.cpython-39-darwin.so        0x00000001047e606a cysigs_signal_handler + 282
3   libsystem_platform.dylib            0x00007fff20486d7d _sigtramp + 29
4   Python                              0x00000001029edcf1 _PyArg_ParseTuple_SizeT + 158
5   libtcl8.6.dylib                     0x000000034143972e AtForkPrepare + 38
6   libsystem_pthread.dylib             0x00007fff204421a3 _pthread_atfork_prepare_handlers + 90
7   libSystem.B.dylib                   0x00007fff2a645934 libSystem_atfork_prepare + 11
8   libsystem_c.dylib                   0x00007fff20325b1b fork + 12
9   _posixsubprocess.cpython-39-darwin. 0x00000001030d77f3 subprocess_fork_exec + 860
10  Python                              0x000000010291c2da cfunction_call + 90
11  Python                              0x00000001028d1b56 _PyObject_MakeTpCall + 129
12  Python                              0x00000001029ca625 call_function + 278
13  Python                              0x00000001029c7e86 _PyEval_EvalFrameDefault + 45416
14  Python                              0x00000001029bbbd6 _PyEval_EvalCode + 403
...
272 Python                              0x00000001028d2774 _PyFunction_Vectorcall + 376
273 Python                              0x0000000102a3ade0 pymain_run_module + 212
274 Python                              0x0000000102a3a8aa pymain_run_python + 433
275 Python                              0x0000000102a3a6bd Py_RunMain + 23
276 Python                              0x0000000102a3b9da pymain_main + 35
277 Python                              0x0000000102a3bcb0 Py_BytesMain + 42
278 libdyld.dylib                       0x00007fff2045d621 start + 1
------------------------------------------------------------------------
Unhandled SIGILL: An illegal instruction occurred.
This probably occurred because a *compiled* module has a bug
in it and is not properly wrapped with sig_on(), sig_off().
Python will now terminate.
------------------------------------------------------------------------
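
Frames 5 to 8 of the backtrace show the crash inside a handler (AtForkPrepare in libtcl) that runs just before fork(). The sketch below only illustrates that mechanism at the Python level with os.register_at_fork (Unix only, Python 3.7+); it does not reproduce the crash, and the hook and fork_once are illustrative names.

```python
import os

events = []

# Assumption for illustration: this Python-level hook plays the role that
# libtcl's AtForkPrepare plays at the C level in frames 5-8 above.
os.register_at_fork(before=lambda: events.append("prepare"))

def fork_once():
    """Fork a child that exits immediately; return the events seen in the parent."""
    events.clear()
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: leave at once without running cleanup handlers
    os.waitpid(pid, 0)
    return list(events)
```

The point is that any library loaded into the process can attach code to every later fork(), so a fork issued from unrelated Python code can still fail inside that library.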
comment:7

Thanks for testing! I'll try a clean rebuild of the documentation and see if I can reproduce it on Catalina as well.

For reference, the trick for bisecting was to use

make build && ./sage -docbuild --keep-going all html ; ./sage -docbuild all html

The --keep-going in the first command was necessary so that the warning "WARNING: document isn't included in any toctree" does not stop the whole process.
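
The bisection itself can be sketched as follows, assuming a single culprit entry. The predicate fails is hypothetical: in practice it would reduce the toctree to the given entries, run the docbuild command above, and report whether the build crashed. The entry names are illustrative.

```python
def bisect_culprit(items, fails):
    """Return the single item that makes ``fails`` true.

    Assumes exactly one culprit and that ``fails(subset)`` is true
    precisely when the subset contains it.
    """
    candidates = list(items)
    while len(candidates) > 1:
        half = len(candidates) // 2
        first = candidates[:half]
        # Keep whichever half still triggers the failure.
        candidates = first if fails(first) else candidates[half:]
    return candidates[0]

# Hypothetical usage with a fake predicate instead of a real docbuild run.
entries = [
    "sage/misc/abstract_method",
    "sage/misc/cython",
    "sage/misc/misc",
    "sage/misc/temporary_file",
]
culprit = bisect_culprit(entries, lambda subset: "sage/misc/cython" in subset)
```

Each round halves the candidate list, so a toctree of n entries needs about log2(n) docbuild runs.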

comment:8

OK, I can reproduce it

comment:9

Reducing thematic_tutorials/index.rst to the following still reproduces the crash:

.. Sage documentation master file, created by sphinx-quickstart on Thu
.. Aug 21 20:15:55 2008. You can adapt this file completely to your
.. liking, but it should at least contain the root `toctree` directive.

.. _thematic-tutorials:

Welcome to the Sage Thematic Tutorials!
=======================================


* `Tutorial: Symbolics and Plotting (PREP) <../prep/Symbolics-and-Basic-Plotting.html>`_
comment:10

That's in an incremental docbuild - so something bad must have been saved in the inventory.

comment:11

When I saw the original problem, I only saw it on the second pass through the ref manual build, which is consistent with seeing problems based on something in the inventory.

comment:12

Does anyone know why

./sage --docbuild all html

fails at thematic_tutorials but

./sage --docbuild thematic_tutorials html

works?

comment:13

In fact, after

./sage -docbuild --keep-going all html

failed, I tried building thematic_tutorials by itself. That worked, and then make doc says it was successful.

comment:14

I think the bug is triggered by the parallelization code in sage_setup.docbuild.AllBuilder.

comment:15

We previously had trouble with this code (build_many - from #28356, #27514, #27490) in #30351, #28483, ...

comment:16

see also #31289

comment:17

In any case, I think this ticket is an improvement by itself, as it removes some accidental globals from the module sage.misc.cython and reduces its load time.

comment:19

With this change, the documentation builds for me (but of course it is missing a plot):

diff --git a/src/doc/en/thematic_tutorials/vector_calculus/vector_calc_cartesian.rst b/src/doc/en/thematic_tutorials/vector_calculus/vector_calc_cartesian.rst
index 9faa9f2375..bc77d72e68 100644
--- a/src/doc/en/thematic_tutorials/vector_calculus/vector_calc_cartesian.rst
+++ b/src/doc/en/thematic_tutorials/vector_calculus/vector_calc_cartesian.rst
@@ -94,7 +94,6 @@ Vector fields can be plotted::
     E = EuclideanSpace(3)
     x, y, z = E.default_chart()[:]
     v = E.vector_field(-y, x, sin(x*y*z), name='v')
-    sphinx_plot(v.plot(max_range=1.5, scale=0.5))
 
 For customizing the plot, see the list of options in the documentation of
 :meth:`~sage.manifolds.differentiable.vectorfield.VectorField.plot`.
comment:20

This does not seem to help on my machine

comment:21

Sorry, it turns out that it doesn't consistently help on mine, either. I think this should stop the non-reference manual docs from being built in parallel:

diff --git a/src/sage_setup/docbuild/__init__.py b/src/sage_setup/docbuild/__init__.py
index b07e9c100c..1d4139555e 100644
--- a/src/sage_setup/docbuild/__init__.py
+++ b/src/sage_setup/docbuild/__init__.py
@@ -286,13 +286,15 @@ class DocBuilder(object):
 
 from .utils import build_many as _build_many
 
-def build_many(target, args):
+def build_many(target, args, processes=None):
     """
     Thin wrapper around `sage_setup.docbuild.utils.build_many` which uses the
     docbuild settings ``NUM_THREADS`` and ``ABORT_ON_ERROR``.
     """
+    if processes is None:
+        processes = NUM_THREADS
     try:
-        _build_many(target, args, processes=NUM_THREADS)
+        _build_many(target, args, processes=processes)
     except BaseException as exc:
         if ABORT_ON_ERROR:
             raise
@@ -349,7 +351,7 @@ class AllBuilder(object):
 
         # build the other documents in parallel
         L = [(doc, name, kwds) + args for doc in others]
-        build_many(build_other_doc, L)
+        build_many(build_other_doc, L, 1)
         logger.warning("Elapsed time: %.1f seconds."%(time.time()-start))
         logger.warning("Done building the documentation!")
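
A note on the processes=None default in the patch above: a default written as processes=NUM_THREADS would be evaluated once, when the function is defined, whereas the None sentinel looks the setting up on every call. Whether that is why the patch is written this way is a guess. The sketch uses stand-in functions and a hypothetical reassignment of NUM_THREADS.

```python
NUM_THREADS = 1  # stand-in for the docbuild setting; assumed to change later

def build_many_eager(args, processes=NUM_THREADS):
    # Default captured once, when the function is defined.
    return processes

def build_many_late(args, processes=None):
    # Default resolved on each call, so later changes to NUM_THREADS are seen.
    if processes is None:
        processes = NUM_THREADS
    return processes

NUM_THREADS = 8  # hypothetical: set after option parsing
```

With these definitions, build_many_eager([]) still reports the old value, build_many_late([]) reports the new one, and an explicit build_many_late([], 1) forces a single process as in the AllBuilder hunk.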
comment:22

#31289 doesn't seem to help, by the way.

comment:23

Perhaps conditionalize this change on macOS?

comment:24

I've pushed this change to the branch of #31335, but it does not actually fix the problem for me. Next I'll try whether replacing the build_many call with a for loop helps.

comment:25

(retracted)

Branch pushed to git repo; I updated commit sha1. New commits:

515f899 sage_setup.docbuild.AllBuilder: stop the non-reference manual docs from being built in parallel
804ebd7 sage_setup.docbuild.AllBuilder: Restrict workaround to macOS
b4ceee5 sage_setup.docbuild: In the workaround, do not go through build_many to build serially

Changed commit from 80720d7 to b4ceee5
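
The shape of the workaround described by these three commit titles can be sketched roughly as below. This is not the code from the branch: build_docs and fake_build are hypothetical names, the macOS test via sys.platform is an assumption, and multiprocessing.Pool merely stands in for Sage's own build_many, whose implementation is not shown in this thread.

```python
import sys
from multiprocessing import Pool

def build_docs(target, args, processes, serial=None):
    """Run target(arg) for each arg, serially or in a worker pool."""
    if serial is None:
        serial = sys.platform == "darwin"  # assumed macOS check
    if serial:
        # Plain loop in this process: no worker processes are forked.
        return [target(arg) for arg in args]
    with Pool(processes) as pool:
        return pool.map(target, args)

def fake_build(doc):
    # Hypothetical stand-in for build_other_doc.
    return "built " + doc
```

For example, build_docs(fake_build, ["tutorial", "faq"], processes=2, serial=True) runs both builds one after another in the calling process, which is the behaviour the last commit aims for on macOS.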

comment:28

This fixes the problem on my machine. Please test on Big Sur

Changed author from Matthias Koeppe to Matthias Koeppe, John Palmieri

comment:31

Worked for me on Big Sur

Reviewer: John Palmieri

comment:32

This works for me, too. It would be nice to know what the actual problem is beyond "some murky issue with parallel docbuilding on OS X," but it's good enough to merge. @zlscherr, feel free to add your real name to the reviewers field (and also to the wiki page, if you want).

comment:33

Thanks!

comment:34

This fixed the docbuild crash on my Big Sur box too.

slel commented
comment:35

On macOS 10.14.6: dochtml builds with this, while it does not with 9.3.beta7 or #31419.