homebrew: docbuild crashes, libtcl AtForkPrepare - from sage.misc.cython globals / multiprocessing
Closed this issue · 37 comments
(from #31335, reported in https://groups.google.com/g/sage-devel/c/9EMs9h2i_H4)
CC: @jhpalmieri @zlscherr @kiwifb @kliem
Component: build
Author: Matthias Koeppe, John Palmieri
Branch/Commit: b4ceee5
Reviewer: John Palmieri
Issue created by migration from https://trac.sagemath.org/ticket/31344
Bisecting src/doc/en/reference/misc/index.rst (running ./sage -docbuild --keep-going all html) reveals that the crash is coming from sage.misc.cython
For some reason, this line: cblas_pc = pkgconfig.parse(get_cblas_pc_module_name())
seems to cause the trouble.
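One way to keep a line like this from running at import time is to move the lookup into a function and cache the result. The following is only a sketch of that pattern, not the actual change on the branch: query_pkg_config is a hypothetical stand-in for pkgconfig.parse, and cblas_settings and CALLS are names made up for illustration.

```python
from functools import lru_cache

# Records every simulated lookup, so it is easy to see when one happens.
CALLS = []


def query_pkg_config(name):
    """Hypothetical stand-in for pkgconfig.parse(name); does no real lookup."""
    CALLS.append(name)
    return {"libraries": [name], "library_dirs": [], "include_dirs": []}


@lru_cache(maxsize=None)
def cblas_settings():
    """Look up the settings on first use instead of at import time.

    lru_cache stores the result, so later calls do not repeat the lookup.
    """
    return query_pkg_config("cblas")
```

With this layout, importing the module does no lookup at all; the first call to cblas_settings() does the work, and later calls reuse the cached dictionary.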
Author: Matthias Koeppe
With #31335, I still see a failure during docbuilding when using homebrew's Python with Big Sur. The failure now appears when building thematic_tutorials instead of the reference manual.
------------------------------------------------------------------------
0 signals.cpython-39-darwin.so 0x00000001047e2542 print_backtrace + 66
1 signals.cpython-39-darwin.so 0x00000001047e6167 sigdie + 39
2 signals.cpython-39-darwin.so 0x00000001047e606a cysigs_signal_handler + 282
3 libsystem_platform.dylib 0x00007fff20486d7d _sigtramp + 29
4 Python 0x00000001029edcf1 _PyArg_ParseTuple_SizeT + 158
5 libtcl8.6.dylib 0x000000034143972e AtForkPrepare + 38
6 libsystem_pthread.dylib 0x00007fff204421a3 _pthread_atfork_prepare_handlers + 90
7 libSystem.B.dylib 0x00007fff2a645934 libSystem_atfork_prepare + 11
8 libsystem_c.dylib 0x00007fff20325b1b fork + 12
9 _posixsubprocess.cpython-39-darwin. 0x00000001030d77f3 subprocess_fork_exec + 860
10 Python 0x000000010291c2da cfunction_call + 90
11 Python 0x00000001028d1b56 _PyObject_MakeTpCall + 129
12 Python 0x00000001029ca625 call_function + 278
13 Python 0x00000001029c7e86 _PyEval_EvalFrameDefault + 45416
14 Python 0x00000001029bbbd6 _PyEval_EvalCode + 403
...
272 Python 0x00000001028d2774 _PyFunction_Vectorcall + 376
273 Python 0x0000000102a3ade0 pymain_run_module + 212
274 Python 0x0000000102a3a8aa pymain_run_python + 433
275 Python 0x0000000102a3a6bd Py_RunMain + 23
276 Python 0x0000000102a3b9da pymain_main + 35
277 Python 0x0000000102a3bcb0 Py_BytesMain + 42
278 libdyld.dylib 0x00007fff2045d621 start + 1
------------------------------------------------------------------------
Unhandled SIGILL: An illegal instruction occurred.
This probably occurred because a *compiled* module has a bug
in it and is not properly wrapped with sig_on(), sig_off().
Python will now terminate.
------------------------------------------------------------------------
Thanks for testing! I'll try a clean rebuild of the documentation and see if I can reproduce it on Catalina as well.
For reference, the trick for bisecting was to use
make build && ./sage -docbuild --keep-going all html ; ./sage -docbuild all html
The first --keep-going was necessary so that the warning "WARNING: document isn't included in any toctree" does not stop the whole process.
OK, I can reproduce it
reducing thematic_tutorials/index.rst to the following still reproduces the crash:
.. Sage documentation master file, created by sphinx-quickstart on Thu
.. Aug 21 20:15:55 2008. You can adapt this file completely to your
.. liking, but it should at least contain the root `toctree` directive.
.. _thematic-tutorials:
Welcome to the Sage Thematic Tutorials!
=======================================
* `Tutorial: Symbolics and Plotting (PREP) <../prep/Symbolics-and-Basic-Plotting.html>`_
That's in an incremental docbuild - so something bad must have been saved in the inventory.
When I saw the original problem, I only saw it on the second pass through the ref manual build, which is consistent with seeing problems based on something in the inventory.
Does anyone know why
./sage --docbuild all html
fails at thematic_tutorial but
./sage --docbuild thematic_tutorial html
works?
In fact, after
./sage -docbuild --keep-going all html
failed, I tried building thematic_tutorial by itself. That worked, and then make doc says it was successful.
I think the bug is triggered by the parallelization code in sage_setup.docbuild.AllBuilder.
In any case, I think this ticket is an improvement by itself, as it removes some accidental globals from the module sage.misc.cython and reduces its load time.
With this change, the documentation builds for me (but of course it is missing a plot):
diff --git a/src/doc/en/thematic_tutorials/vector_calculus/vector_calc_cartesian.rst b/src/doc/en/thematic_tutorials/vector_calculus/vector_calc_cartesian.rst
index 9faa9f2375..bc77d72e68 100644
--- a/src/doc/en/thematic_tutorials/vector_calculus/vector_calc_cartesian.rst
+++ b/src/doc/en/thematic_tutorials/vector_calculus/vector_calc_cartesian.rst
@@ -94,7 +94,6 @@ Vector fields can be plotted::
E = EuclideanSpace(3)
x, y, z = E.default_chart()[:]
v = E.vector_field(-y, x, sin(x*y*z), name='v')
- sphinx_plot(v.plot(max_range=1.5, scale=0.5))
For customizing the plot, see the list of options in the documentation of
:meth:`~sage.manifolds.differentiable.vectorfield.VectorField.plot`.
This does not seem to help on my machine.
Sorry, it turns out that it doesn't consistently help on mine, either. I think this should stop the non-reference manual docs from being built in parallel:
diff --git a/src/sage_setup/docbuild/__init__.py b/src/sage_setup/docbuild/__init__.py
index b07e9c100c..1d4139555e 100644
--- a/src/sage_setup/docbuild/__init__.py
+++ b/src/sage_setup/docbuild/__init__.py
@@ -286,13 +286,15 @@ class DocBuilder(object):
from .utils import build_many as _build_many
-def build_many(target, args):
+def build_many(target, args, processes=None):
"""
Thin wrapper around `sage_setup.docbuild.utils.build_many` which uses the
docbuild settings ``NUM_THREADS`` and ``ABORT_ON_ERROR``.
"""
+ if processes is None:
+ processes = NUM_THREADS
try:
- _build_many(target, args, processes=NUM_THREADS)
+ _build_many(target, args, processes=processes)
except BaseException as exc:
if ABORT_ON_ERROR:
raise
@@ -349,7 +351,7 @@ class AllBuilder(object):
# build the other documents in parallel
L = [(doc, name, kwds) + args for doc in others]
- build_many(build_other_doc, L)
+ build_many(build_other_doc, L, 1)
logger.warning("Elapsed time: %.1f seconds."%(time.time()-start))
logger.warning("Done building the documentation!")
#31289 doesn't seem to help, by the way.
Perhaps conditionalize this change on macOS?
I've pushed this change to the branch of #31335, but it does not actually fix the problem for me. Next I'll try whether replacing the build_many call with a plain for loop helps.
(retracted)
Branch pushed to git repo; I updated commit sha1. New commits:
515f899 | sage_setup.docbuild.AllBuilder: stop the non-reference manual docs from being built in parallel |
804ebd7 | sage_setup.dpcbuild.AllBuilder: Restrict workaround to macOS |
b4ceee5 | sage_setup.docbuild: In the workaround, do not go through build_many to build serially |
This fixes the problem on my machine. Please test on Big Sur.
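For anyone skimming the thread, the idea described by the commit messages above (build serially on macOS, without going through build_many) can be sketched roughly as follows. This is an illustration under assumptions, not the code on the branch: build_docs, build_one and the platform parameter are hypothetical names, and multiprocessing.Pool stands in for whatever build_many does internally.

```python
import multiprocessing
import sys


def build_docs(build_one, jobs, platform=sys.platform, processes=None):
    """Run build_one(*job) for every job.

    On macOS ("darwin") the jobs run one after another in this process,
    so no worker processes are forked. Elsewhere a process pool is used.
    """
    if platform == "darwin":
        # Serial path: a plain loop, no multiprocessing involved.
        for job in jobs:
            build_one(*job)
        return
    # Parallel path: build_one must be picklable (a module-level function).
    with multiprocessing.Pool(processes) as pool:
        pool.starmap(build_one, jobs)
```

The platform argument exists only so the serial branch can be exercised on any machine, for example build_docs(build_one, jobs, platform="darwin").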
Changed author from Matthias Koeppe to Matthias Koeppe, John Palmieri
Worked for me on Big Sur
Reviewer: John Palmieri
Thanks!
This fixed the docbuild crash on my Big Sur box too.
On macOS 10.14.6: dochtml builds with this, while it does not with 9.3.beta7 or #31419.
Changed branch from u/mkoeppe/homebrew__docbuild_crashes__libtcl_atforkprepare to b4ceee5