Generating lockfiles fails with: unknown error (_ssl.c:3161)
Describe the bug
When trying to generate lockfiles, the command fails with the following error: Failed to spawn a job for /home/manos/Workspace/pants-repo/.conda/bin/python3.9: unknown error (_ssl.c:3161)
pants --print-stacktrace -ldebug generate-lockfiles ::
18:15:17.57 [INFO] Initialization options changed: reinitializing scheduler...
18:15:22.39 [INFO] Scheduler initialized.
18:15:23.84 [INFO] Completed: Generate lockfile for python-default
18:15:23.84 [ERROR] 1 Exception encountered:
Engine traceback:
in select
..
in pants.core.goals.generate_lockfiles.generate_lockfiles_goal
`generate-lockfiles` goal
Traceback (most recent call last):
File "/home/manos/.cache/nce/3d6643e46b53e4cc0b2a0d5c768866226ddce3de1f57f80c4a02d8d39800fa8e/bindings/venvs/2.18.0/lib/python3.9/site-packages/pants/engine/internals/selectors.py", line 626, in native_engine_generator_send
res = rule.send(arg) if err is None else rule.throw(throw or err)
File "/home/manos/.cache/nce/3d6643e46b53e4cc0b2a0d5c768866226ddce3de1f57f80c4a02d8d39800fa8e/bindings/venvs/2.18.0/lib/python3.9/site-packages/pants/core/goals/generate_lockfiles.py", line 557, in generate_lockfiles_goal
results = await MultiGet(
File "/home/manos/.cache/nce/3d6643e46b53e4cc0b2a0d5c768866226ddce3de1f57f80c4a02d8d39800fa8e/bindings/venvs/2.18.0/lib/python3.9/site-packages/pants/engine/internals/selectors.py", line 361, in MultiGet
return await _MultiGet(tuple(__arg0))
File "/home/manos/.cache/nce/3d6643e46b53e4cc0b2a0d5c768866226ddce3de1f57f80c4a02d8d39800fa8e/bindings/venvs/2.18.0/lib/python3.9/site-packages/pants/engine/internals/selectors.py", line 168, in __await__
result = yield self.gets
File "/home/manos/.cache/nce/3d6643e46b53e4cc0b2a0d5c768866226ddce3de1f57f80c4a02d8d39800fa8e/bindings/venvs/2.18.0/lib/python3.9/site-packages/pants/engine/internals/selectors.py", line 626, in native_engine_generator_send
res = rule.send(arg) if err is None else rule.throw(throw or err)
File "/home/manos/.cache/nce/3d6643e46b53e4cc0b2a0d5c768866226ddce3de1f57f80c4a02d8d39800fa8e/bindings/venvs/2.18.0/lib/python3.9/site-packages/pants/backend/python/goals/lockfile.py", line 110, in generate_lockfile
result = await Get(
File "/home/manos/.cache/nce/3d6643e46b53e4cc0b2a0d5c768866226ddce3de1f57f80c4a02d8d39800fa8e/bindings/venvs/2.18.0/lib/python3.9/site-packages/pants/engine/internals/selectors.py", line 118, in __await__
result = yield self
pants.engine.process.ProcessExecutionFailure: Process 'Generate lockfile for python-default' failed with exit code 1.
stdout:
stderr:
Failed to spawn a job for /home/manos/Workspace/pants-repo/.conda/bin/python3.9: unknown error (_ssl.c:3161)
Use `--keep-sandboxes=on_failure` to preserve the process chroot for inspection.
Pants version
Tested with versions:
- 2.16.0
- 2.17.0
- 2.18.0
- 2.18.2
- 2.19.0rc5
(same result for all tested versions)
OS
Tested with
- Fedora 38 (Linux 6.6.13-100.fc38.x86_64)
- Fedora 39 (Linux 6.6.13-200.fc39.x86_64)
(same result for all tested versions)
Additional info
I think this issue started happening after a kernel update from Fedora. Has anyone else run into this issue before?
Any suggestions on how to resolve this would be very appreciated!
The source is free to read. My reading says the underlying SSL lib CPython is linking against is not supported (you're probably on the right track): https://github.com/python/cpython/blob/8fc8c45b6717be58ad927def1bf3ea05c83cab8c/Modules/_ssl.c#L3161
I'd run ldd /home/manos/Workspace/pants-repo/.conda/bin/python3.9 to see the linkage and work from there. This is a much lower-level issue than Pants, and it would be good to cut Pants out of the debugging.
I'm experiencing the same issue, also on Fedora. I'm only ever able to replicate it when using what's in the sandbox.
I can't:
- Get any helpful error messages
- Get the python3.9 urllib from the scie-pants venv to error the same way alone.
Using distrobox to try to run in older versions of Fedora seems to hit the same issue, but distrobox doesn't seem to isolate things entirely from the rest of the system.
The sandbox blanks out env vars and that can be important. @xlevus can you ldd and investigate your env vs sandbox env to help isolate whether this is an LD_LIBRARY_PATH issue, or some other env var that is required but blocked by Pants? The whole scie-pants thing is almost certainly way off track. If Pants launches at all, scie-pants is long out of the picture entirely.
📦[xlevus@pants-debug2 gymkhana]$ pants --keep-sandboxes=on_failure generate-lockfiles ::
10:54:06.16 [INFO] Preserving local process execution dir /tmp/pants-sandbox-ouAMHY for Generate lockfile for python-default
10:54:06.16 [INFO] Completed: Generate lockfile for python-default
10:54:06.16 [ERROR] 1 Exception encountered:
Engine traceback:
in `generate-lockfiles` goal
ProcessExecutionFailure: Process 'Generate lockfile for python-default' failed with exit code 1.
stdout:
stderr:
Failed to spawn a job for /usr/bin/python3.10: unknown error (_ssl.c:3161)
📦[xlevus@pants-debug2 gymkhana]$ ldd /usr/bin/python3.10
linux-vdso.so.1 (0x00007ffeaefee000)
libpython3.10.so.1.0 => /lib64/libpython3.10.so.1.0 (0x00007fc5a9b64000)
libc.so.6 => /lib64/libc.so.6 (0x00007fc5a9987000)
libm.so.6 => /lib64/libm.so.6 (0x00007fc5a98a7000)
/lib64/ld-linux-x86-64.so.2 (0x00007fc5a9ebe000)
📦[xlevus@pants-debug2 gymkhana]$ python3.10
Python 3.10.13 (main, Aug 28 2023, 00:00:00) [GCC 12.3.1 20230508 (Red Hat 12.3.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import _ssl
>>> _ssl.__file__
'/usr/lib64/python3.10/lib-dynload/_ssl.cpython-310-x86_64-linux-gnu.so'
>>> _ssl.OPENSSL_VERSION
'OpenSSL 3.0.9 30 May 2023'
📦[xlevus@pants-debug2 gymkhana]$ ldd /usr/lib64/python3.10/lib-dynload/_ssl.cpython-310-x86_64-linux-gnu.so
linux-vdso.so.1 (0x00007ffec9df8000)
libssl.so.3 => /lib64/libssl.so.3 (0x00007ffbd04b3000)
libcrypto.so.3 => /lib64/libcrypto.so.3 (0x00007ffbd0088000)
libc.so.6 => /lib64/libc.so.6 (0x00007ffbcfeab000)
libz.so.1 => /lib64/libz.so.1 (0x00007ffbcfe91000)
/lib64/ld-linux-x86-64.so.2 (0x00007ffbd0592000)
📦[xlevus@pants-debug2 gymkhana]$
📦[xlevus@pants-debug2 pants-sandbox-z3rdnp]$ ldd /home/xlevus/.cache/nce/29319df9a6ca02e838617675b5b8dd7e5b18a393c27e74979823158b85c015d9/bindings/venvs/2.18.0/bin/python3.9
linux-vdso.so.1 (0x00007ffc1a626000)
/home/xlevus/.cache/nce/29319df9a6ca02e838617675b5b8dd7e5b18a393c27e74979823158b85c015d9/bindings/venvs/2.18.0/bin/../lib/libpython3.9.so.1.0 => not found
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f91208e7000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f91208e2000)
libutil.so.1 => /lib64/libutil.so.1 (0x00007f91208dd000)
libm.so.6 => /lib64/libm.so.6 (0x00007f91207fd000)
librt.so.1 => /lib64/librt.so.1 (0x00007f91207f6000)
libc.so.6 => /lib64/libc.so.6 (0x00007f9120619000)
/lib64/ld-linux-x86-64.so.2 (0x00007f91208f8000)
📦[xlevus@pants-debug2 pants-sandbox-z3rdnp]$ /home/xlevus/.cache/nce/29319df9a6ca02e838617675b5b8dd7e5b18a393c27e74979823158b85c015d9/bindings/venvs/2.18.0/bin/python3.9
Python 3.9.18 (main, Jan 8 2024, 05:40:12)
[Clang 17.0.6 ] on linux
Type "help", "copyright", "credits" or "license" for more information.
Cannot read termcap database;
using dumb terminal settings.
>>> import _ssl
>>> _ssl.__file__
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: module '_ssl' has no attribute '__file__'
>>> _ssl.OPENSSL_VERSION
'OpenSSL 3.0.12 24 Oct 2023'
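For anyone repeating the REPL checks above on several interpreters, here is a minimal sketch that gathers the same facts in one go. The function name `describe_ssl_linkage` is made up for illustration. It assumes, as the AttributeError above suggests, that a missing `_ssl.__file__` means the module is built into the interpreter rather than loaded from a shared object.

```python
import ssl
import sys
import _ssl


def describe_ssl_linkage():
    """Collect the facts inspected by hand in the REPL sessions above."""
    return {
        "executable": sys.executable,
        "openssl_version": ssl.OPENSSL_VERSION,
        # Built-in modules have no __file__ attribute, so fall back to a
        # marker string instead of raising AttributeError.
        "ssl_extension": getattr(_ssl, "__file__", "<built into the interpreter>"),
    }


if __name__ == "__main__":
    for key, value in describe_ssl_linkage().items():
        print(f"{key}: {value}")
```

Running it with each candidate interpreter (the system python3.10, the venv python3.9) gives output that can be compared side by side. When a file path is reported, it can be passed to ldd as done above.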
The original __run.sh contents:
env -i CPPFLAGS= LANG=en_NZ.UTF-8 LDFLAGS= PATH=$'/home/xlevus/.nix-profile/bin:/nix/var/nix/profiles/default/bin:/home/xlevus/.local/bin:/home/xlevus/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin' PEX_IGNORE_RCFILES=true PEX_PYTHON=/home/xlevus/.cache/nce/29319df9a6ca02e838617675b5b8dd7e5b18a393c27e74979823158b85c015d9/bindings/venvs/2.18.0/bin/python3.9 PEX_ROOT=.cache/pex_root PEX_SCRIPT=pex3 /home/xlevus/.cache/nce/29319df9a6ca02e838617675b5b8dd7e5b18a393c27e74979823158b85c015d9/bindings/venvs/2.18.0/bin/python3.9 ./pex lock create --tmpdir .tmp --python-path $'/home/xlevus/.pyenv/versions/3.10.13/bin:/home/xlevus/.pyenv/versions/3.12.1/bin:/home/xlevus/.nix-profile/bin:/nix/var/nix/profiles/default/bin:/home/xlevus/.local/bin:/home/xlevus/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin' $'--output=lock.json' --no-emit-warnings $'--style=universal' --pip-version 23.1.2 --resolver-version pip-2020-resolver --target-system linux --target-system mac $'--indent=2' --no-pypi $'--index=https://pypi.org/simple/' --manylinux manylinux2014 --interpreter-constraint $'CPython==3.10.*' django
When changing PEX_PYTHON to PEX_PYTHON=/usr/bin/python3.10 or to the system-installed python3.9, __run.sh runs OK and generates a lockfile.
Further:
Unpacking pex and changing __run.sh to invoke __main__.py instead, I can trace the error to: https://github.com/pantsbuild/pex/blob/v2.1.137/pex/fetcher.py#L48
(Pdb) sys.executable
'/home/xlevus/Projects/xlvs/gymkhana/TMPHOME/.cache/nce/67912efc04f9156d8f5b48a0348983defb964de043b8c13ddc6cc8a002f8e691/cpython-3.9.18+20240107-x86_64-unknown-linux-gnu-install_only.tar.gz/python/bin/python3.9'
/home/xlevus/Projects/xlvs/gymkhana/TMPHOME/.cache/nce/67912efc04f9156d8f5b48a0348983defb964de043b8c13ddc6cc8a002f8e691/cpython-3.9.18+20240107-x86_64-unknown-linux-gnu-install_only.tar.gz/python/lib/python3.9/threading.py(937)_bootstrap()
-> self._bootstrap_inner()
/home/xlevus/Projects/xlvs/gymkhana/TMPHOME/.cache/nce/67912efc04f9156d8f5b48a0348983defb964de043b8c13ddc6cc8a002f8e691/cpython-3.9.18+20240107-x86_64-unknown-linux-gnu-install_only.tar.gz/python/lib/python3.9/threading.py(980)_bootstrap_inner()
-> self.run()
/home/xlevus/Projects/xlvs/gymkhana/TMPHOME/.cache/nce/67912efc04f9156d8f5b48a0348983defb964de043b8c13ddc6cc8a002f8e691/cpython-3.9.18+20240107-x86_64-unknown-linux-gnu-install_only.tar.gz/python/lib/python3.9/threading.py(917)run()
-> self._target(*self._args, **self._kwargs)
/tmp/pants-sandbox-WRfUMY/.deps/pex-2.1.137-py2.py3-none-any.whl/pex/jobs.py(525)spawn_jobs()
-> result = Spawn(item, spawn_func(item))
/tmp/pants-sandbox-WRfUMY/.deps/pex-2.1.137-py2.py3-none-any.whl/pex/resolver.py(130)_spawn_download()
-> self.observer.observe_download(target=target, download_dir=download_dir)
/tmp/pants-sandbox-WRfUMY/.deps/pex-2.1.137-py2.py3-none-any.whl/pex/resolve/lockfile/create.py(201)observe_download()
-> url_fetcher=URLFetcher(
> /tmp/pants-sandbox-WRfUMY/.deps/pex-2.1.137-py2.py3-none-any.whl/pex/fetcher.py(51)__init__()
-> ssl_context = ssl.create_default_context()
/home/xlevus/Projects/xlvs/gymkhana/TMPHOME/.cache/nce/67912efc04f9156d8f5b48a0348983defb964de043b8c13ddc6cc8a002f8e691/cpython-3.9.18+20240107-x86_64-unknown-linux-gnu-install_only.tar.gz/python/lib/python3.9/ssl.py(738)create_default_context()
-> context = SSLContext(PROTOCOL_TLS)
/home/xlevus/Projects/xlvs/gymkhana/TMPHOME/.cache/nce/67912efc04f9156d8f5b48a0348983defb964de043b8c13ddc6cc8a002f8e691/cpython-3.9.18+20240107-x86_64-unknown-linux-gnu-install_only.tar.gz/python/lib/python3.9/ssl.py(484)__new__()
-> self = _SSLContext.__new__(cls, protocol)
The protocol version being passed in is: <_SSLMethod.PROTOCOL_TLS: 2>
Buuuuut, changing the __run.sh script to call just that function (using the same environment & interpreter) works fine ???:
env -i CPPFLAGS= LANG=en_NZ.UTF-8 LDFLAGS= PATH=$'/home/xlevus/Projects/xlvs/gymkhana/TMPHOME/.local/bin:/home/xlevus/Projects/xlvs/gymkhana/TMPHOME/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/home/xlevus/.nix-profile/bin:/nix/var/nix/profiles/default/bin:/home/xlevus/.local/bin:/home/xlevus/bin' PEX_IGNORE_RCFILES=true PEX_PYTHON=/home/xlevus/Projects/xlvs/gymkhana/TMPHOME/.cache/nce/29319df9a6ca02e838617675b5b8dd7e5b18a393c27e74979823158b85c015d9/bindings/venvs/2.18.0/bin/python3.9 PEX_ROOT=.cache/pex_root PEX_SCRIPT=pex3 /home/xlevus/Projects/xlvs/gymkhana/TMPHOME/.cache/nce/29319df9a6ca02e838617675b5b8dd7e5b18a393c27e74979823158b85c015d9/bindings/venvs/2.18.0/bin/python3.9 -c "import ssl; ssl.create_default_context()"
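The same one-line check can be scripted for several interpreters. This is a rough sketch only: `check_ssl_in_clean_env` is a made-up helper name, and it passes a completely empty environment, whereas the `env -i` line above still sets PATH, LANG and the PEX_* variables.

```python
import subprocess
import sys


def check_ssl_in_clean_env(python=sys.executable):
    """Run the one-line SSL check with an empty environment, similar to `env -i`."""
    result = subprocess.run(
        [python, "-c", "import ssl; ssl.create_default_context()"],
        env={},  # nothing is inherited from the calling shell
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stderr.strip()


if __name__ == "__main__":
    code, stderr = check_ssl_in_clean_env()
    print("exit code:", code)
    if stderr:
        print(stderr)
```

Passing the path of the venv python3.9 as `python` would mirror the command above more closely.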
@xlevus it looks like you have everything at hand you need to dig. If you can come up with a docker-based repro, perhaps someone can help out, but as it stands you have the files, paths, etc.
I've created a docker-based reproduction here: https://github.com/xlevus/pants-issue-20467 and it seems to work in a Fedora VM. I'm trying to get an Ubuntu VM up to confirm it works in one of those too.
Will poke around more tonight. But I'm a little stumped tbqh.
It appears to only be an issue when specifically combining the distributed python3.9 venv for pants, and pex. The venv's ssl-context code works fine, and the pex code works fine, but combine the two and ???
Great - thanks. I'll try to poke around with the repro case. That said, this is crazy-making "It appears to only be an issue when specifically combining the distributed python3.9" since your OP is this:
$ pants --keep-sandboxes=on_failure generate-lockfiles ::
10:54:06.16 [INFO] Preserving local process execution dir /tmp/pants-sandbox-ouAMHY for Generate lockfile for python-default
10:54:06.16 [INFO] Completed: Generate lockfile for python-default
10:54:06.16 [ERROR] 1 Exception encountered:
Engine traceback:
in `generate-lockfiles` goal
ProcessExecutionFailure: Process 'Generate lockfile for python-default' failed with exit code 1.
stdout:
stderr:
Failed to spawn a job for /usr/bin/python3.10: unknown error (_ssl.c:3161)
That is definitely not python3.9 let alone the scie-pants hermetic python3.9.
The error message is misleading. The 'failed to spawn a job' is from Pex's Job runner.
I'm 89% (formerly 100%) confident the error comes from within a python3.9 executable.
Here's my hacked up sandbox with a pdb.breakpoint stuck right before the failing ssl call:
📦[xlevus@pants-debug2 pants-sandbox-WRfUMY]$ ./__run.sh
Cannot read termcap database;
using dumb terminal settings.
> /tmp/pants-sandbox-WRfUMY/.deps/pex-2.1.137-py2.py3-none-any.whl/pex/fetcher.py(51)__init__()
-> ssl_context = ssl.create_default_context()
(Pdb) w
/home/xlevus/Projects/xlvs/gymkhana/TMPHOME/.cache/nce/67912efc04f9156d8f5b48a0348983defb964de043b8c13ddc6cc8a002f8e691/cpython-3.9.18+20240107-x86_64-unknown-linux-gnu-install_only.tar.gz/python/lib/python3.9/threading.py(937)_bootstrap()
-> self._bootstrap_inner()
/home/xlevus/Projects/xlvs/gymkhana/TMPHOME/.cache/nce/67912efc04f9156d8f5b48a0348983defb964de043b8c13ddc6cc8a002f8e691/cpython-3.9.18+20240107-x86_64-unknown-linux-gnu-install_only.tar.gz/python/lib/python3.9/threading.py(980)_bootstrap_inner()
-> self.run()
/home/xlevus/Projects/xlvs/gymkhana/TMPHOME/.cache/nce/67912efc04f9156d8f5b48a0348983defb964de043b8c13ddc6cc8a002f8e691/cpython-3.9.18+20240107-x86_64-unknown-linux-gnu-install_only.tar.gz/python/lib/python3.9/threading.py(917)run()
-> self._target(*self._args, **self._kwargs)
/tmp/pants-sandbox-WRfUMY/.deps/pex-2.1.137-py2.py3-none-any.whl/pex/jobs.py(525)spawn_jobs()
-> result = Spawn(item, spawn_func(item))
/tmp/pants-sandbox-WRfUMY/.deps/pex-2.1.137-py2.py3-none-any.whl/pex/resolver.py(130)_spawn_download()
-> self.observer.observe_download(target=target, download_dir=download_dir)
/tmp/pants-sandbox-WRfUMY/.deps/pex-2.1.137-py2.py3-none-any.whl/pex/resolve/lockfile/create.py(201)observe_download()
-> url_fetcher=URLFetcher(
> /tmp/pants-sandbox-WRfUMY/.deps/pex-2.1.137-py2.py3-none-any.whl/pex/fetcher.py(51)__init__()
-> ssl_context = ssl.create_default_context()
(Pdb) !import sys
(Pdb) pp sys.executable
'/home/xlevus/Projects/xlvs/gymkhana/TMPHOME/.cache/nce/67912efc04f9156d8f5b48a0348983defb964de043b8c13ddc6cc8a002f8e691/cpython-3.9.18+20240107-x86_64-unknown-linux-gnu-install_only.tar.gz/python/bin/python3.9'
(Pdb) n
ssl.SSLError: unknown error (_ssl.c:3161)
> /tmp/pants-sandbox-WRfUMY/.deps/pex-2.1.137-py2.py3-none-any.whl/pex/fetcher.py(51)__init__()
-> ssl_context = ssl.create_default_context()
(Pdb)
Ok, thanks for the repro case @xlevus - super helpful.
I have not figured out why PBS Python 3.9 is different here, and apparently only different in a Fedora context to boot, but the issue is related to threading. If you use a PBS 3.9 REPL to run import ssl; ssl.create_default_context(), there is no issue, as you found out. The relevant difference in the Pex case is that this function is called not in the main application thread but in a job spawn thread used for spawning parallel (subprocess) jobs. If I create an SSL context early in the main thread, all is well and the lock succeeds:
[root@3d2dd3ceaa5c pants-sandbox-qAIWax]# diff -u .deps/pex-2.1.137-py2.py3-none-any.whl/pex/fetcher.py pex-venv/lib/python3.9/site-packages/pex/fetcher.py
--- .deps/pex-2.1.137-py2.py3-none-any.whl/pex/fetcher.py 1980-01-01 00:00:00.000000000 +0000
+++ pex-venv/lib/python3.9/site-packages/pex/fetcher.py 2024-01-28 21:18:01.662789434 +0000
@@ -4,6 +4,8 @@
from __future__ import absolute_import
import ssl
+ssl.create_default_context()
+
import time
from contextlib import closing, contextmanager
[root@3d2dd3ceaa5c pants-sandbox-qAIWax]#
There the diff represents some sandbox mucking about, but the upshot is that trying to grab the context on import of pex/fetcher.py is enough to ensure this happens in the main thread, and all is well.
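To take Pex out of the picture, the main-thread vs. worker-thread difference can be tried directly. The following is a minimal sketch, not Pex's actual job-spawning code; `create_context_in_thread` is a made-up name. Whether this small script reproduces the failure is an assumption; the failure in this thread was only observed with PBS Python 3.9 on Fedora.

```python
import ssl
import threading


def create_context_in_thread():
    """Call ssl.create_default_context() off the main thread.

    Returns None on success, or the exception that was raised.
    """
    outcome = {}

    def worker():
        try:
            ssl.create_default_context()
            outcome["error"] = None
        except Exception as exc:  # e.g. ssl.SSLError on an affected setup
            outcome["error"] = exc

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return outcome["error"]


if __name__ == "__main__":
    # The worker thread goes first: creating a context in the main thread
    # beforehand is exactly the workaround described above.
    error = create_context_in_thread()
    print("worker thread:", "ok" if error is None else repr(error))
    ssl.create_default_context()
    print("main thread: ok")
```

The order of the two calls matters, since an early main-thread call is what the diff above relies on to avoid the error.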
The remaining work is to see what is buggy here. Is the PBS Python build buggy somehow? Is it a bug in Pex code: should an SSLContext only ever be created in the application main thread? Is it a difference between modern Fedora glibc (which includes libpthread) and the libpthread.so.0 that PBS links to (unlike the system Python 3.9)? I have no clue at the moment.
I'll note that I'm dropping work for the evening and I'm AFK likely until the 1st.
Further Investigation:
- Swapping PBS 2024 build with 20230826 works (but did require me to install libxcrypt on Fedora)
- Swapping PBS 2024 build with 20231002 errors in the same place.
Possible key change between the two is:
OpenSSL 1.1 -> 3.0 on supported platforms. Linux and macOS now use OpenSSL 3.0.x. Windows uses OpenSSL 3.0.x on CPython 3.11+.
@xlevus I'm working on a short-term fix in pex-tool/pex#2355. I'd still love to know what's really going on here, but 1st to stop the bleeding.
I've flipped this back to a bug - apologies @mjimlittle, you ended up being right there. With @xlevus's help debugging, a fix for this issue in Pex is now released in 2.1.163: https://github.com/pantsbuild/pex/releases/tag/v2.1.163
A Pants maintainer will take it from here and upgrade Pants / instruct you how to do so for your Pants version.
Hey @jsirois thanks for the update. Also, I am sorry I could not help out in tracing the source of the issue.
I'm relatively new to the python/pants ecosystem so I could not keep up with @xlevus :D
As a workaround, I have Dockerized pants using an Ubuntu base image and I can successfully generate needed lock files.
To use the new version of Pex without waiting on a Pants release, add this to pants.toml:
[pex-cli]
version = "v2.1.163"
known_versions = [
"v2.1.163|macos_arm64 |21cb16072357af4b1f4c4e91d2f4d3b00a0f6cc3b0470da65e7176bbac17ec35|3677552",
"v2.1.163|macos_x86_64|21cb16072357af4b1f4c4e91d2f4d3b00a0f6cc3b0470da65e7176bbac17ec35|3677552",
"v2.1.163|linux_x86_64|21cb16072357af4b1f4c4e91d2f4d3b00a0f6cc3b0470da65e7176bbac17ec35|3677552",
"v2.1.163|linux_arm64 |21cb16072357af4b1f4c4e91d2f4d3b00a0f6cc3b0470da65e7176bbac17ec35|3677552",
]
(That's the sha256 and size of the pex artifact, which you can calculate yourself by downloading it from the release page.)
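Calculating those two values takes only a few lines. This is a minimal sketch that assumes the `version|platform|sha256|size` layout seen in the entries above; `known_version_entry` is a made-up helper name and the local file name `pex` in the usage section is an assumption about where you saved the download.

```python
import hashlib
import os


def known_version_entry(version, platform, artifact_path):
    """Build one known_versions line for a downloaded artifact."""
    sha256 = hashlib.sha256()
    with open(artifact_path, "rb") as artifact:
        # Hash in 1 MiB chunks so the whole file is never held in memory.
        for chunk in iter(lambda: artifact.read(1 << 20), b""):
            sha256.update(chunk)
    size = os.path.getsize(artifact_path)
    return f"{version}|{platform}|{sha256.hexdigest()}|{size}"


if __name__ == "__main__":
    # Assumes the release artifact was saved as ./pex (hypothetical path).
    for platform in ("macos_arm64", "macos_x86_64", "linux_x86_64", "linux_arm64"):
        print(known_version_entry("v2.1.163", platform, "pex"))
```

The same artifact is listed for every platform in the snippet above, which is why a single file yields all four lines.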
Thanks @cburroughs works fine now!
@xlevus it turns out the issue is the custom RedHat OpenSSL option "rh-allow-sha1-signatures", seen here for example: https://gitlab.com/redhat/centos-stream/rpms/openssl/-/blob/c9s/0049-Selectively-disallow-SHA1-signatures.patch
If I do this on a fedora:37 image:
[root@d13f087cea45 /]# diff -u /etc/crypto-policies/back-ends/opensslcnf.config.orig /etc/crypto-policies/back-ends/opensslcnf.config
--- /etc/crypto-policies/back-ends/opensslcnf.config.orig 2024-02-09 00:54:33.569271689 +0000
+++ /etc/crypto-policies/back-ends/opensslcnf.config 2024-02-09 00:54:54.309267497 +0000
@@ -6,8 +6,3 @@
DTLS.MaxProtocol = DTLSv1.2
SignatureAlgorithms = ECDSA+SHA256:ECDSA+SHA384:ECDSA+SHA512:ed25519:ed448:rsa_pss_pss_sha256:rsa_pss_pss_sha384:rsa_pss_pss_sha512:rsa_pss_rsae_sha256:rsa_pss_rsae_sha384:rsa_pss_rsae_sha512:RSA+SHA256:RSA+SHA384:RSA+SHA512:ECDSA+SHA224:RSA+SHA224
-[openssl_init]
-alg_section = evp_properties
-
-[evp_properties]
-rh-allow-sha1-signatures = yes
Then a test rig works without main-thread vs. non-main-thread shenanigans. As to why the thread makes a difference I have no clue yet, but after a custom PBS build that enables openssl debug symbols and many gdb sessions, I was able to narrow in on the reading of rh-allow-sha1-signatures, which is not a standard openssl config option, as the action leading to an error return path that eventually bubbles out to _ssl.c:3161.
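To see whether a given machine carries that option, the config file from the diff above can be scanned. This is a minimal sketch with a deliberately simple `key = value` parser, not a full OpenSSL config parser; `find_option` is a made-up name, and the path is the one from the diff, which may not exist on other distributions.

```python
from pathlib import Path

# Path taken from the diff above; other distributions may not have this file.
CONFIG_PATH = Path("/etc/crypto-policies/back-ends/opensslcnf.config")
OPTION = "rh-allow-sha1-signatures"


def find_option(config_text, option=OPTION):
    """Return the value assigned to `option`, or None if it is not set."""
    for raw_line in config_text.splitlines():
        # Drop trailing comments, then split on the first "=".
        line = raw_line.split("#", 1)[0].strip()
        key, sep, value = line.partition("=")
        if sep and key.strip() == option:
            return value.strip()
    return None


if __name__ == "__main__":
    if CONFIG_PATH.exists():
        print(OPTION, "=", find_option(CONFIG_PATH.read_text()))
    else:
        print(CONFIG_PATH, "not found")
```

A result other than None indicates the option is present, which in this thread coincided with the failing setups.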
I'll update astral-sh/python-build-standalone#207 with all the details of the debug session later tonight. This is not Gregory's problem, but others may bump into RedHat shenanigans and need the ~FAQ on what goes on when vanilla openssl in PBS tries to read RedHat custom config.
The explanation is contained in a comment in pex-tool/pex#2358 which I've pinged folks in this thread on.