pex-tool/pex

Generating lockfiles fails with: unknown error (_ssl.c:3161)

jsirois opened this issue · 3 comments

As initially reported here: pantsbuild/pants#20467

A streamlined repro:

FROM fedora:37

RUN dnf install -y curl

RUN curl --fail -sSL -O \
    https://github.com/indygreg/python-build-standalone/releases/download/20240107/cpython-3.9.18+20240107-x86_64-unknown-linux-gnu-install_only.tar.gz && \
    curl --fail -sSL -O \
    https://github.com/indygreg/python-build-standalone/releases/download/20240107/cpython-3.9.18+20240107-x86_64-unknown-linux-gnu-install_only.tar.gz.sha256 && \
    [[ \
        "$(cat cpython-3.9.18+20240107-x86_64-unknown-linux-gnu-install_only.tar.gz.sha256)" \
        == \
        "$( \
            sha256sum cpython-3.9.18+20240107-x86_64-unknown-linux-gnu-install_only.tar.gz | \
            cut -d' ' -f1 \
        )" \
    ]] && \
    tar -xzf cpython-3.9.18+20240107-x86_64-unknown-linux-gnu-install_only.tar.gz

RUN python/bin/python3.9 -mvenv pex.venv && \
    pex.venv/bin/pip install -U pip && \
    pex.venv/bin/pip install pex

ENV PATH=$PATH:pex.venv/bin

$ docker build . -t repro
[+] Building 0.0s (8/8) FINISHED                                                                         docker:default
 => [internal] load .dockerignore                                                                                  0.0s
 => => transferring context: 2B                                                                                    0.0s
 => [internal] load build definition from Dockerfile                                                               0.0s
 => => transferring dockerfile: 965B                                                                               0.0s
 => [internal] load metadata for docker.io/library/fedora:37                                                       0.0s
 => [1/4] FROM docker.io/library/fedora:37                                                                         0.0s
 => CACHED [2/4] RUN dnf install -y curl                                                                           0.0s
 => CACHED [3/4] RUN curl --fail -sSL -O     https://github.com/indygreg/python-build-standalone/releases/downloa  0.0s
 => CACHED [4/4] RUN python/bin/python3.9 -mvenv pex.venv &&     pex.venv/bin/pip install -U pip &&     pex.venv/  0.0s
 => exporting to image                                                                                             0.0s
 => => exporting layers                                                                                            0.0s
 => => writing image sha256:04f8a2564207187e005e30099ac79ad8957a2af4c861a97ee7b3c1ba62ca6ed4                       0.0s
 => => naming to docker.io/library/repro                                                                           0.0s
$ docker run --rm -it repro pex3 lock create cowsay
Failed to spawn a job for /pex.venv/bin/python3.9: unknown error (_ssl.c:3161)
$ echo $?
1
$

The underlying issue here is still unknown. The Python ssl docs make no mention of any special thread considerations, but using PBS (python-build-standalone) Python on older Fedora consistently leads to the above error, which appears to be solved by calling ssl.create_default_context(...) from the main application thread. In this case, the call is made from the pex.jobs.execute_parallel background job spawn thread here:
https://github.com/pantsbuild/pex/blob/a32dd36448103570fd6c1b284164334bc68562da/pex/jobs.py#L538-L555
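A minimal sketch of the workaround described above, assuming the context can be created once and shared: build the SSL context eagerly on the main thread, then hand that same object to any background thread that needs it. The names `_SSL_CONTEXT`, `get_ssl_context` and `worker` are hypothetical and chosen for illustration; this is not the actual Pex change.

```python
import ssl
import threading

# Created eagerly at import time, which happens on the main thread here.
# (Hypothetical module-level cache; not Pex's real code.)
_SSL_CONTEXT = ssl.create_default_context()


def get_ssl_context():
    # Background threads reuse the pre-built context instead of calling
    # ssl.create_default_context() themselves.
    return _SSL_CONTEXT


def worker(results):
    # Stand-in for a background job spawn thread that needs an SSL context.
    results.append(get_ssl_context())


def main():
    results = []
    thread = threading.Thread(target=worker, args=(results,))
    thread.daemon = True
    thread.start()
    thread.join()
    return results[0]


if __name__ == "__main__":
    print(main())
```

The point of the design is only that ssl.create_default_context() never runs off the main thread; whether sharing a single context suits a given application is a separate question.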

Light is starting to dawn on the real issue here: pantsbuild/pants#20467 (comment)

I'll add more info or link to it to close this issue out as truly understood and not just papered over.

Ok, #2358 contains a code comment that buttons this up and the issue can remain closed in good conscience.

For posterity, here is the test rig I ran inside a gdb python/install/bin/python3.9 session, in both thread and no-thread modes, to suss all this out:

import ssl
import sys
import threading


def create_ssl_context():
    return ssl.create_default_context()


SSL_CONTEXT = None


def store_ssl_context():
    global SSL_CONTEXT
    SSL_CONTEXT = create_ssl_context()


def get_ssl_context():
    global SSL_CONTEXT
    return SSL_CONTEXT


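def main():
    args = sys.argv[1:]
    if args and args[0] == "--no-thread":
        # No-thread mode: create the SSL context directly on the main thread.
        print(create_ssl_context())
    else:
        # Thread mode (the default): create the SSL context on a background
        # thread, then read it back from the main thread.
        thread = threading.Thread(target=store_ssl_context)
        thread.daemon = True
        thread.start()
        thread.join()
        print(get_ssl_context())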


if __name__ == "__main__":
    sys.exit(main())

The only other trick was using a custom debug build of PBS with the patch:

diff --git a/cpython-unix/build-openssl-3.0.sh b/cpython-unix/build-openssl-3.0.sh
index 1d1f913..cd88a0a 100755
--- a/cpython-unix/build-openssl-3.0.sh
+++ b/cpython-unix/build-openssl-3.0.sh
@@ -40,6 +40,7 @@ EXTRA_FLAGS="${EXTRA_FLAGS} ${EXTRA_TARGET_CFLAGS}"
 /usr/bin/perl ./Configure \
   --prefix=/tools/deps \
   --libdir=lib \
+  --debug \
   ${OPENSSL_TARGET} \
   no-legacy \
   no-shared \

And built via ./build-linux.py --optimizations debug --python cpython-3.9.