python-cffi/cffi

Support for new `cp313t` GIL-free ABI variant

nitzmahone opened this issue ยท 7 comments

I did some smoke testing with the new GIL-free t CPython ABI on 3.13.0a2. I had to patch up wheel with a pre-release version of packaging to get it to build and install without barfing on the new ABI tag, but once I got past that, most of the CFFI test suite is passing. Looks like the majority of the failures are related to new symbol exports (possibly unique to the nogil build?) that need to be accounted for, eg:

ERROR testing/cffi1/test_new_ffi_1.py::TestNewFFI1::test_addressof_anonymous_struct - 
ImportError: /tmp/ffi-4/test_new_ffi_1.cpython-313t-x86_64-linux-gnu.so: 
undefined symbol: _Py_atomic_load_uint32_relaxed

I've already added some 3.13 pre-release smoke-testing- once the various packaging tools we need have released versions that support the new t ABI tagging, we can include that as well.

arigo commented

It's very much more involved than that. CFFI is full of static global caches (guaranteeing the uniqueness of many CTypeObject, for example). There is also careful logic around when to acquire and release the GIL around calls, and particularly callbacks. I believe that if you just try out the tests, then yes, it will mostly work because they are mostly single-threaded. In a real multithreaded environment you're going to see very obscure failures sometimes (e.g. "object of ctype 'int **' is expected, but got instead 'int **'").

Supporting multithreading requires a through review of the whole source code of CFFI, I'm afraid. Otherwise we get some "meh, doesn't usually crashes, that's good enough". I certainly hope that it's not what CPython is going for in the GIL-free ABI variant. I would object to distribute such a version of CFFI.

@arigo 100% agreed on all points- I just wanted to see what kind of gaps might need to be filled to even get it to build and run single-threaded under the new t ABI.

I re-tried some of the concurrency testing I'd previously done on Sam's 3.9 nogil PoC branch against both 3.13.0a2 and 3.13 head just to see how unstable things were, but the GIL disablement doesn't even appear functional with a --disable-gil interpreter build yet. IIUC, that part hasn't even merged yet (see the "Enable/Disable the GIL at runtime" commit), so none of the real investigation or testing can even start beyond "yay, it builds".

I'm also very much against actually shipping a known-broken t wheel, but since the entire CI suite is geared toward testing packaged bits, I'm hoping to avoid creating a completely separate path for evaluating the state of things under the new ABI- basically just excluding it from the artifacts that get shipped.

Hi!

We've been working on porting extension modules to the free-threading interpreter and tracking the work over at https://github.com/Quansight-Labs/free-threaded-compatibility. I did most of the work on the NumPy port, which also has a lot of static global caches I had to make thread local, add a single-initialization API and initialize the cache using that, or add locking.

I don't know whether I can directly work on cffi, but I'm happy to answer questions or share experiences.

Maybe this issue should be left open until the static global caches alluded to above are made thread-safe?

Does cffi have existing multithreaded test coverage? If not then I'd be suspicious about having found all the data races that may exist in C code.

@ngoldbaum There's a new issue for tracking known blockers to supporting free-threading: #126

๐Ÿ™ƒ

makes sense! sorry for the noise...

@ngoldbaum No worries- thanks for all you're doing (all over the place ๐Ÿ˜† ) to keep stuff moving forward on this! It's fun to see things starting to actually light up and work, but yeah, creating heavily threaded stress tests to exercise all the init, type creation, refcounting, buffer marshaling, etc is going to be a big job.

IIRC nearly none of the existing test suite exercises Python threads- most of the concurrent tests are focused on the embedding use case (with natively-created pthreads calling in to Python). Even those tests are mainly focused on stressing the interpreter startup locking- the actual Python operations are near-no-op idempotent functions with stack-only scalars, so very little concurrent exercise of all the little internal atomic-ish things that the GIL is protecting today.