datetime extension module isolation

Question

datetime extension module isolation

Closed this issue 7 months ago · 22 comments

PEP-687 is about to be implemented to the _datetime module.

Two approaches are proposed at present. I would like Steering Council to choose one.

(regular)
Multi-phase-init with per-module heaptypes (and C-API).
abi: Store a reference (to the C-API struct) in PyInterpreterState.
(special)
Multi-phase-init with per-interpreter heaptypes (and C-API).
abi: Store at least 7 heaptypes' references in PyInterpreterState.

The following shows the replacements of _datetime heaptypes (multi-phase init module) when running test_datetime in the refleak test:

main interp sequence	1.per-module	2.per-interp
import module	heaptypes_A	heaptypes_A
module reload (new module)	heaptypes_B (new)	(no change)
module reload (new module)	heaptypes_C (new)	(no change)
interp finish

The approach (2) avoids possible regressions¹ caused by mixing modules with ~~forbidden~~ hacky usage²³. Technically, ~~(2) does not make sense~~ (1) should be enough. In my opinion, unknown regressions should not be hidden by (2). Also, (2) will be a bad official example of isolation. However, there seems to be a case where maintaining (1) is not good for mental health, which is also serious.

Related issue/PR: gh-117398, gh-118357

Thanks

UPDATED: a few rewords

Answer 1 · 2024-05-10T23:38:06.000Z

FYI, _datetime code owners are not involved in PEP-687 implementation (no response).

Answer 2 · 2024-05-11T07:42:11.000Z

Intentionally reloading modules isn't forbidden usage. It's typically weird usage, but it's not forbidden. We include explicit workarounds in the import machinery so the interpreter doesn't die when extension modules using single-phase init are reloaded, and the impact of those workarounds results in behaviour similar to what @vstinner has proposed in gh-117398.

The benefit of adopting an approach like the one Victor suggests in a case like this one is that it allows "support per-interpreter imports" to be separated from "generate completely new heap types when reloading the module" (in this particular case, it isn't clear the latter is actually desirable behaviour, and it is definitely backwards incompatible behaviour).

The extra pointer to a substruct in PyInterpreterState isn't ideal, but getting rid of it by migrating to a different C API design that doesn't need it is better deferred until after the migration to multi-phase init (especially since it also makes it much clearer where the problems are actually arising).

Answer 3 · 2024-05-11T18:12:18.000Z

gh-118337 fully implements:

1.(regular)
Multi-phase-init with per-module heaptypes (and C-API).
abi: Store a reference (to the C-API struct) in PyInterpreterState.

It passes refleak bot tests where the module is reloaded for a good reason.

Answer 4 · 2024-05-12T04:22:08.000Z

As noted in python/cpython#117398 (comment), this may be a test coverage concern moreso than an SC design decision. I doubt the existing test suite explicitly covers the situation we're concerned about (a capsule or C API pointer reference acquired before the reload being used after the reload and crashing the interpreter), so even green CI may not be enough to be certain that full reload support is backwards compatible.

Answer 5 · 2024-05-12T06:31:20.000Z

I would llke an SC decision about how much a module can depend on PyInterpreterState.

_datetime and other modules are not so different except the presence of a capsule C-API, and the capsule is not so different from the module state.

Answer 6 · 2024-05-13T01:50:35.000Z

While you're correct that PyModule_GetState and PyCapsule_GetPointer are both technically exposed to the same pointer invalidation problem if a C module is reloaded, there's a key difference in their intended usage model (albeit not a well-documented one, since the distinction never mattered when single-phase init was the only option and hence there was no true reloading of extension modules):

PyModule_GetState is for the module itself to use. If the module gets genuinely reloaded, references to the old module state will be replaced by references to the new module state. Other code should NOT be calling PyModule_GetState, and if it is, and the offending code breaks on module reload, the response is going to be "Don't do that then".
By contrast, PyCapsule_GetPointer is explicitly an export API. Once another module has received a capsule object from the exporting module, there isn't any expectation that reloading the exporting module might invalidate previously exported capsule instances (and under single phase initialisation, they don't get invalidated, since the extension modules are never truly reloaded). Code that is using the capsule API can break through no fault of its own if we don't preserve capsule pointer stability across module reloads.

Don't get me wrong, I think you've identified a genuine functional gap in the multi-phase init feature here: the existence of the capsule API means that there really should be a public C API for extension modules using multi-phase init to define and manage per-interpreter-instance state in addition to the existing support for defining per-module-instance state. While the standard library can work around this problem by making use of PyInterpreterState, third party modules don't have that luxury.

In the specific case of migrating datetime to multi-phase init, though, it's OK for the sequence of events to be:

migrate datetime using PyInterpreterState to maintain pointer stability across reloads for the capsule export API and the public C API
subsequently use datetime and its capsule API as a motivating example for adding a public API to allow multi-phase init extension modules to define and access per-interpreter-instance state in a general way (this could potentially be as simple as a new internal state dict keyed by module name, akin to the sys.modules cache, but not exposed via a Python API)
adjust any standard library modules affected by this problem to use the new public per-interpreter state management API rather than private module specific workarounds

Answer 7 · 2024-05-13T12:05:28.000Z

While the standard library can work around this problem by making use of PyInterpreterState, third party modules don't have that luxury.

What's wrong with using PyInterpreterState_GetDict for this? (Personally I use it to stash a reference to my module so that I have something to PyModule_GetState on later.)

Threading while reloading is probably a nightmare, but it's always going to be a nightmare. Ignoring that, PyDict_SetItemString(...) during module exec seems safe enough?

Answer 8 · 2024-05-13T16:03:44.000Z

By contrast, PyCapsule_GetPointer is explicitly an export API.

If datetime API includes the direct use of PyCapsule_GetPointer(), a capsule should be held until the interpreter finishes (by using PyInterpreterState_GetDict()?). The API struct could be updated by memcpy() or something on every module-reload. (I'm honestly not so worried about C-API here)

Per-module heaptypes should be enough, but ~~may be too technical~~ may not be intuitive. I would like to know the rule and the exception.

Answer 9 · 2024-05-15T06:30:10.000Z

@erlend-aasland also pointed out this past discussion of the external-resource-management variant of this problem (which leads to needing the ability to set up per-process locks rather than per-interpreter ones): https://discuss.python.org/t/a-new-c-api-for-extensions-that-need-runtime-global-locks/20668

Answer 10 · 2024-05-15T19:27:36.000Z

Did @erlend-aasland point out that the problem applies to _datetime?

Answer 11 · 2024-05-16T11:54:00.000Z

The linked discussion was not about _datetime, IIRC. However, such an API could have been used to solve the probablems with the encapsulated _datetime C API in a nice way, I think.

Answer 12 · 2024-05-20T03:18:56.000Z

Right, the specific problem in the _datetime case is ensuring pointer stability across module reloads for both its C API and its capsule API.

There is a similar-but-distinct problem that comes up when managing external (non-Python) resources that don't allow reinitialisation within a single process: module reloads need to avoid reinitialising the affected external resources, and instead make them available to the reinitialised instance of the module.

While the referenced thread is specifically about the external resource management variant of the problem, any public API we come up with needs to handle both cases.

Getting back to the original question for the SC here though, I think it boils down to:

We have established that some extension modules need to preserve some of their state across reloads when migrating to multi-phase initialisation. However, there is not yet a public API for this, so the simplest option for affected modules is to stick with single-phase initialisation until such an API is defined.
We'd like to make more standard library extension modules compatible with subinterpreters running without a shared GIL, which means migrating them to use multi-phase initialisation
Some standard library extension modules are affected by the need to maintain pointer stability or preserve external resources across reloads. In such cases, the current workaround is to use PyInterpreterState fields that point to nested structs until such time as a standardised public mechanism is defined for management of extension module state that needs to be preserved across reloads.
Migrating the _datetime module to multi-phase initialisation without ensuring public pointer stability across module reloads doesn't actually cause any CI failures, but this is potentially due to a lack of relevant CI test coverage rather than due to it actually being a safe change.

I think the only SC level decision here is "Is it OK to continue using the PyInterpreterState workaround to allow standard library extension modules to be migrated to multi-phase initialisation in the absence of a public API for preserving state across reloads?". It doesn't solve the problem for third party modules, but it does move the problem off the critical path for improvements to subinterpreter support.

The specific case of _datetime is better resolved by adding the missing test coverage to the multi-phase init migration PR and seeing if it fails (and assuming it fails as expected, using the PyInterpreterState workaround to eliminate the failure).

Answer 13 · 2024-05-20T22:51:06.000Z

_datetime is the last remaining extension module that need to be migrated.

While we can suggest that the new generic C-API is needed, there seems to be no individual discussion (issue) about its design related to _datetime as well, which I think is one reason for the delay of _datetime migration.

so the simplest option for affected modules is to stick with single-phase

I think it is OK for third party modules to keep single-phase init. The migration is up to users.

1.We have established that some extension modules need to preserve
3.Some standard library extension modules are affected by the need to maintain pointer stability

What standard library extension modules are you suggesting?

The specific case of _datetime is better resolved by adding the missing test coverage to the multi-phase init migration PR and seeing if it fails (and assuming it fails as expected, using the PyInterpreterState workaround to eliminate the failure).

You mean per-interpreter heaptypes for _datetime to eliminate the failure, right?
If so, do you have a regression case which cannot follow the current convention (consensus)?

Answer 14 · 2024-05-21T15:54:40.000Z

@ncoghlan, what do you think of introducing the a new (internal, for now) API, as discussed, and then adopt _datetime to use it? There should be sufficient time before the 3.14 alpha to get it in.

Answer 15 · 2024-05-21T17:15:19.000Z

@erlend-aasland Can you explain how the discussion resolves _datetime isolation? If it is really effective, why didn't you either propose nor apply it before the 3.12 beta?

Answer 16 · 2024-05-21T17:19:01.000Z

[...] why didn't you either propose nor apply it before the 3.12 beta?

I work on CPython on my spare time; I'm not paid to work on CPython.

Answer 17 · 2024-05-21T17:23:47.000Z

Then, why didn't you suggest it on your spare time?

Answer 18 · 2024-05-21T17:24:49.000Z

I don't think this is a productive line of conversation; let's drop it.

Answer 19 · 2024-05-21T18:10:28.000Z

_datetime needs rationales if per-interpreter heaptypes are required.

Answer 20 · 2024-05-21T18:14:57.000Z

_datetime needs rationales if per-interpreter heaptypes are required.

If you contact me in private, I propose to discuss it with you.

Answer 21 · 2024-05-21T18:49:30.000Z

If you contact me in private, I propose to discuss it with you.

I'm here to follow SC because you don't look in good condition now to discuss it.

Answer 22 · 2024-05-22T18:15:15.000Z

There are more appropriate avenues for resolving this issue than asking for the SC to make a decision. Those include opening a new issue on the CPython tracker or commenting on an existing relevant issue, engaging in a discussion on discuss.python.org, or reaching out the the C API working group. In all such forums, please keep in mind that the Python Code of Conduct applies to all workspaces in order to keep the conversations respectful and productive.

Footnotes