erictraut/cpython

Implement TypeVar, TypeVarTuple, ParamSpec, Generic in C

JelleZijlstra opened this issue.

PEP 695 says:

Several classes in the typing module that are currently implemented in Python must be reimplemented in C. This includes: TypeVar, TypeVarTuple, ParamSpec, Generic, and Union.

Thinking of how to implement this:

  • Add a new file Objects/typevarobject.c holding type definitions for TypeVar, TypeVarTuple, and ParamSpec (in one file since they'll be relatively simple and very similar to each other)
  • These should for the most part behave like the current typing.py implementations
  • Add an internal API like _PyTypeVar_New(const char *name, PyObject *bound, PyObject *constraints) that the runtime can use to construct a TypeVar.
  • Change typing.py to expose the C versions. (Possibly just by creating a dummy TypeVar using PEP 695 syntax and saving the type, similar to some of the tricks in types.py.)
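The types.py trick mentioned in the last bullet works by applying type() to an object the interpreter can create with ordinary syntax, instead of importing the class from anywhere. A minimal illustration with functions (the names `_f`, `FunctionType`, `LambdaType`, and `CodeType` are local to this sketch, mirroring how types.py defines its own):

```python
import types


def _f():
    pass


# types.py obtains FunctionType the same way: create an instance with
# ordinary syntax, then take its type.
FunctionType = type(_f)

# The idea works for any object the interpreter can build on its own.
LambdaType = type(lambda: None)
CodeType = type(_f.__code__)

print(FunctionType is types.FunctionType)  # True
print(CodeType is types.CodeType)          # True
```

Under the proposal, a class written with PEP 695 syntax would play the role of `_f`, so typing.py could obtain the C TypeVar class without a separate import.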

I'm not sure Generic needs to be reimplemented as such. Instead, it seems to me that some of Generic's behavior needs to move into the base type implementation. We can probably leverage GenericAlias here.
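For a sense of what leveraging GenericAlias could mean: since Python 3.9, any class can make itself subscriptable by setting `__class_getitem__` to `types.GenericAlias`, the same idiom several standard-library classes use. This sketch (the class `Box` is hypothetical) covers subscription only; it does not track type parameters or perform substitution the way Generic does.

```python
import types


class Box:
    # Subscripting Box[...] builds a types.GenericAlias, just like list[int].
    __class_getitem__ = classmethod(types.GenericAlias)

    def __init__(self, item):
        self.item = item


IntBox = Box[int]
print(IntBox.__args__)     # (<class 'int'>,)

# Calling the alias constructs an instance of the origin class.
b = IntBox(3)
print(type(b) is Box)      # True
```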

I'm pretty sure the mention of Union in the PEP is a mistake. The PEP does not really change how Unions are treated, and in any case we already have a C Union type thanks to PEP 604.

Those classes have a lot of functionality. This feels like a large project. We had to do the same thing for PEP 585 (e.g. list[int]) and again for PEP 604 (spelling unions as X|Y), and it was a lot of work, and the first versions weren't very good. I think Serhiy rewrote most of it eventually.

I wonder if we could create a prototype (like Eric's original prototype) that just imports the existing classes from typing.py. That would allow us to write unit tests, so we can test the complicated scoping rules and the basic syntax. IIRC the requirement to reimplement everything in C mostly came from bootstrapping considerations, plus the general ugliness of having C code (in particular in the compiler stages) depending on Python code, which requires invoking the compiler recursively.

I got basic C versions of these classes working (code in #3), but getting them fully equivalent to the Python version would require re-implementing almost all of typing.py in C, a nightmarish project. For example, TypeVar bounds are passed through typing._type_check, which references Protocol, Never, Final, and almost every other typing primitive. There are similar complexities in the methods that implement type substitution.

Could we get away with having only the classes themselves in C, but importing the Python typing module to implement checks like _type_check?

But _type_check can be called when TypeVar is instantiated, right? Just not for simple cases.

Since it is only a check, maybe we can just skip it? We skipped such checks (and others) for list[C] even though they were present in the Python version. E.g. IIRC list[42] is accepted, though typing.List[42] isn’t.
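The leniency of the C-level GenericAlias is easy to check (Python 3.9 or later): subscription stores its arguments without validating them.

```python
import types

# Built-in generics do not validate their arguments:
# list[42] builds an alias whose only argument is the int 42.
alias = list[42]
print(isinstance(alias, types.GenericAlias))  # True
print(alias.__args__)                          # (42,)
```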

It's not only a check, it also turns strings into ForwardRefs (and None into NoneType, but that's easy enough to do in C):

>>> TypeVar("x", bound="x").__bound__
ForwardRef('x')
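The conversion part of that behavior is small on its own; the hard part is the validation _type_check performs alongside it. A sketch of just the conversion, with a hypothetical helper name `convert_type_arg` (this is not typing's actual code and skips all of the rejection checks):

```python
import typing


def convert_type_arg(arg):
    """Hypothetical helper: the conversion half of typing._type_check.

    It does not reproduce the validation (rejecting bare ClassVar, etc.).
    """
    if arg is None:
        # None used as a type means NoneType.
        return type(None)
    if isinstance(arg, str):
        # Strings become forward references to be resolved later.
        return typing.ForwardRef(arg)
    return arg


print(convert_type_arg(None))  # <class 'NoneType'>
print(convert_type_arg(int))   # <class 'int'>
```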

Agree that we can get away with simplifying some of the behavior as we migrate to C, but it's better to keep the behavior the same as much as possible.

Here's a partial list of behaviors that are difficult to implement in C:

  • _type_check (which is called on bounds and constraints) turns strings into typing.ForwardRef objects.
  • _type_check rejects various typing primitives that are not valid as types on their own (such as a bare ClassVar or Protocol)
  • TypeVarTuple.__iter__ returns an Unpack object
  • ParamSpec.__typing_subst__ checks for Concatenate
  • ParamSpec and TypeVarTuple both have __typing_prepare_subst__ methods that look fairly complicated
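Star-unpacking, as in `tuple[*Ts]` or `class C(Generic[*Ts])`, calls iter() on the TypeVarTuple, which is why __iter__ matters. A toy model shows the mechanism; `ToyTypeVarTuple` and `ToyUnpack` are hypothetical stand-ins, and the real TypeVarTuple must yield a genuine typing.Unpack[Ts], which is what ties it to typing.py:

```python
class ToyUnpack:
    """Hypothetical stand-in for typing.Unpack[Ts]."""

    def __init__(self, target):
        self.target = target

    def __repr__(self):
        return f"*{self.target.name}"


class ToyTypeVarTuple:
    """Hypothetical stand-in for typing.TypeVarTuple."""

    def __init__(self, name):
        self.name = name

    def __iter__(self):
        # Star-unpacking calls iter(); yield exactly one marker object
        # so that (int, *Ts) becomes a 2-tuple ending in an unpack marker.
        yield ToyUnpack(self)


Ts = ToyTypeVarTuple("Ts")
params = (int, *Ts)
print(params)  # (<class 'int'>, *Ts)
```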

Argh, I guess this is another case of pulling one thread and the whole sweater unravels. :-(

I'm not sure what to do. Does the PEP promise that everything is implemented in C?

I note that there are other places where the C code relies on Python code. Examples are copyreg, warnings and runpy.

This makes me feel at least somewhat comfortable with punting on a full C implementation of TypeVar. For example, I'd be fine with deferring to the Python versions of TypeVarTuple, ParamSpec and ForwardRef, at least in an initial implementation.

One reason to be lenient here is that it will be years before people are able to use the new syntax in production code -- probably not before the last Python version that doesn't support it has reached EOL. If we get a basic working version into 3.12, people can start that migration a year sooner. Performance and resilience against extreme environmental conditions (e.g. the filesystem from which typing.py would be loaded disappearing) seem like things we could add in future versions. (Though some guidance from the SC might be helpful -- @emilyemorehouse?)

Personally, my ideal scenario with the current constraints would be to punt on the C rewrite if we could maintain compatibility with existing functionality (also without too much of a performance hit if that’s a concern with adding places that the compiler needs to call Python code; I’m not familiar enough with the typing implementation to know). The PEP can always be amended; the C rewrite was not important in the decision to approve the PEP. I’m generally fine with doing what we need to (again, without major behavior changes) to ensure that we can land this in 3.12 – I agree that would be a big win!

In my prototype, I emitted opcodes that simply imported these symbols from typing every time they were needed. That felt really hacky and inelegant. It bloats the opcode stream, causes extra work at runtime, and can create strange error conditions that would be difficult to debug. For example, what if the import fails because the user has a local file called typing.py that shadows the standard library module? How would the error be reported in such a case?

It sounds like you're saying that it might be acceptable to use this hack for the first version of the implementation. I'm fine with that if you are.

My current version in #3 implements TypeVar etc. themselves in C, but delegates to the Python code for a number of operations that would be difficult to implement in C. That might be an acceptable compromise, since it means that just defining a simple TypeVar won't trigger an import of typing.py. (Things that would trigger an import include defining a TypeVar with a bound, which needs typing._type_check, and performing type substitution on a class that is generic over a TypeVar.)
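A rough Python model of that split, assuming the helper module is imported only when a bound is supplied (the class `SketchTypeVar` and its `delegations` counter are hypothetical; `typing._type_check` is the private function the thread mentions and may change between versions):

```python
import importlib


class SketchTypeVar:
    """Hypothetical model of a C-level TypeVar that delegates to a
    Python helper module only when a bound actually needs checking."""

    # Number of times the class reached into the helper module.
    delegations = 0

    def __init__(self, name, *, bound=None, helper_module="typing"):
        self.__name__ = name
        if bound is None:
            # Simple case: nothing to check, so typing is never touched.
            self.__bound__ = None
            return
        # Complex case: look up the Python helper at call time.
        type(self).delegations += 1
        helper = importlib.import_module(helper_module)
        self.__bound__ = helper._type_check(bound, "Bound must be a type.")


T = SketchTypeVar("T")
print(SketchTypeVar.delegations)  # 0

U = SketchTypeVar("U", bound="int")
print(type(U.__bound__).__name__, SketchTypeVar.delegations)  # ForwardRef 1
```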

I am okay with this for 3.12. (Thanks to Emily's clarification.)

I would like to know how the error is reported if typing.py can't be found, or if it doesn't define a certain object, or if it defines an object of an incompatible type.

I would also like to know if there are now two TypeVar implementations, or if typing.py will re-export the one implemented in C. (Where does it find it? Presumably in types.py?)


> I would like to know how the error is reported if typing.py can't be found, or if it doesn't define a certain object, or if it defines an object of an incompatible type.

Something like this:

>>> TypeVar("T", bound=int)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'typing'

And similarly you'd get an AttributeError if typing._type_check doesn't exist, or a TypeError if it's not the right kind of callable.
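A minimal Python model of that error behavior, assuming the C code resolves the helper at call time (the function name `delegate` and the module/attribute names used to trigger the failures are hypothetical):

```python
import importlib


def delegate(module_name, attr_name, *args):
    """Hypothetical helper: resolve a Python-level function at call time
    and invoke it, letting any lookup failure propagate unchanged."""
    module = importlib.import_module(module_name)  # may raise ModuleNotFoundError
    func = getattr(module, attr_name)               # may raise AttributeError
    return func(*args)                              # TypeError if func is not callable


errors = []
for module_name, attr_name in [
    ("no_such_module_xyz", "_type_check"),  # the module cannot be found
    ("typing", "no_such_helper_xyz"),       # the module lacks the helper
    ("typing", "TYPE_CHECKING"),            # the helper is not callable
]:
    try:
        delegate(module_name, attr_name, int)
    except Exception as exc:
        errors.append(type(exc).__name__)

print(errors)  # ['ModuleNotFoundError', 'AttributeError', 'TypeError']
print(delegate("typing", "cast", int, 5))  # 5
```

One consequence of this design is that failures surface as the ordinary import and attribute exceptions, with no extra message saying that a typing helper was being looked up.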

> I would also like to know if there are now two TypeVar implementations, or if typing.py will re-export the one implemented in C. (Where does it find it? Presumably in types.py?)

We should have only one implementation. We can do something like

class _Dummy[T]:
    _typevar = T

TypeVar = type(_Dummy._typevar)

to get the TypeVar class to expose in typing.py.

All sounds good!

This is working fine in the prototype.