erictraut/cpython

Runtime implementation of creating TypeVars

JelleZijlstra opened this issue · 3 comments

When this code is executed at runtime:

def f[T](x: T): pass

We should do something like the following:

  • Create a new TypeVar named T
  • Evaluate the function's annotations in such a way that they have access to T. (And possibly something even more complicated with PEP 649 enabled.)
  • Create a new function object with its __type_variables__ set correctly.
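As a rough pure-Python sketch of these three steps (the explicit __type_variables__ wiring here is illustrative, not the actual implementation):

import typing

T = typing.TypeVar("T")        # step 1: create the TypeVar
def f(x: T): pass              # step 2: the annotation can see T
f.__type_variables__ = (T,)    # step 3: attach it to the function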

The scoping part seems quite complex, but I want to focus here on the part about creating TypeVars and attaching them to the function.

Here's one way to do it:

  • Introduce a new opcode MAKE_TYPEVAR. It takes a name, which should be at the top of the stack. It takes an integer oparg:
    • 0 means create a TypeVar with no bound/constraints
    • 1 means create a ParamSpec
    • 2 means create a TypeVarTuple
    • 4 means create a TypeVar with a bound/constraint. In that case, the bound/constraint must already be on the stack just below the name, and the opcode consumes both.
  • This opcode then calls one of _Py_make_typevar, _Py_make_paramspec, or _Py_make_typevartuple from #3 and pushes the resulting TypeVar-like object onto the stack.
  • Introduce a new opcode MAKE_GENERIC_FUNCTION that works like the current MAKE_FUNCTION, except it also consumes a tuple of typevars off the stack, which will be put in the function's __type_variables__.

So with that, the bytecode for:

def f[T, U: int, *V, **W](): pass

would look something like:

LOAD_CONST "T"
MAKE_TYPEVAR 0
LOAD_GLOBAL int
LOAD_CONST "U"
MAKE_TYPEVAR 4 # consumes both the name "U" and the bound int
LOAD_CONST "V"
MAKE_TYPEVAR 2 # TypeVarTuple for *V
LOAD_CONST "W"
MAKE_TYPEVAR 1 # ParamSpec for **W
BUILD_TUPLE 4
LOAD_CONST <code object>
MAKE_GENERIC_FUNCTION 0
STORE_NAME f

But maybe some other way works better with the current interpreter. Note, for example, that classes are created at runtime by calling the secret builtin __build_class__. Maybe we should have a __build_typevar__? It feels like that would be less efficient.
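For comparison, a hypothetical __build_typevar__ would just be an ordinary function the compiler emits calls to; a minimal sketch, where the name and signature are invented for illustration:

import typing

def __build_typevar__(kind, name, bound=None):
    # kind mirrors the proposed MAKE_TYPEVAR oparg:
    # 0/4 -> TypeVar, 1 -> ParamSpec, 2 -> TypeVarTuple
    if kind == 1:
        return typing.ParamSpec(name)
    if kind == 2:
        return typing.TypeVarTuple(name)
    return typing.TypeVar(name, bound=bound)

Each type parameter would then cost a full function call, which is why a dedicated opcode (or, as discussed below, an intrinsic) is likely cheaper.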

My prototype implemented an opcode stream similar to what Jelle posted above. However, the spec was modified after that to allow later TypeVars to refer to earlier TypeVars in their bounds, constraints, and default expressions (the latter assumes that PEP 696 is accepted).

class Foo[A, B: list[A], C = B]: ...

Note that the use of a type variable within a bound or constraints expression is currently considered a type checker error, but we wanted to preserve that ability in case HKTs (higher-kinded types) are ever added to the type system. Plus, this is needed for default expressions as defined in PEP 696.

To accommodate this, I was thinking that the implementation would first create a tuple of None objects, one for each type parameter, and then fill in the entries one at a time (left to right). This allows expressions evaluated during later TypeVar construction to refer to earlier TypeVars in the list.
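In pure Python terms, the idea is roughly the following (using a list rather than a pre-sized tuple, and treating the PEP 696 default as schematic, since TypeVar had no default parameter at the time):

import typing

# One placeholder slot per type parameter of `class Foo[A, B: list[A], C = B]`.
type_params = [None, None, None]
type_params[0] = typing.TypeVar("A")
# B's bound expression can refer to the already-constructed A:
type_params[1] = typing.TypeVar("B", bound=list[type_params[0]])
# C's default would likewise refer to B once PEP 696 lands:
type_params[2] = typing.TypeVar("C")  # default=type_params[1] under PEP 696
type_params = tuple(type_params)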

Looking at the implementation, I found a simpler solution than the one I proposed above: we can use the intrinsics mechanism that was added in 3.12 (https://docs.python.org/3.12/library/dis.html#opcode-CALL_INTRINSIC_1). My prototype now adds three unary intrinsics for creating a TypeVar, a ParamSpec, and a TypeVarTuple, plus a binary intrinsic for creating a TypeVar with a bound.
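On an interpreter built with the intrinsics-based prototype, this is easy to see by disassembling a generic function; the disassembly should show CALL_INTRINSIC_1 (and CALL_INTRINSIC_2 for the bounded case) rather than any dedicated opcode, though the exact intrinsic numbering is an implementation detail:

import dis

# dis.dis recurses into nested code objects, so the type-parameter
# setup code is included in the output.
dis.dis(compile("def f[T, U: int](): pass", "<demo>", "exec"))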

This approach will become less attractive if PEP 696 is implemented, since we'll need a ternary intrinsic for TypeVars with both a bound and a default, and another handful of intrinsics for other combinations of features.

As for @erictraut's comment above, my prototype currently handles this scenario simply by setting all names as locals in the type param evaluation function:

>>> class X[A, B: A]: tpl = (A, B)
... 
>>> X.tpl
(A, B)
>>> X.tpl[1].__bound__
A
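In pure Python, the effect of that evaluation function is roughly the following (the function name and the __type_variables__ attribute are illustrative):

import typing

def _evaluate_type_params():
    # Both type parameters are ordinary locals here, so the bound
    # expression for B and the class body can both see A.
    A = typing.TypeVar("A")
    B = typing.TypeVar("B", bound=A)
    class X:
        tpl = (A, B)
    X.__type_variables__ = (A, B)
    return X

X = _evaluate_type_params()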

The current implementation works well here.