Implementation of new scope rules

Question

JelleZijlstra opened this issue 2 years ago · 7 comments

The scope rules in the PEP (https://peps.python.org/pep-0695/#type-parameter-scopes) feel like one of the most complicated parts of the PEP to implement, maybe especially because those are areas of the interpreter that I'm not familiar with. Maybe others have already thought about this (I think an earlier prototype implemented a variation of the scoping rules already), but here are my thoughts on how to do it.

The closest analog for how TypeVars should work in the new system is nonlocals/cell variables. I believe we should be able to leverage the existing cellvar/freevar mechanism to implement this PEP.

The compiler/symtable.c would have to change so that it generates a cell variable every time it encounters a TypeVar declaration, and generates a LOAD_DEREF every time it encounters a reference to that name in a syntactically nested scope. I don't know enough about the symtable to opine on how exactly this would work, but presumably we'd have to keep some sort of mapping of names to active TypeVars as we walk the code.
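For reference, the existing cellvar/freevar mechanism can be observed from pure Python. The snippet below is only an illustration of how closures behave today; it uses a plain string as a stand-in for a TypeVar and does not show any of the proposed symtable changes.

```python
import dis


def outer():
    # `T` is referenced by a nested function, so the compiler stores it in a cell.
    T = "stand-in for a TypeVar"

    def inner():
        # The nested reference compiles to LOAD_DEREF rather than LOAD_GLOBAL.
        return T

    return inner


inner = outer()
cell_names = outer.__code__.co_cellvars   # names that `outer` keeps in cells
free_names = inner.__code__.co_freevars   # names that `inner` reads from those cells
opnames = [ins.opname for ins in dis.get_instructions(inner)]

print(cell_names, free_names)
print("LOAD_DEREF" in opnames)
```

The idea in the paragraph above is that a type parameter would play the role of `T` in `outer`, and every reference from a nested scope would be compiled the way `inner` is here.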

Let's use this issue to track implementing the necessary symtable/compiler changes.

Answer 1 · 2023-04-18T16:21:13.000Z

I tried several approaches in my prototype including one that's similar to what you're proposing. That approach didn't work well for me, but it's possible that I didn't understand the nuances of cell variables at the time. The problem is that we don't want to create a new formal scope, so we're still using the symbol table for the existing scope. We can't generate any "real" variables because they would overwrite variables of the same name within that scope, which violates the spec.

The approach I took in my prototype was to track a set of "overlay" symbols for a scope. When a name references one of these overlays, it uses the overlay symbol rather than the underlying symbol in the symbol table. A symbol table in an inner scope can override the name of an outer symbol's overlay.
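A rough model of the overlay idea, as a sketch only: the `Scope` class, its fields, and the string values are all hypothetical and are not taken from the prototype or from symtable.c. It assumes an overlay is active only while the generic definition is being visited.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Scope:
    """Toy stand-in for a symbol-table scope (hypothetical, not symtable.c)."""

    name: str
    symbols: dict[str, str] = field(default_factory=dict)
    overlays: dict[str, str] = field(default_factory=dict)
    parent: Scope | None = None

    def resolve(self, name: str) -> str:
        # Walk outward; within each scope an active overlay wins over the
        # ordinary symbol, but an inner scope's own symbol is found first.
        scope: Scope | None = self
        while scope is not None:
            if name in scope.overlays:
                return scope.overlays[name]
            if name in scope.symbols:
                return scope.symbols[name]
            scope = scope.parent
        raise NameError(name)


module = Scope("module", symbols={"T": "module-level T"})
body = Scope("f body", symbols={"T": "local T"}, parent=module)

# Assumption: the overlay is pushed while visiting `def f[T]` and popped after.
module.overlays["T"] = "type parameter T of f"
inside = module.resolve("T")      # the overlay hides the module's own T
shadowed = body.resolve("T")      # an inner symbol overrides the outer overlay
del module.overlays["T"]
after = module.resolve("T")       # the module's T was never overwritten

print(inside, shadowed, after, sep=" | ")
```

The point of the model is that the module's own `T` entry is left untouched, which is the property the spec requires.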

Answer 2 · 2023-04-18T16:25:55.000Z

I spent a little time trying to implement this but didn't get far; I think this is the first time I'm trying to work with symtable.c.

One idea I had was to make a new scope (as used by symtable.c) for the typevars defined by a generic class/function/alias, and then use that scope only for the typevars. The compiler would then need some special handling to generate the right cellvar/freevar code for these names.

I am planning to spend most of my time at the PyCon sprints early next week focusing on getting this to work. Hopefully I'll be able to pull in other core devs who know more about the compiler and symtable.

Answer 3 · 2023-04-18T16:35:57.000Z

Generating a new scope is very problematic. I made several attempts using variants of this approach, and it breaks a bunch of assumptions in the existing code. Maybe you can come up with some insight that eluded me, but my sense is that this approach won't work.

Answer 4 · 2023-04-21T20:23:21.000Z

I went through a few iterations over the last few days:

1. Creating a new scope is hard because the scope would be so different from other kinds of scopes.
2. I tried an approach with "overlays" added to the scope, which store the type param names. I got stuck trying to get cell variables to work correctly with this, because cell variables want to live in code objects, and I didn't have a good code object to put them in.
3. So I tried to put all the typeparams in the enclosing code object (e.g., the module). This required mangling the name, because type param names may conflict with other names in the same scope. But it was hard to get the mangling to apply correctly in all nested scopes.
4. I went back to new scopes and realized that the problem is that scopes are normally tied to code objects. So I decided to add a new code object that is responsible for evaluating the type parameters. This approach appears to work:

>>> def f[T](x: T = 1): print(x, T)
... 
>>> f.__annotations__
{'x': 'T'}
>>> f()
1 T
>>> T
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'T' is not defined

There are still a lot of things to do (actually creating TypeVars at runtime; generic classes; type aliases; fixing the qualname of the function), but this approach seems very promising. It requires only very small changes to symtable.c.
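One way to picture the extra code object is as a hidden wrapper function around the generic definition. The following is a simplified hand-written approximation, not the compiler's actual output; the wrapper name `_type_params_of_f` is made up for illustration, and the handling of defaults and the qualname is glossed over.

```python
from typing import TypeVar


def _type_params_of_f():  # hypothetical name for the hidden code object
    # The type parameter lives in this wrapper, so it becomes a cell variable
    # instead of a name in the enclosing (module) scope.
    T = TypeVar("T")

    def f(x: T = 1):
        return x, T

    return f


f = _type_params_of_f()

print(f())                       # the body can still see T through its closure
print(f.__code__.co_freevars)    # T is a free variable of f
```

Because `T` is bound only inside the wrapper, a later reference to `T` at module level still raises `NameError`, matching the session above.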

Answer 5 · 2023-04-21T20:47:41.000Z

Ah, that's a great insight. I didn't think of adding a new code object.

One challenge with this approach is that code within this code object needs to be able to access variables one scope above it, even if those variables are in a class scope. Normally scopes within a class scope cannot access the class's variables. There's currently no facility to create cell variables within a class scope.

class Foo:
    class Bar: ...

    def func[T: Bar](self): ... # `Bar` needs to be accessible when evaluating the type param bound
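The existing restriction is easy to reproduce without any new syntax. In this small illustration, `Bar` is a plain class attribute and `lookup` is an ordinary method; a nested scope that names `Bar` skips the class namespace and looks in globals and builtins.

```python
class Foo:
    Bar = "defined in the class body"

    def lookup(self):
        # The class body is not an enclosing scope for name resolution,
        # so this is compiled as a global lookup and fails at call time.
        return Bar


try:
    Foo().lookup()
except NameError as exc:
    error = str(exc)
else:
    error = None

print(error)
```

A code object that evaluates type parameter bounds would hit the same wall unless it gets some other route to the class namespace.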

Answer 6 · 2023-04-22T13:31:52.000Z

That's a good point. My first instinct on how to handle it was to change the symtable logic so that typeparam scopes somehow have access to the variables defined in a directly enclosing class scope, and put these names in cells at runtime. This approach would be quite complicated to implement in symtable.c, and I don't think it would be able to handle this case:

import random

x = 1
class F:
    if random.random() > 0.5:
        x = 2
    def method[T](self, param: x): pass

Here we want x to be resolved either from the class namespace or the global namespace, and we can only figure out which at runtime.

So I am now thinking of an alternative: We replace LOAD_GLOBAL instructions in type parameter blocks that are lexically within class blocks with a new instruction, LOAD_CLASS_OR_GLOBAL. This instruction first checks the class's namespace dict, and if it doesn't find the name there, it falls back to regular LOAD_GLOBAL. To get the class namespace, we add a parameter to the type param block that holds the namespace.

The pseudocode for LOAD_CLASS_OR_GLOBAL would be something like:

class_ns = LOAD_FAST(0)
if name in class_ns:
    return class_ns[name]
else:
    return LOAD_GLOBAL(name)
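The lookup order can be simulated in plain Python. This is only a model of the proposed semantics, assuming the class namespace is passed in as a dict; the function name and its parameters are illustrative and say nothing about how the instruction would be implemented in the interpreter.

```python
import builtins


def load_class_or_global(name, class_ns, global_ns):
    """Model of the proposed lookup: class namespace, then globals, then builtins."""
    if name in class_ns:
        return class_ns[name]
    # Fallback that mirrors what a regular LOAD_GLOBAL does.
    if name in global_ns:
        return global_ns[name]
    try:
        return getattr(builtins, name)
    except AttributeError:
        raise NameError(f"name {name!r} is not defined") from None


global_ns = {"x": 1}

# Case where the class body did execute `x = 2`.
assigned = load_class_or_global("x", {"x": 2}, global_ns)
# Case where the branch was skipped, so only the global `x` exists.
fallback = load_class_or_global("x", {}, global_ns)
# Names found in neither namespace fall through to builtins.
builtin = load_class_or_global("len", {}, global_ns)

print(assigned, fallback, builtin)
```

Both outcomes of the `random.random()` example are covered, because the decision is made when the name is loaded rather than when the code is compiled.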

cc @carljm who I was badgering with this problem last night.

Answer 7 · 2023-04-22T18:39:56.000Z

I successfully implemented the approach above.