vrtbl/passerine

Variable Hoisting

slightknack opened this issue · 0 comments

Currently variables are only available after they are defined, as Passerine has no late-bound 'global scope' so to speak. However this makes things like mutual recursion a little annoying:

-- TODO: remove this explicit hoist
odd = ()

even = n -> if (n == 0.0) {
    true
} else {
    odd (n - 1.0)
}

odd = n -> if (n == 0.0) {
    false
} else {
    even (n - 1.0)
}

print even 35.0

We have to declare odd before using it in even. Ideally this is something that can be resolved at compile time. I've started work on this in the dev branch by adding a step that resolves the scope of all locals / captured before compilation, but there are still a few kinks to work out.

For those interested, here's how the system works:

Each time a lambda is introduced, a new scope is entered:

-- outer scope
x = true

capture_function = () -> {
    y = 3.14
    -- inner scope
    print x
    print y
}

if a variable (i.e. x) is defined in the outer scope, it's accessible in the inner scope. This works by 'capturing' x when capture_function is defined (basically just making it a shared reference on the heap).

While compiling, we can imagine this as a chain of scopes. Just after the expression print y, the scope stack looks like this:

[ 
    Scope { locals: ["x"], nonlocals: [] }, 
    Scope { locals: ["y"], nonlocals: ["x"] },
]

Currently, whenever a variable is encountered, here's what currently happens:

  1. If the variable is local to the scope, add it to locals in the current Scope
  2. If the variable is not local to the scope, search backwards through all enclosing scopes. If it is found and add it to the nonlocals of all the scopes we've searched through
  3. If the variable can not be found:
    a. If this is a declaration, i.e. x = true or x -> {} define the local in the current scope
    b. If this is an access, i.e print x, raise the error 'variable referenced before assignment'

This works well in most cases: it's able to hoist captured variables, declare new variables, etc. etc.

However, there are a few problems with this methodology. Consider this simple example:

recurse = x -> recurse x

At the point in time where the expression is compiling recurse (), the scope stack looks like:

[
    Scope { locals: [], nonlocals: [] }, // This is empty!
    Scope { locals: ["x"], nonlocals: [] }, // no reference to `recurse`
]

So this code will throw the error 'variable recurse referenced before assignment'.

How do we fix this? We do something a bit more complicated. Steps 1 - 3a are the same: let's focus on 3b:

b. If this is an access, i.e print x, raise the error 'variable referenced before assignment'

x, for instance, can be defined after we parse the expression. consider the recurse example, or the first example with mutual recursion. What we need to do is somehow mark x as used, and when x is later defined, join the later assignment.

Let me introduce the solution: Hoisting chains! 😱 🎉

So, here's the skinny, 3b becomes:

b. If this is an access, i.e print x, add x to a table called unresolved_captures, and mark x as captured a la rule 2 in all enclosing scopes.

This second bit, 'mark x as captured' builds up what I like to refer to as a 'hoist chain'. We need one more change to make this work, 3a becomes:

a. If this is a declaration, i.e. x = true or x -> {} check unresolved_captures for x.
i. If x is in unresolved captures, define it in the local scope, remove it from unresolved_captures and then remove it as captured in all enclosing scopes (a la rule 2).

An finally, we need an additional rule, rule 4, for dealing with cases where variables are actually undefined:

  1. After compilation, if there are still variables in unresolved_captures, raise the error 'variable(s) referenced before assignment' (obviously point out which variables and where this occurred).

What does this do in practice? consider the following code:

flubber = () -> {
    foo = () -> {
        print (pi + e)
    }
    pi = 3.14
    foo ()
}
e = 2.72

flubber ()

Note that we have two variables that are 'referenced before assignment': e and pi.

After we reach pi in the expression print (pi + e), the scope stack looks like this:

[
    Scope { locals: [], nonlocals: ["pi"] }, 
    Scope { locals: [], nonlocals: ["pi"] },
    Scope { locals: [], nonlocals: ["pi"] },
]

Note how we built a hoisting chain of pis. pi has also been added to unresolved_captures. We repeat this when we reach e.

When we reach pi = 3.14 in the enclosing scope, we see that pi is in unresolved_captures. We then remove it from unresolved_captures and from being captured in all enclosing scopes:

[
    Scope { locals: [], nonlocals: ["e"] }, 
    Scope { locals: ["foo", "pi"], nonlocals: ["e"] },
    // this innermost scope has been popped at this point,
    // but it is correct as we hoisted `pi` and `e` in it
]

Likewise, when we reach e = 2.72:

[
    Scope { locals: ["flubber", "e"], nonlocals: [] }, 
    // scope has also been popped, but stil correct
]

These scopes are packaged into the AST and then used during bytecode emmision to resolve variables.

It's a tad bit more complicated than this, because we're also renaming variables as we do this, but more on that later. Anyway, this feature should be merged in 0.9.X or 0.10.0 depending.