Variable Hoisting
slightknack opened this issue · 0 comments
Currently variables are only available after they are defined, as Passerine has no late-bound 'global scope' so to speak. However this makes things like mutual recursion a little annoying:
-- TODO: remove this explicit hoist
odd = ()
even = n -> if (n == 0.0) {
true
} else {
odd (n - 1.0)
}
odd = n -> if (n == 0.0) {
false
} else {
even (n - 1.0)
}
print even 35.0
We have to declare odd before using it in even. Ideally this is something that can be resolved at compile time. I've started work on this in the dev
branch by adding a step that resolves the scope of all locals / captured before compilation, but there are still a few kinks to work out.
For those interested, here's how the system works:
Each time a lambda is introduced, a new scope is entered:
-- outer scope
x = true
capture_function = () -> {
y = 3.14
-- inner scope
print x
print y
}
if a variable (i.e. x
) is defined in the outer scope, it's accessible in the inner scope. This works by 'capturing' x
when capture_function
is defined (basically just making it a shared reference on the heap).
While compiling, we can imagine this as a chain of scopes. Just after the expression print y
, the scope stack looks like this:
[
Scope { locals: ["x"], nonlocals: [] },
Scope { locals: ["y"], nonlocals: ["x"] },
]
Currently, whenever a variable is encountered, here's what currently happens:
- If the variable is local to the scope, add it to
locals
in the currentScope
- If the variable is not local to the scope, search backwards through all enclosing scopes. If it is found and add it to the
nonlocals
of all the scopes we've searched through - If the variable can not be found:
a. If this is a declaration, i.e.x = true
orx -> {}
define the local in the current scope
b. If this is an access, i.eprint x
, raise the error 'variable referenced before assignment'
This works well in most cases: it's able to hoist captured variables, declare new variables, etc. etc.
However, there are a few problems with this methodology. Consider this simple example:
recurse = x -> recurse x
At the point in time where the expression is compiling recurse ()
, the scope stack looks like:
[
Scope { locals: [], nonlocals: [] }, // This is empty!
Scope { locals: ["x"], nonlocals: [] }, // no reference to `recurse`
]
So this code will throw the error 'variable recurse referenced before assignment'.
How do we fix this? We do something a bit more complicated. Steps 1 - 3a are the same: let's focus on 3b:
b. If this is an access, i.e
print x
, raise the error 'variable referenced before assignment'
x
, for instance, can be defined after we parse the expression. consider the recurse
example, or the first example with mutual recursion. What we need to do is somehow mark x
as used, and when x
is later defined, join the later assignment.
Let me introduce the solution: Hoisting chains! 😱 🎉
So, here's the skinny, 3b becomes:
b. If this is an access, i.e
print x
, addx
to a table calledunresolved_captures
, and markx
as captured a la rule 2 in all enclosing scopes.
This second bit, 'mark x
as captured' builds up what I like to refer to as a 'hoist chain'. We need one more change to make this work, 3a becomes:
a. If this is a declaration, i.e.
x = true
orx -> {}
checkunresolved_captures
forx
.
i. Ifx
is in unresolved captures, define it in the local scope, remove it fromunresolved_captures
and then remove it as captured in all enclosing scopes (a la rule 2).
An finally, we need an additional rule, rule 4, for dealing with cases where variables are actually undefined:
- After compilation, if there are still variables in
unresolved_captures
, raise the error 'variable(s) referenced before assignment' (obviously point out which variables and where this occurred).
What does this do in practice? consider the following code:
flubber = () -> {
foo = () -> {
print (pi + e)
}
pi = 3.14
foo ()
}
e = 2.72
flubber ()
Note that we have two variables that are 'referenced before assignment': e
and pi
.
After we reach pi
in the expression print (pi + e)
, the scope stack looks like this:
[
Scope { locals: [], nonlocals: ["pi"] },
Scope { locals: [], nonlocals: ["pi"] },
Scope { locals: [], nonlocals: ["pi"] },
]
Note how we built a hoisting chain of pi
s. pi
has also been added to unresolved_captures
. We repeat this when we reach e
.
When we reach pi = 3.14
in the enclosing scope, we see that pi
is in unresolved_captures
. We then remove it from unresolved_captures
and from being captured in all enclosing scopes:
[
Scope { locals: [], nonlocals: ["e"] },
Scope { locals: ["foo", "pi"], nonlocals: ["e"] },
// this innermost scope has been popped at this point,
// but it is correct as we hoisted `pi` and `e` in it
]
Likewise, when we reach e = 2.72
:
[
Scope { locals: ["flubber", "e"], nonlocals: [] },
// scope has also been popped, but stil correct
]
These scopes are packaged into the AST
and then used during bytecode emmision to resolve variables.
It's a tad bit more complicated than this, because we're also renaming variables as we do this, but more on that later. Anyway, this feature should be merged in 0.9.X or 0.10.0 depending.