Keep track of variable bindings & types
japhib opened this issue · 5 comments
This project is great, but sometimes the type issues are difficult to debug. Aside from ongoing efforts to improve the error messages, it would be very helpful if Gradualizer could provide the type that it infers for any particular variable. (I guess just in --infer
mode.)
For now it would be quite useful if you could provide a flag and it would just print out all bound variables found in the file it checks. (Or return it as part of the env
). But in the future this would pave the way for being able to create a Gradualizer IDE plugin where you could hover over a variable and it would show you in real-time what the variable's type is inferred to be. Here's an example of what it looks like for Typescript in VSCode:
I hope to take a stab at implementing this myself, although I'm not sure what the best approach is. Of course the types of variables are already inferred, I believe by typechecker:add_var_binds/3
-- but these types are not surfaced back up to the original caller. Basically I think in the overall env
we'd just want to maintain a list of variables that have been found, and what their types are inferred to be, as well as location information like line/column.
If I/we wanted to implement this by passing around a list of bound variables in the env
(Approach A), it looks like there would first need to be a pretty massive refactor around type_check_forms/type_check_form_with_timeout
, to replace the current heavy usage of throw
to get errors out of the typechecker. All the uses of throw
would need to be replaced with adding to a list of errors that is passed around everywhere, perhaps inside an env
.
One alternate approach would be to store the list of variables in some stateful way, such as in an ETS or GenServer (Approach B), as we traverse the AST. But this approach would miss any variables that come after an error has been thrown, interrupting the analysis of a function.
Maybe the best approach (Approach C) would be to completely separate the process of getting bound variables & their types out of the typechecking process, and have an entirely different entry point & iteration mechanism. This could probably be pretty simple, since it could probably heavily take advantage of existing type-inference code. The plumbing needed in order to traverse all types of AST nodes is probably not that bad either.
As I have time, I'll continue to investigate Approach C and report back on any progress I'm able to make.
Hi, @japhib!
[...] it would be very helpful if Gradualizer could provide the type that it infers for any particular variable.
For now it would be quite useful if you could provide a flag and it would just print out all bound variables found in the file it checks.
Indeed, it would be quite useful. It shouldn't be too hard to implement using the --verbose
CLI flag and the corresponding ?verbose
macro for the case when an analysis over a file returns no errors. If errors are found it gets more tricky as return via a throw
prevents us from getting the #env.venv
.
If I/we wanted to implement this by passing around a list of bound variables in the
env
, [...]
It's already done like this. Without knowing types of variables Gradualizer wouldn't be able to operate and therefore carry out its analysis. That's the #env.venv
field.
to replace the current heavy usage of throw to get errors out of the typechecker.
Alternatively, we could store the var binds in an ETS, so that they outlive a return-via-throw
. The ETS would still be accessible after that. This would likely be the smallest code change overall, at the price of using global state.
Maybe the best approach (Approach C) would be to completely separate the process of getting bound variables & their types out of the typechecking process, and have an entirely different entry point & iteration mechanism.
I think it's not as easy as it seems if we want to do it in a bidirectional typechecker. The idea of a bidirectional system is that it alternates between checking and synthesis (aka inference) when traversing the AST, depending on the node and already available type information. This means the variable types we're looking for are built up as we go through the AST.
What does typechecking mean, though? It means traversing the AST, getting type info and checking for type conflicts. So "just getting variable types" and typechecking are quite close (if not the same thing) in a bidirectional checker. The first three sections of Bidirectional Typing helped me grok this concept and see the forest instead of just the trees.
It doesn't mean that just getting type information is not possible, but doing it accurately would likely require rewriting the AST traversal (which is already there to do typechecking) according to the same type system check/infer rules, with less operations done (no typechecking) in each respective AST node (aka expression). I think that might be quite a bit of work :|
To sum up, we can easily print variable types when the analysis finds no errors. With an ETS for the variable env we could also print them if it returns errors. So would be the case with a refactoring of throw
s or throwing the #env.venv
, too. I'm not sure which way to go is the best yet, as each seems to require quite some effort.
Thanks for the response! The ETS approach makes sense. I'm not sure when I'll have more time to take a shot at this, but I'd love to spend more time on it in the future.
That would be helpful! I'd still like to leave this issue open until a more editor-friendly solution is reached, but that seems like a good stopgap solution.