josefs/Gradualizer

Keep track of variable bindings & types

japhib opened this issue · 5 comments

This project is great, but sometimes the type issues are difficult to debug. Aside from ongoing efforts to improve the error messages, it would be very helpful if Gradualizer could provide the type that it infers for any particular variable. (I guess just in --infer mode.)

For now it would be quite useful if you could provide a flag and it would just print out all bound variables found in the file it checks. (Or return it as part of the env). But in the future this would pave the way for being able to create a Gradualizer IDE plugin where you could hover over a variable and it would show you in real-time what the variable's type is inferred to be. Here's an example of what it looks like for Typescript in VSCode:

image

I hope to take a stab at implementing this myself, although I'm not sure what the best approach is. Of course the types of variables are already inferred, I believe by typechecker:add_var_binds/3 -- but these types are not surfaced back up to the original caller. Basically I think in the overall env we'd just want to maintain a list of variables that have been found, and what their types are inferred to be, as well as location information like line/column.

If I/we wanted to implement this by passing around a list of bound variables in the env (Approach A), it looks like there would first need to be a pretty massive refactor around type_check_forms/type_check_form_with_timeout, to replace the current heavy usage of throw to get errors out of the typechecker. All the uses of throw would need to be replaced with adding to a list of errors that is passed around everywhere, perhaps inside an env.

One alternate approach would be to store the list of variables in some stateful way, such as in an ETS or GenServer (Approach B), as we traverse the AST. But this approach would miss any variables that come after an error has been thrown, interrupting the analysis of a function.

Maybe the best approach (Approach C) would be to completely separate the process of getting bound variables & their types out of the typechecking process, and have an entirely different entry point & iteration mechanism. This could probably be pretty simple, since it could probably heavily take advantage of existing type-inference code. The plumbing needed in order to traverse all types of AST nodes is probably not that bad either.

As I have time, I'll continue to investigate Approach C and report back on any progress I'm able to make.

Hi, @japhib!

[...] it would be very helpful if Gradualizer could provide the type that it infers for any particular variable.
For now it would be quite useful if you could provide a flag and it would just print out all bound variables found in the file it checks.

Indeed, it would be quite useful. It shouldn't be too hard to implement using the --verbose CLI flag and the corresponding ?verbose macro for the case when an analysis over a file returns no errors. If errors are found it gets more tricky as return via a throw prevents us from getting the #env.venv.

If I/we wanted to implement this by passing around a list of bound variables in the env, [...]

It's already done like this. Without knowing types of variables Gradualizer wouldn't be able to operate and therefore carry out its analysis. That's the #env.venv field.

to replace the current heavy usage of throw to get errors out of the typechecker.

Alternatively, we could store the var binds in an ETS, so that they outlive a return-via-throw. The ETS would still be accessible after that. This would likely be the smallest code change overall, at the price of using global state.

Maybe the best approach (Approach C) would be to completely separate the process of getting bound variables & their types out of the typechecking process, and have an entirely different entry point & iteration mechanism.

I think it's not as easy as it seems if we want to do it in a bidirectional typechecker. The idea of a bidirectional system is that it alternates between checking and synthesis (aka inference) when traversing the AST, depending on the node and already available type information. This means the variable types we're looking for are built up as we go through the AST.

What does typechecking mean, though? It means traversing the AST, getting type info and checking for type conflicts. So "just getting variable types" and typechecking are quite close (if not the same thing) in a bidirectional checker. The first three sections of Bidirectional Typing helped me grok this concept and see the forest instead of just the trees.

It doesn't mean that just getting type information is not possible, but doing it accurately would likely require rewriting the AST traversal (which is already there to do typechecking) according to the same type system check/infer rules, with less operations done (no typechecking) in each respective AST node (aka expression). I think that might be quite a bit of work :|


To sum up, we can easily print variable types when the analysis finds no errors. With an ETS for the variable env we could also print them if it returns errors. So would be the case with a refactoring of throws or throwing the #env.venv, too. I'm not sure which way to go is the best yet, as each seems to require quite some effort.

Thanks for the response! The ETS approach makes sense. I'm not sure when I'll have more time to take a shot at this, but I'd love to spend more time on it in the future.

@japhib what do you think of #529 as a partial solution? It's the low hanging fruit achievable without revolutionary refactorings, which should aid in the case when knowing a variable type is beneficial. Alas, it's not suitable for integration from an editor/IDE.

That would be helpful! I'd still like to leave this issue open until a more editor-friendly solution is reached, but that seems like a good stopgap solution.