jstolarek/inferno

Value restriction

Closed this issue · 8 comments

I'm not yet sure whether we'll be implementing the value restriction, but I need to record some thoughts I had on this topic. I found F9 example to be particularly interesting:

let f = revapp ~id in f poly

In the system without value restriction f has a quantified type ∀ b. ((∀ a. a → a) → b) → b and we generate following System F term (omitting definitions of id, poly, and revapp):

let f = Λ2. revapp [2] [∀ [3]. 3 → 3] id in f [Int × Bool] poly

I believe that in a system with value restriction we should infer the type of f to be ((∀ a. a → a) → 'b) → 'b, where 'b is weakly polymorphic (not quantified and monomorphic). Then, when typechecking the body of let we would unify type of argument of f with poly, resulting in type of f being ((∀ a. a → a) → Int × Bool) → Int × Bool and the following System F term:

let f = revapp [Int × Bool] [∀ [3]. 3 → 3] id in f poly

Trick question: what if we had the following term in a system with value restriction:

let f = revapp ~id in f

This would result in following System F term:

let f = revapp ['2] [∀ [3]. 3 → 3] id in f

where 2 is an unbound variable. Inferno's System F typechecker currently does not allow unbound type variables so this is incorrect. A simple modification would be to permit free monomorphic variables.

Trick question: what if we had the following term in a system with value restriction:
...

Doesn't the same problem already appear in \lambda x.x? FreezeML wouldn't generalize this.

However, for this term, FreezeML(X) currently generates the following System F term, which is well-typed, but not a truthful representation of the original program: Λ2247. λ(x : 2247). x

This is a normal issue with ML-like systems when the value restriction is added (as discussed I think when we added FreezeML with value restriction to Links). The definition of Standard ML talks about this and says that it is up to implementations what to do here, e.g.

  • OCaml (and Links) treat the unbound variables as persistent unification variables and so type inference for later code can unify them, specializing the types further.
  • Standard ML (I think) just complains with an error
  • Some other implementations choose a random ground (mono) type to instantiate with, e.g. unit.
  • I think another sensible thing to do would be instantiate to an "empty" type, similar to how in Scala when things go wrong the compiler assumes that something has type Any, and so you get an error cascade.

Doesn't the same problem already appear in \lambda x.x?

I think not. It is a GVal, so it would get generalized even with the value restriction. But you do have a point here - we could just write revapp ~id and it would have the same problem. Recall that in FreezeML(X) when we write an expression λx.x it is implicitly let-bound (let _ = λx.x in CTrue) and generalized to avoid unbound type variables. This is only for the purpose of generating a closed System F term and is very arbitrary.

@jamescheney OK, it's good to know that this is a well-known problem rather than some inferno-specific quirk.

Recall that in FreezeML(X) when we write an expression λx.x it is implicitly let-bound (let _ = λx.x in CTrue) and generalized to avoid unbound type variables. This is only for the purpose of generating a closed System F term and is very arbitrary.

While this is useful from a practical perspective (Links does the same by default), I would like to avoid this in FreezeML(X), by using one of the strategies suggested by James. After all, it's a hack for the problem of how to deal with under-specified types, which we face again once we add the value restriction. And once we consider such spurious type variables resulting from the value restriction, generalizing those at the toplevel, too, seems morally wrong (but I suspect that it wouldn't make the type system unsound).

I would personally be in favor of just using unbound type variables for this, which we would have to collect and return as a set of type variables that we assume to be part of some initial type variable environment. But even though the constraints-with-values technique implemented in inferno should allow piping these type variables up to the top, it does seem a bit cumbersome. In particular, if the constraints-with-values aspect of FreezeML(X) becomes part of the formal definition rather than just the implementation, I wouldn't want to do this, as it's not worth the extra complications of having to explain this in our paper.

So I also wouldn't be opposed to just instantiating all these type variables with unit and calling it a day.

I think the primary reason why inferno does what it does is because typechecking a closed System F term is a good correctness check. Bugs in the typechecker can lead to generating ill-typed System F terms and these have to be detected in one way or another.

Yes, I like type-checking the System F terms as an extra safety net. So unless we do a blanket generalization at the toplevel, there will always be a need to collect the type variables that were neither generalized nor instantiated, in order to distinguish them from spurious type variables resulting from type-checker bugs.

@jamescheney points out that we might already have everything that we need to implement the value restriction entirely on the client side. It suffices to treat non-generalising let as a syntactic sugar for lambda abstraction + application, but not generate the isMono constraint (for the unannotated case). I think this should really be fairly simple to do.

Value restriction is now implemented and merged into the main branch. Apparently, inferno already does the correct thing when it comes to binding free unification variables so no problems there. I've added some tests but as always more testing is welcome.