jstolarek/inferno

0 rank used when signature provided on a let binding

jstolarek opened this issue · 5 comments

Consider this program:

let (x : (∀ a. a → a) → Int) = λ(f : ∀ a. a → a). f 1 in 1

Annotation on x is represented internally as a variable with the following structure:

{id=6, structure=({id=4, structure=forall [{id=2, rank=-1}].
        {id=3, structure=({id=2, rank=-1} -> {id=2, rank=-1}), rank=-1}, rank=0} ->
    {id=1, structure=Int, rank=0}), rank=0}

Since x isn't registered all ranks are 0, except for the generic variables in the quantified type, which are -1. When typechecking let binding variable representing the annotated type of x is not registered since we only register the variable used for inference. (Moreover, even if it was being registered we currently don't traverse a variable recursively, assuming that each registered variable is a fresh unification variable whose structure needs to be discovered. That was true in an ML setting without type annotations.) We then infer the type for the body of a bound term:

{id=7, structure=({id=10, structure=forall [{id=8, rank=-1, skolem}]. 
        {id=9, structure=({id=8, rank=-1, skolem} -> {id=8, rank=-1, skolem}), rank=-1}, rank=1} ->
    {id=17, structure=Int, rank=1}), rank=1}

The inferred ranks are 1. When we unify the inferred type with annotation we get:

{id=6, structure=({id=4, structure=forall [{id=8, rank=-1, skolem}].
        {id=3, structure=({id=8, rank=-1, skolem} -> {id=8, rank=-1, skolem}), rank=-1}, rank=0} ->
    {id=1, structure=Int, rank=0}), rank=0}

Since unification picks a lower rank, the ranks of variables 3 and 6 remain at 0, which is incorrect and causes an assertion failures later on. Interestingly, the assertion is triggered by variable 1 (representing Int) because x is not used in the let body.

Another example that fails because of this is E3 from the PLDI paper:

let r : (∀ a. a → (∀ b. b → b)) → Int = λx.1 in r (λx.λy.y)

(This is ill-typed and should be rejected since x needs an annotation.) This triggers an assertion in Generalize.instantiate because we're running into an unregistered (rank 0) variable.

Today I explored a hacky solution for the problem. The idea is to introduce a distinguished negative rank, -2, that indicates that a type variable comes from a type signature, in the same way distinguished rank -1 indicates generic variables. Then, when we unify this distinguished rank with any other rank instead of taking a minimum we take the rank of the other variable. This allows to typecheck the expression originally reported in this issue (and also E3 example no longer triggers an assertion), but it also causes assertion failure in nested_forall_inst test:

λ(x : (∀ a. a → a) → (∀ a. a → a)). (x ~id)@ 1

That assertion failure indicates a deeper problem in the treatment of type signatures. Here's what's going on. Lambda abstraction generates a CDef constraint. When solving it we add x to the environment and assign it with a type scheme given by the signature:

{id=7, structure=(
      {id=6, structure=forall [{id=4, rank=-1}].
         {id=5, structure=({id=4, rank=-1} -> {id=4, rank=-1}), rank=-1}, rank=-2} ->
      {id=3, structure=forall [{id=1, rank=-1}].
         {id=2, structure=({id=1, rank=-1} -> {id=1, rank=-1}), rank=-1}, rank=-2}),
   rank=1}

At this point type variables with rank -2 are in the environment. When typechecking the body we instantiate type of x and the -2 variables are turned into rank 1 variables. The final inferred type of the whole expression is:

{id=0, structure=({id=7, structure=(
      {id=24, structure=forall [{id=22, rank=-1}].
         {id=23, structure=({id=22, rank=-1} -> {id=22, rank=-1}), rank=-1}, rank=1} -> 
      {id=3, structure=forall [{id=1, rank=-1}].
         {id=2, structure=({id=1, rank=-1} -> {id=1, rank=-1}), rank=-1}, rank=1}), rank=1} ->
    {id=26, structure=Int, rank=1}), rank=1}

The type is correct. The ranks are also correct. The problem is that variables 24 and 3 have rank 1 (correctly) but are unknown to the generalization engine because they have not been registered but rather intoduced from a type signature. (Set Debug.enabled and Debug.hard to see pool of registered variables in debug output.) This is essentially the same problem as reported orginally - ranks 0 pointed to unregistered variables.

I'm not yet entirely sure how the case of annotated lambdas differs from annotated lets. With annotated lets we have:

  1. a type variable representing the signature
  2. a proxy type variable representing an inferred type

and the hack allows to set the ranks in the signature to that of the inferred type. With annotated lambdas we don't use a proxy variable. That's one difference that might cause annotated lets to work but annotated lambdas to fail. Another one is that the let case just doesn't check that assertion. So, here's a task for myself for next week: write a program that uses an annotated let and trips the same assertion as annotated lambda. If I'm right then this will be possible by writing a program that adds an annotated let to the environment and then uses it (frozen?) in another let binding.

Also, what is a correct solution here? Registering variables from signature seems like an obvious answer, but for annotated let expressions it could be a problem because the signature variables are redundant (due to presence of proxies). If, on the other hand, I apply the solution only to annotated lambdas then this is a somewhat worrying asymetry.

So, here's a task for myself for next week: write a program that uses an annotated let and trips the same assertion as annotated lambda.

Now that I thought more about it I think that's impossible. When I say λ(x : ∀ a. a → a). M then x is in scope in M, but when I say let x : ∀ a. a → a = M in N then x is not in scope in M, but only in N by which time the type signature has already been fixed with the hack and there's no way to trip the assertion.

Registering variables from signature seems like an obvious answer

This needs to be done carefully. Consider expression λx.x. When we generate constraints for it, i.e. call [| λx.x : w |] we get:

∃x1.∃x2. w = x1 → x2 ∧ def x : x1 in [| x : x2 |]

Unification variables x1 and x2 get registered when the solver enters their existentials so attempting to register x1 again when processing def is incorrect. So what I do is traverse structure of a variable in a def and register only those variables that are marked as coming from a signature. This seems fragile and I won't be surprised if it breaks but for now it seems to work for every example that I have.

I'm still thinking about way to do this better. I had an idea that it should suffice to recursively register unregistered (rank 0) variables in CDef and CLet (before attempting to unify signature with the inferred type). That almost works, but there's a problem with signatures without quantifiers. Concretely:

let id_int : Int → Int = λx.x in id_int 1

In this program when we register signature variables they receive rank 1:

{id=6, structure=({id=2, structure=Int, rank=1} -> {id=1, structure=Int, rank=1}), rank=1}

and the body of the type inferred for the bound term is:

{id=7, structure=({id=8, rank=-1} -> {id=8, rank=-1}), rank=-1}

This unifies and results in:

{id=6, structure=({id=1, structure=Int, rank=-1} -> {id=1, structure=Int, rank=-1}), rank=-1}

where the -1 ranks are incorrect. That's probably fixable with some sort of hack, but I want a principled way of doing this without hacks.

The signature hack was removed in e2c08b7 and a proper fix was implemented. A proper fix means recursively registering unregistered (rank 0) variables in CDef and CLet (before unifying the signature) constraints.