haskell/error-messages

(rigid, skolem) type variables escaping scope

Opened this issue · 3 comments

This is an example of technical jargon in error messages which doesn't mean much to anyone but the most experienced Haskellers. Here is a simple example to reproduce this error:

module Foo where

h :: (b -> ()) -> Int
h = error "urk"

f = h (\x -> let g :: a -> Int
                 g y = length [x, y]
              in ())

The issue here is that x and y must have the same type because they are both elements of the same list, and y has type a which is bound in the type signature of g, but g is bound inside the lambda that binds the variable x, so the information about a would have to travel from inside the lambda to the outside, which is not allowed.

The error message that is produced with GHC 9.0.2 is:


Foo.hs:7:35: error:
    • Couldn't match expected type ‘b0’ with actual type ‘a’
        because type variable ‘a’ would escape its scope
      This (rigid, skolem) type variable is bound by
        the type signature for:
          g :: forall a. a -> Int
        at Foo.hs:6:18-30
    • In the expression: y
      In the first argument of ‘length’, namely ‘[x, y]’
      In the expression: length [x, y]
    • Relevant bindings include
        y :: a (bound at Foo.hs:7:20)
        g :: a -> Int (bound at Foo.hs:7:18)
        x :: b0 (bound at Foo.hs:6:9)
  |
7 |                  g y = length [x, y]
  |                                   ^

In my opinion there are a few strange things in this error message.

  1. The message mentions type variables escaping a "scope", but I have not enabled ScopedTypeVariables so which scope does the message mean?

  2. Rigid and skolem are very technical terms, could that simply be left out without upsetting advanced Haskell users?

  3. The signature for g is written as g :: forall a. a -> Int, but that is not legal syntax because I haven't enabled ExplicitForAll.

  4. Where does the variable b0 come from? Now it is relatively clear because there is only one b type variable in the whole program, but what about more complicated programs?

Related to "rigid" type variables specifically, here is another program that produces an error message containing that word:

module Bar where

f :: a -> a -> ()
f _ _ = ()

g :: a -> b -> ()
g x y = f x y

The produced error message is:

Foo.hs:7:13: error:
    • Couldn't match expected type ‘a’ with actual type ‘b’
      ‘b’ is a rigid type variable bound by
        the type signature for:
          g :: forall a b. a -> b -> ()
        at Foo.hs:6:1-17
      ‘a’ is a rigid type variable bound by
        the type signature for:
          g :: forall a b. a -> b -> ()
        at Foo.hs:6:1-17
    • In the second argument of ‘f’, namely ‘y’
      In the expression: f x y
      In an equation for ‘g’: g x y = f x y
    • Relevant bindings include
        y :: b (bound at Foo.hs:7:5)
        x :: a (bound at Foo.hs:7:3)
        g :: a -> b -> () (bound at Foo.hs:7:1)
  |
7 | g x y = f x y
  |             ^

This makes me wonder if it is possible for type variables in error messages to be anything else than rigid (I don't remember ever seeing the word "wobbly" in error messages)? Or at least it seems that GHC error messages only ever report rigid type variables in those "... bound by" blocks, so is it really necessary to make that distinction and complicate the error messages?

I would propose this error message:

Foo.hs:7:13: error:
    • Couldn't match expected type ‘a’ with actual type ‘b’
    • In the second argument of ‘f’, namely ‘y’
      In the expression: f x y
      In an equation for ‘g’: g x y = f x y
    • Relevant bindings include
        y :: b (bound at Foo.hs:7:5)
        x :: a (bound at Foo.hs:7:3)
        g :: a -> b -> () (bound at Foo.hs:7:1)
  |
7 | g x y = f x y
  |             ^

If scoped type variables (or at least explicit for all) is enabled, then GHC could say more about the actual binding of type variables:

Foo.hs:7:13: error:
    • Couldn't match expected type ‘a’ with actual type ‘b’
      ‘b’ is bound by:
          g :: [forall a b.] a -> b -> () (at Foo.hs:6:1-17)
      ‘a’ is bound by:
          g :: [forall a b.] a -> b -> () (at Foo.hs:6:1-17)
    • In the second argument of ‘f’, namely ‘y’
      In the expression: f x y
      In an equation for ‘g’: g x y = f x y
    • Relevant bindings include
        y :: b (bound at Foo.hs:7:5)
        x :: a (bound at Foo.hs:7:3)
        g :: [forall a b.] a -> b -> () (bound at Foo.hs:7:1)
  |
7 | g x y = f x y
  |             ^

Also, one thing that none of these errors explain is why a and b need to be equal at all. Would it be possible for GHC to say something about how f is instantiated here?

This explanation might help. There is an alternative to rigid variables. GHC internally calls them meta variables, but I think the most standard name for them is unification variables. We should strive to come up with a standard vocabulary and use it consistently across error messages.

One challenge is that type variables have two related notions of scope -- and I don't think this is avoidable. One is their lexical scope: where in the source program can we write a and have each occurrence all refer to the same variable? Another is their type-checking scope (I'm making that term up): where in the program can a term have a type that mentions a and still be valid? In a sense, the type-checking scope in Haskell is a projection of the lexical scope of a variable in Core / System F. But that's a poor way to explain it to Haskellers, who might not know Core.

A lexical scope is never bigger than a type-checking scope.

The messages that you're getting about a type variable escaping its scope are really about a type variable escaping its type-checking scope, not its lexical scope (which, as you suggest, doesn't really make sense without ScopedTypeVariables).

Maybe we just need a word other than "scope" to denote what I've been calling a type-checking scope. Here are a few attempts:

  • region of validity
  • extent
  • reach
  • dominion

I probably like "reach" the most. So we have

Definition. The scope of a type variable a is a region of code where writing a in type-syntax refers to that same a.

Definition. The reach of a type variable a is a region of code where variables may be inferred to have a type mentioning a and polymorphic functions may be instantiated to types mentioning a.

Returning to your original case, repeated here:

h :: (b -> ()) -> Int
h = error "urk"

f = h (\x -> let g :: a -> Int
                 g y = length [x, y]
              in ())

I suggest this new error message

Foo.hs:7:35: error:
    • Couldn't match expected type ‘b0’ with actual type ‘a’
        because type variable ‘a’ would occur beyond its reach
      Unification variable 'b0' is the inferred type of 'x' (bound at Foo.hs:6:9)
      Abstract type variable 'a' is bound by
        the type signature for:
          g :: forall a. a -> Int
        at Foo.hs:6:18-30
      The reach of 'a' includes only the body of 'g',
      yet 'b0' exists outside of this reach.
    • In the expression: y
      In the first argument of ‘length’, namely ‘[x, y]’
      In the expression: length [x, y]
    • Relevant bindings include
        y :: a (bound at Foo.hs:7:20)
        g :: a -> Int (bound at Foo.hs:7:18)
        x :: b0 (bound at Foo.hs:6:9)
  |
7 |                  g y = length [x, y]
  |                                   ^

There's a lot going on there! But maybe it's understandable? And I think this would not be so hard to implement, except for the third line explaining where b0 comes from (which would require GHC to have a new subsystem tracking the birth and propagation of unification variables). On the other hand, the "The reach of ..." line should be doable today.

I was tempted to add a

Pass -fprint-jargon-definitions to have GHC define its terms in error messages

which would add definitions of "unification variable", "abstract type variable", "expected type", "actual type" and "reach" to the end of the error message. But actually I think it would be better to define these terms on a website somewhere. Maybe GHC can print out defined terms in a special color.

With ghc-proposals/ghc-proposals#448, we could pretty print

g @a y = length [x, y]

to show the reach. The message could be something like

<original code>

With a bound explicit in the term:

<elaborated code>

I think that would be very nifty!!