goldfirere/singletons

singletons has nondeterministic type variable orderings

RyanGlScott opened this issue · 4 comments

It occurred to me recently that the sorts of nondeterminism issues that plagued GHC for years also plague singletons. While GHC has fixed these issues, singletons has not. The following example illustrates the problem well:

{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE PolyKinds #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE TypeFamilies #-}
module Bug where

import Data.Singletons
import Data.Singletons.TH (singletonsOnly)

$(singletonsOnly [d|
  const :: a -> b -> a
  const x _ = x
  |])

When a user writes sConst @Bool, which type gets instantiated to Bool: a or b? You might think "well obviously the answer is a", and a quick smoke-test in GHCi would appear to confirm that hunch:

$ /opt/ghc/8.6.1/bin/ghci Bug.hs -XTypeApplications
GHCi, version 8.6.1: http://www.haskell.org/ghc/  :? for help
Loaded GHCi configuration from /home/ryanglscott/.ghci
[1 of 1] Compiling Bug              ( Bug.hs, interpreted )
Ok, one module loaded.
λ> :type +v sConst @Bool
sConst @Bool
  :: forall b (t1 :: Bool) (t2 :: b). Sing t1 -> Sing t2 -> Sing t1

However, that's only because of a lucky fluke. As it turns out, if you pass -dunique-increment=-1 to GHC, then you'll get a different answer:

$ /opt/ghc/8.6.1/bin/ghci Bug.hs -XTypeApplications -dunique-increment=-1
GHCi, version 8.6.1: http://www.haskell.org/ghc/  :? for help
Loaded GHCi configuration from /home/ryanglscott/.ghci
[1 of 1] Compiling Bug              ( Bug.hs, interpreted )
Ok, one module loaded.
λ> :type +v sConst @Bool
sConst @Bool
  :: forall a (t1 :: a) (t2 :: Bool). Sing t1 -> Sing t2 -> Sing t1

Ack. It turns out that the order of the kind variables a and b in sConst's type signature completely depend on which unique values happen to be gensymmed for them (by newName) in singletons' Template Haskell machinery. This is because several functions in th-desugar and singletons collect type variables using Set Name, and the Ord Name instance compares by Name's unique values first and foremost. If a happens to have a smaller unique than b, then when you call Set.toList on a set containing {a, b}, then you'll get [a,b]. If b has a smaller unique than a, on the other hand, then you'll get [b,a].

Essentially, any place in singletons where call S.toList on a Set Name (that can affect the user-facing API) should be treated as nondeterministic code. I count at least three places in th-desugar and singletons (each!) where we do this at the time of writing.

Luckily, GHC has figured out how to solve this problem, so we should be able to learn from its mistakes. One of the main tricks that GHC uses to avoid this sort of nondeterminism is by switching out uses of Set for a deterministic, insert-ordered set, such as UniqDSet. It seems likely that we'll need to find such a data structure on Hackage (or conjure up one ourselves) and use it throughout th-desugar and singletons where appropriate.

Oof. Good point.

I'm not convinced we need to go with something like UniqDSet though. Let's just use a list! We're generally talking about very small numbers of elements, and performance of these libraries is not critical (as it's all at compile time). Indeed, given the small size of the collections involved, switching to lists might be a performance improvement in practice!

We're generally talking about very small numbers of elements, and performance of these libraries is not critical

I want to agree with you here, but something doesn't quite sit right with me about this plan. I think the thing that bugs me is the fact that we'd need to change the type of fvDType (from th-desugar) from Set Name to [Name]. In order to accomplish that, we'd need to ensure the list of Names has no duplicates—in other words, we'd need to nub the final result (which has quadratic time). There are several places in singletons where foldMap fvDType is a common idiom, which mean the time complexity of those parts of the code can now be cubic!

I could (reluctantly) get on board with a quadratic-time algorithm, but going cubic or higher generally indicates to me that whatever approach I'm using is not going to scale at all. It's true that pretty much no test case in singletons would be affected that much, but I can almost guarantee that someone's singletons code in the wild would blow up as a result of this.

I actually don't think it would be that hard to whip up an insert-ordered Set, so I'd be inclined to at least give that a try before settling for lists.

(as it's all at compile time).

True, but that doesn't mean people aren't sensitive to increasing compile times. Take this issue, for example :)

OK, if you want to cook up the fancy data structure, I won't stop you.

As for foldMap fvDType: that's easy to fix. Just have a variant of fvDType that doesn't nub. Fold the non-nubbing one, and then nub at the end. We're back in the relative comfort of quadratic time.

It turns out that there is a Hackage library that's perfect for our needs: ordered-containers, which contains implementations of insertion-ordered Maps and Sets (called OMaps and OSets in that package). I briefly sketched out an implementation of th-desugar/singletons based on ordered-containers, but ran into some issues that I had to work around:

  1. There's no union equivalent for OMaps. (I opened an issue about this here.)
  2. There's no Data instance for OSet. (I opened an issue about this here.)
  3. There's no Functor instance for OMap. (I opened an issue about this here.)
  4. There's no Semigroup or Monoid instances for OSet. (Issue here.)

Once the issues above become resolved, we can move forward with this. Alternatively, if we don't want to wait, we could always inline the implementations of OMap and OSet directly into th-desugar, since they're not that much code, and we could replace it with ordered-containers once a suitable version appears with the features that we need.