singletons has nondeterministic type variable orderings
RyanGlScott opened this issue · 4 comments
It occurred to me recently that the sorts of nondeterminism issues that plagued GHC for years also plague singletons
. While GHC has fixed these issues, singletons
has not. The following example illustrates the problem well:
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE PolyKinds #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE TypeFamilies #-}
module Bug where
import Data.Singletons
import Data.Singletons.TH (singletonsOnly)
$(singletonsOnly [d|
const :: a -> b -> a
const x _ = x
|])
When a user writes sConst @Bool
, which type gets instantiated to Bool
: a
or b
? You might think "well obviously the answer is a
", and a quick smoke-test in GHCi would appear to confirm that hunch:
$ /opt/ghc/8.6.1/bin/ghci Bug.hs -XTypeApplications
GHCi, version 8.6.1: http://www.haskell.org/ghc/ :? for help
Loaded GHCi configuration from /home/ryanglscott/.ghci
[1 of 1] Compiling Bug ( Bug.hs, interpreted )
Ok, one module loaded.
λ> :type +v sConst @Bool
sConst @Bool
:: forall b (t1 :: Bool) (t2 :: b). Sing t1 -> Sing t2 -> Sing t1
However, that's only because of a lucky fluke. As it turns out, if you pass -dunique-increment=-1
to GHC, then you'll get a different answer:
$ /opt/ghc/8.6.1/bin/ghci Bug.hs -XTypeApplications -dunique-increment=-1
GHCi, version 8.6.1: http://www.haskell.org/ghc/ :? for help
Loaded GHCi configuration from /home/ryanglscott/.ghci
[1 of 1] Compiling Bug ( Bug.hs, interpreted )
Ok, one module loaded.
λ> :type +v sConst @Bool
sConst @Bool
:: forall a (t1 :: a) (t2 :: Bool). Sing t1 -> Sing t2 -> Sing t1
Ack. It turns out that the order of the kind variables a
and b
in sConst
's type signature completely depend on which unique values happen to be gensymmed for them (by newName
) in singletons
' Template Haskell machinery. This is because several functions in th-desugar
and singletons
collect type variables using Set Name
, and the Ord Name
instance compares by Name
's unique values first and foremost. If a
happens to have a smaller unique than b
, then when you call Set.toList
on a set containing {a, b}
, then you'll get [a,b]
. If b
has a smaller unique than a
, on the other hand, then you'll get [b,a]
.
Essentially, any place in singletons
where call S.toList
on a Set Name
(that can affect the user-facing API) should be treated as nondeterministic code. I count at least three places in th-desugar
and singletons
(each!) where we do this at the time of writing.
Luckily, GHC has figured out how to solve this problem, so we should be able to learn from its mistakes. One of the main tricks that GHC uses to avoid this sort of nondeterminism is by switching out uses of Set
for a deterministic, insert-ordered set, such as UniqDSet
. It seems likely that we'll need to find such a data structure on Hackage (or conjure up one ourselves) and use it throughout th-desugar
and singletons
where appropriate.
Oof. Good point.
I'm not convinced we need to go with something like UniqDSet
though. Let's just use a list! We're generally talking about very small numbers of elements, and performance of these libraries is not critical (as it's all at compile time). Indeed, given the small size of the collections involved, switching to lists might be a performance improvement in practice!
We're generally talking about very small numbers of elements, and performance of these libraries is not critical
I want to agree with you here, but something doesn't quite sit right with me about this plan. I think the thing that bugs me is the fact that we'd need to change the type of fvDType
(from th-desugar
) from Set Name
to [Name]
. In order to accomplish that, we'd need to ensure the list of Name
s has no duplicates—in other words, we'd need to nub
the final result (which has quadratic time). There are several places in singletons
where foldMap fvDType
is a common idiom, which mean the time complexity of those parts of the code can now be cubic!
I could (reluctantly) get on board with a quadratic-time algorithm, but going cubic or higher generally indicates to me that whatever approach I'm using is not going to scale at all. It's true that pretty much no test case in singletons
would be affected that much, but I can almost guarantee that someone's singletons
code in the wild would blow up as a result of this.
I actually don't think it would be that hard to whip up an insert-ordered Set
, so I'd be inclined to at least give that a try before settling for lists.
(as it's all at compile time).
True, but that doesn't mean people aren't sensitive to increasing compile times. Take this issue, for example :)
OK, if you want to cook up the fancy data structure, I won't stop you.
As for foldMap fvDType
: that's easy to fix. Just have a variant of fvDType
that doesn't nub
. Fold the non-nubbing one, and then nub
at the end. We're back in the relative comfort of quadratic time.
It turns out that there is a Hackage library that's perfect for our needs: ordered-containers
, which contains implementations of insertion-ordered Map
s and Set
s (called OMap
s and OSet
s in that package). I briefly sketched out an implementation of th-desugar
/singletons
based on ordered-containers
, but ran into some issues that I had to work around:
- There's no
union
equivalent forOMap
s. (I opened an issue about this here.) - There's no
Data
instance forOSet
. (I opened an issue about this here.) - There's no
Functor
instance forOMap
. (I opened an issue about this here.) - There's no
Semigroup
orMonoid
instances forOSet
. (Issue here.)
Once the issues above become resolved, we can move forward with this. Alternatively, if we don't want to wait, we could always inline the implementations of OMap
and OSet
directly into th-desugar
, since they're not that much code, and we could replace it with ordered-containers
once a suitable version appears with the features that we need.