travitch/persistent-vector

Remove indirections

Opened this issue · 14 comments

This is kind of a big challenge. The data constructors for the internal nodes shouldn't actually be necessary. We should be able to get rid of them using primitive-unlifted, or a local copy thereof. I have a general sense of the shape of things, but it's all rather tricky.

The first idea is that we should have unlifted arrays of unlifted arrays ... of arrays of elements. The type of that whole thing presumably has to be calculated by a type family from the depth. How do we know that depth at the type level? We fake singletons, and store them as the shift fields. Something like that, anyway.

Actually, I don't think we need to track tree height in the types. We could simulate something very much like what we have now using enough unsafe coercions. But I'd prefer to track it, for two reasons:

  1. The most important reason, probably: we'll be able to use types to help enforce the height-balance invariant.
  2. As GHC improves, we'll be able to reduce the number of unsafe coercions we need. I imagine that within a year or two, we may only need them at the root node (for reasons relating to type roles), and maybe we'll eventually be able to get rid of them altogether.

Question: why do we have an array of internal nodes in the root node, and bl not just a single one? Right now, that saves us an indirection, but if it's not essential to the structure it can be dropped once that's no longer true.

Is the question: why is intVecPtrs Array (Vector_ a) instead of Vector_ a? I don't think there is a fundamental reason. It would probably just change some index calculations. It might make the implementation of indexing more regular, but that isn't a design goal if it is faster or more space efficient to not have it that way.

Yes, that's the question. I don't see any obvious reason for it to be significantly faster (aside from the current indirection issue), and more regular code may also be smaller, which can improve performance all by itself.

The performance upsides

  1. Indexing and updating functions will have to follow roughly half as many pointers. This sounds very good, but we'll need benchmarks to see how big a difference it makes.

  2. The structure will be a little more compact. This probably doesn't make a whole lot of difference in and of itself, because the arrays will mostly be large relative to the boxing overhead.

The performance downsides, as far as I know

  1. In current GHC, we have to choose between

    a. Sticking with something Array-like rather than switching to SmallArray, or
    b. Accepting that the (unboxed) array read/write/index operations will have to be NOINLINE for safety reasons.

    It's pretty easy to bury this choice in its own module and let benchmarks decide which is better. Once BoxedRep happens (hopefully soon; GHC Gitlab MR 4249 is the beginning, and hopefully the rest won't take too long), this problem will go away.

  2. To keep things reasonably manageable, I expect there to be some performance regressions in GHC versions that don't support UnliftedNewtypes. Personally, I usually just ignore library performance under non-bleeding-edge GHC versions, but this is your library, so it's ultimately your decision.

  3. The documented "strict spine" claim will become utterly true for the tree portion of the structure. That's almost always a good thing, but it sucks for fmap and for traverse in "lazy" Applicatives. The advantage of a lazy spine in those contexts is that the tree can be built out as its leaves are demanded, rather than needing to be produced all at once with thunks in its leaves. Fortunately, performance of those methods doesn't really seem to be a primary goal for the package.

I was looking at that GHC feature - it sounds exciting. Would that ultimately enable us to write data structure that can (levity) polymorphically hold either boxed or unboxed data? That would be amazing. I'd ditch all backwards compatibility for that.

Would that ultimately enable us to write data structure that can (levity) polymorphically hold either boxed or unboxed data?

Not really. @andrewthad hopes to eventually support levity-polymorphic bindings (for lifted and unlifted pointers), but there's no proposal for that. @ekmett has managed to use backpack to share a lot of code across unboxed things, and maybe between boxed and unboxed. But boxed and unboxed things really have different calling conventions, so true polymorphism is not likely.

I'm making good progress on this, but it's quite a slog and most of the steps don't seem very exciting all by themselves.

  • Use the type system to enforce the height balance invariant.
  • Add a validity test to validate the rest of the data structure invariants.
  • Store a Vector_ in the root node instead of an array of Vector_s. This isn't strictly necessary or strictly relevant, but it seems to simplify a lot of code, which will be important when everything is overhauled and things get more complicated elsewhere.
  • Finish simplifications relating to getting rid of the array in the root node. This is almost done; snoc is all that remains.
  • Stuff the shift into the Vector_ constructors, preparing for a change in Vector_ representation.
  • Put together the unlifted array operations we need. This is really just wrapping a (modified) version of part of primitive-unlifted.
  • Re-express Vector_ as a pair of a singleton level indicator (already introduced) and a structure of unlifted arrays. Use pattern synonyms to fake the GADT we had before.
  • Replace real existential quantification with nasty unsafe faked-up existential quantification using Any and various flavors of unsafeCoerce. This is currently necessary because GHC does not yet know how to unpack existentials, and does not allow role annotations for type or data families. This is quite easy; it's just fugly.

I'm making progress, but I think I may have taken a wrong turn. Things are getting messy. I think I may have erred when I got rid of the extra array at the top. While that was weird and unnecessary from the standpoint of the data structure, it made the root of the tree unconditionally an internal node, which seems pretty helpful from an unboxing standpoint. Namely, it lets us write something like !(UnliftedArray (TreeType level a)) to unpack the array tree into the root node. Otherwise, we get non-uniform stuff or other complications. Hrmph. My current branch is the "other complications" version, which I think can be made efficient, but is pretty awful to work on.

So ... somewhat back to the drawing board now. I'm really not surprised. I fully expected this to be a long and iterative process. On the plus side, I've fixed a lot of other problems in the process, and I've gotten a sense of what needs to happen and (probably) how.

Hmmm .... Thinking more about it, I don't think that going to an extra array is either necessary or sufficient. The trouble is that, using Array and UnliftedArray, we can't extract an element from an UnliftedArray without first knowing if that element is an Array or an UnliftedArray. My unpleasant workaround uses a polymorphic

data Box (a :: TYPE 'UnliftedRep) = Box# a

This can hold an Array# or an UnliftedArray# just fine in the same constructor. I just don't know how to make programming with that not suck.

As for why wrap at all and not just use the unlifted types directly: the main reason is that Monad demands lifted results, so all the ST stuff must be lifted. Manually passing State# tokens around through deeply nested case expressions is just awful, and I refuse.

That issue (the requirement of the value returned by bind to be boxed) is what basically tanked my Haskell implementation of a SAT solver.

Well, I guess I'll see how far I can go with Box. If I can give indexing performance a big boost, it might be worth the unpleasant code. If the improvement proves small, it's probably not worthwhile.