haskell/primitive

Support unaligned GC array accesses in `Prim`

Opened this issue · 8 comments

GHC exposes a set of (read|write|index)Word8ArrayAs[ty]# primops for safe unaligned accesses to GC pointers. I think the Prim class could sensibly expose these.

Also note that similar primops for Addr# non-GC pointers will be coming in GHC 9.10 .

A different approach is taken at https://hackage.haskell.org/package/primitive-unaligned-0.1.1.2/docs/Data-Primitive-ByteArray-Unaligned.html

In primitive-unaligned, the typeclass for unaligned access is a totally different typeclass. Adding it to Prim itself is a breaking change for all user-defined instances downstream. It's worth considering though because its not that hard to update any hand-written instances to support this. Do you know which GHC release introduced support for the unaligned primops? We need to consider primitive's GHC support window for something like this.

Regarding unaligned ByteArray primops, I can find writeWord8ArrayAsWord32# going all the way back to base-4.12.0.0, which is GHC 8.6.1.

Thanks for the primitive-unaligned link. That's effectively what I'm doing in my own code, though I need unaligned Addr# access primops instead (which I fake for now). I believe all this sensibly belongs in primitive (eventually, no particular rush). With that in mind, should I keep this issue open to track the merge?

On a related note, would you take a primitive-unaligned update for unaligned Addr# accesses once GHC 9.10 is out?
(And a note related to user-defined Prim instances, what about a wrapper that defines explicit endianness on supported types? Messy code here. Does this belong in/around primitive?)

What's interesting about your type that makes endianness explicit in bytezap is that it's the same exact way I approached the problem in the byte-order library in System.ByteOrder. You named it ByteOrdered and I named it Fixed, and we've got different names for a bunch of other things, but the interfaces are nearly identical. The fact that we independently invented the same interface suggests that there is something fundamental about it. It's hard to say whether or not it belongs in primitive. Personally, I think it seems within the scope of primitive. Additionally, such a consolidation improves discoverability and builds trust that the implementation is correct, which prevents people from having to reinvent this all the time (I doubt we're the only two people who have written this interface). But there are maintainers whose opinions on this I value, and this would need to be discussed in a separate issue.

We might be able to drop support for GHC < 8.6 soon. Currently we support all the way back to GHC 8.0. I'll open a separate thread discussing when older GHCs can be dropped.

I'll take any PRs to primitive-unaligned even before GHC 9.10 is properly released. If you're playing around with a release candidate of it and you're able to incorporate the unaligned Addr# primops into it, go ahead and PR it. The hardest thing about it will be figuring out how to handle the GHC < 9.10 case. I think they can actually be shimmed by reading one 8-bit word at a time and assembling those into a larger word, but that shim code is a little tedious to write.

Fab! Thanks for letting me know about byte-order. Agreed that Fixed needs a discussion on precisely how/where to implement it. I strongly believe these additions would be great for discoverability/"feature completion" and would gladly assist where I can.

I might have a think about safely emulating unaligned Addr# accesses for GHC < 9.10 . bytestring should have some relevant code in its builder. I remember store has some weird stuff too.

bytestring shims unaligned writes using C memcpys. We could maybe do the same, but would have to convert IO (due to FFI) to ST, which I'd need some help with proving safety. Do we require a pure Haskell version?

Using the FFI for the shim isn't necessary. For reading/indexing, see Data.Bytes.Parser.BigEndian:word64 for an example of how to do this:

word64 :: e -> Parser e s Word64
word64 e = uneffectful $ \chunk ->
  if length chunk >= 8
    then
      let wa = PM.indexByteArray (array chunk) (offset chunk) :: Word8
          wb = PM.indexByteArray (array chunk) (offset chunk + 1) :: Word8
          wc = PM.indexByteArray (array chunk) (offset chunk + 2) :: Word8
          wd = PM.indexByteArray (array chunk) (offset chunk + 3) :: Word8
          we = PM.indexByteArray (array chunk) (offset chunk + 4) :: Word8
          wf = PM.indexByteArray (array chunk) (offset chunk + 5) :: Word8
          wg = PM.indexByteArray (array chunk) (offset chunk + 6) :: Word8
          wh = PM.indexByteArray (array chunk) (offset chunk + 7) :: Word8
       in Success
            ( unsafeShiftL (fromIntegral wa) 56
                .|. unsafeShiftL (fromIntegral wb) 48
                .|. unsafeShiftL (fromIntegral wc) 40
                .|. unsafeShiftL (fromIntegral wd) 32
                .|. unsafeShiftL (fromIntegral we) 24
                .|. unsafeShiftL (fromIntegral wf) 16
                .|. unsafeShiftL (fromIntegral wg) 8
                .|. fromIntegral wh
            )
            (offset chunk + 8)
            (length chunk - 8)
    else Failure e

And for the other direction, writing, see Data.Bytes.Builder.Bounded:word64BE:

word64BE :: Word64 -> Builder 8
word64BE w = Unsafe.construct $ \arr off -> do
  writeByteArray arr (off) (fromIntegral @Word64 @Word8 (unsafeShiftR w 56))
  writeByteArray arr (off + 1) (fromIntegral @Word64 @Word8 (unsafeShiftR w 48))
  writeByteArray arr (off + 2) (fromIntegral @Word64 @Word8 (unsafeShiftR w 40))
  writeByteArray arr (off + 3) (fromIntegral @Word64 @Word8 (unsafeShiftR w 32))
  writeByteArray arr (off + 4) (fromIntegral @Word64 @Word8 (unsafeShiftR w 24))
  writeByteArray arr (off + 5) (fromIntegral @Word64 @Word8 (unsafeShiftR w 16))
  writeByteArray arr (off + 6) (fromIntegral @Word64 @Word8 (unsafeShiftR w 8))
  writeByteArray arr (off + 7) (fromIntegral @Word64 @Word8 w)
  pure (off + 8)

Admittedly, that's for ByteArray# and not Addr#, but it illustrates the algorithm.

Thanks and understood, shame because it's got poor performance but it is a compat layer. I'll start a PR on prim-unaligned later :)