
Adding SIMD support

I've been going through the Streams code trying to figure out how to add SIMD support. For example, this SIMD-enhanced fold performs very well compared to an inlined version of the core library fold.

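   // Requires: open System.Numerics
   // `start` must be an identity for both `folder` and `combiner` (e.g. 0 for a sum),
   // because it is broadcast to every lane and also seeds the scalar result.
   static member inline SIMDFold folder combiner (start:'T) (values : 'T[]) =
        let count = Vector<'T>.Count
        let mutable i = 0
        let mutable v = Vector<'T>(start)
        // Fold all full vector-width chunks (<= so the last full chunk is included).
        while i <= values.Length - count do
            v <- folder v (Vector<'T>(values, i))
            i <- i + count
        // Fold the leftover tail (fewer than `count` elements) with the scalar combiner.
        let mutable result = start
        while i < values.Length do
            result <- combiner result values.[i]
            i <- i + 1
        // Combine the lanes of the vector accumulator into the scalar result.
        for lane in 0 .. count - 1 do
            result <- combiner result v.[lane]
        result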
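
For a concrete case, here is a minimal sketch of the same pattern specialised to summing floats. It only assumes `System.Numerics.Vector<T>`; the name `simdSum` is made up for illustration and is not part of Streams. It folds full vector-width chunks, reduces the lanes, and then picks up any leftover elements with a scalar loop.

```fsharp
open System.Numerics

// Hypothetical helper: a float-only SIMD sum, not part of any library.
let simdSum (values : float[]) =
    let count = Vector<float>.Count
    let mutable acc = Vector<float>.Zero
    let mutable i = 0
    // Vector loop: add `count` elements per iteration.
    while i <= values.Length - count do
        acc <- acc + Vector<float>(values, i)
        i <- i + count
    // Reduce the lanes of the accumulator to a single float.
    let mutable result = 0.0
    for lane in 0 .. count - 1 do
        result <- result + acc.[lane]
    // Scalar loop: elements that did not fill a whole vector.
    while i < values.Length do
        result <- result + values.[i]
        i <- i + 1
    result
```

Note that the additions happen in a different order than in `Array.sum`, so for general floating-point data the two results may differ in the last bits.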

Adding support to streams has a few considerations:

  • I think it would only make sense to allow it with Arrays and maybe ResizeArrays
  • The elements in the array have to be valid SIMD types
  • I'd love for it to be available in both parallel and non-parallel streams
  • Can it be added to Streams and ParStreams in some way? Or should there be a separate SIMDStream and ParSIMDStream?
  • Can we handle a mix of scalar and vector operations on the stream? (example below)

Ideally, say we had an array of floats, values.
We would want to be able to do something like

values
|> SIMDStream.simdMap (fun e -> e*e)  //operations on Vector<float>s
|> SIMDStream.map (fun e -> if e < 5.0 then 0.0 else 3.0)  //scalar operations on floats (hypothetical scalar map)
|> SIMDStream.simdSum

This way people could mix and match SIMD operations with scalar ones, as sometimes has to be done.

I'm going through the Streams code trying to see how this could be done, and it isn't entirely clear. Some parts of the composed function would need to operate on a Vector, while others would need to iterate Vector<'T>.Count times to operate on the individual elements of the array.
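
To make the mixed case concrete, here is a hand-fused sketch of the pipeline above written as one loop. It is not how Streams composes functions; `squareThresholdSum` is a hypothetical name, and the only real API it relies on is `System.Numerics.Vector<T>`. The squaring step runs on a whole vector, and the conditional step then visits each lane individually.

```fsharp
open System.Numerics

// Hypothetical hand-fused pipeline: square (vector) -> threshold (scalar) -> sum.
let squareThresholdSum (values : float[]) =
    let count = Vector<float>.Count
    let mutable total = 0.0
    let mutable i = 0
    while i <= values.Length - count do
        // Vector stage: one multiply squares `count` elements at once.
        let x = Vector<float>(values, i)
        let squared = x * x
        // Scalar stage: iterate over the lanes one element at a time.
        for lane in 0 .. count - 1 do
            total <- total + (if squared.[lane] < 5.0 then 0.0 else 3.0)
        i <- i + count
    // Leftover elements that do not fill a whole vector.
    while i < values.Length do
        let e = values.[i] * values.[i]
        total <- total + (if e < 5.0 then 0.0 else 3.0)
        i <- i + 1
    total
```

The open question is how a stream could generate this shape automatically: each stage would have to declare whether it consumes a Vector or a single element, so that the composed function knows where to unpack lanes.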

If there is interest in this, I'd love to contribute, but I need some guidance, as I don't fully understand how the streams work yet.

Hi Jack,
It would certainly be exciting to have vectorized streams, but I don't think it is possible with the current design. Something like array |> Stream.ofArray |> Stream.filter (fun ...) |> Stream.simdfold is definitely problematic. On top of that, we would need perfect stream fusion, or else the virtual calls will dominate performance. One possible direction for vectorized loops is a new Streams library that targets perfect stream fusion, which we would then compile with .NET Native in the hope that the C++ backend vectorizes our loops.

I see. It is too bad that RyuJIT doesn't do any automatic vectorization. It would be quite nice to get the same kind of auto-vectorization that C compilers do, but at run time, so that it can target the available instruction sets.