Zaid-Ajaj/Fable.Remoting

Poor binary deserialization performance in web client

juselius opened this issue · 11 comments

The following code gives wildly different performance when run in the browser or using the .NET client:

type IData = {
    getData : Guid -> Async<int array * single array>
}

let server =
    Remoting.createApi "https://data.myserver.io"
    |> Remoting.withBinarySerialization
    |> Remoting.buildProxy<IData>

My use case has 950 000 + 330 000 elements, ~60 MB in total. The timings are:

  • Server-side lookup + serialization + transfer over the wire: 1500 ms
  • Deserialization using the Remoting.DotnetClient: 600 ms
  • Deserialization using Remotin.Client in the browser: 17200 ms!

I wouldn't be surprised if the browser was a bit slower than .NET, but not a factor of 30.

The Fable (de)serialization did not undergo any form of optimizations like the .NET bits, so I'm sure there are plenty of low hanging fruits to be reaped there. On the other hand, your use case boils down to creating 2 arrays and writing over a million numbers into them and I don't know if there is anything to speed up there.

member x.ReadRawArray (len: int, elementType: Type) =
#if !FABLE_COMPILER
let arr = Array.CreateInstance (elementType, len)
for i in 0 .. len - 1 do
arr.SetValue (x.Read elementType, i)
arr
#else
let arr = Array.zeroCreate len
for i in 0 .. len - 1 do
arr.[i] <- x.Read elementType
arr
#endif

You have to remember that JS is interpreted, so it won't be just "a bit slower" even when running nearly identical code. (Now that I think about it, perhaps there is some JIT compilation step in the browser? I don't know.)

Can you post the code you're using to measure the client deserialization time? It would help me in investigating further.

@kerams in the browser I'm just wrapping the API calls in JS.console.time / timeEnd and subtracting the transfer times obtained from the Network Monitor in the browser. On .NET I'm just using System.Diagnostics.StopWatch.

I suspect it's the x.Read call which is causing the slowdown, since it has to do a lot of pattern matching for every element. Perhaps one should "invert" the order for arrays: First look up the type, and the read with a fixed expression. Ugly, but optimizations usually are...

@kerams I'l flip the order of things, and PR if it helps.

First look up the type, and the read with a fixed expression. Ugly, but optimizations usually are...

It's not so simple. MessagePack arrays are not typed, so in theory every element can be of a different type. That of course is not an issue for us, but the individual Read calls are still necessary because int32 can be represented as 1, 3 or 5 bytes each, so you cannot take a shortcut and read the array in 4 byte chunks. It should be possible to use a reduced pattern match for number arrays where we know the element type beforehand, but I'm a bit skeptical as to how much it would help. It's worth trying nonetheless.

Any PR where you show a performance improvement while retaining correctness is hugely appreciated.

@kerams I'll see what I can make of it. Good point about correctness, it's easy to forget when you have been relying on a proper type system for 10+ years...

I have managed to squeeze down the timings from 18900 ms to 4900 ms for raw Int32 and Float32 arrays (so far), while at the same time keeping correctness. The code is not very pretty, but that's the price unfortunately. I will add a few more primitive types and PR.

I think this will do for now, although it should be possible to shave off another 1000 ms or more by bypassing MessagePack altogether. Perhaps I will create a Remoting.PrimitiveBinary package at some point.

Actually the best solution is probably to send everything as bin32, and do the conversion explicitly in the client. That way endianness can also be handled server side, giving additional speed ups.

I have created a minimal PR (#284) which speeds up deserialization by a factor 3-4. The main problem was not pattern matching or function calls, but garbage collection.

It's still possible to gain a factor of 1.5-2, but that requires a lot of manual unrolling and inlining, and makes the code really long, hard to change and ugly. IMHO it's not worth it.

For reference, if you really need maximum performance, convert your data manually to a byte array, and convert back in the client. Using this technique I managed to take the original deserialization down from 17 000 ms to 1600 ms.

Actually the best solution is probably to send everything as bin32, and do the conversion explicitly in the client

That also means integers in your array take 4 bytes, even if you're only concerned about small numbers in your application. In other words, you could be transferring up to 4 times as much data (0-127 are encoded as one byte in MessagePack), though (de)serialization might be faster.

@kerams you are absolutely right. I think the first approach should be using the standard serialization/deserialization, and only if that isn't cutting it, one should go for bin32. In my case I'm transferring geometry coordinates and indices for quite large meshes (> 3M triangles), which means that 99% of the numbers will large than 2^7 and 2^16 anyway.

@juselius Should be resolved as of latest Fable.Remoting.Client v7.17 can you please take it for a spin? I'll close the issue. Please re-open if the issue persists