nats-io/nats.net.v2

Ordered consumer performance

mtmk opened this issue · 5 comments

mtmk commented

Ordered consumer performance is our gold-standard benchmark for consumers. Test and improve ordered consumer performance. This might align with planned efforts to improve receive buffer performance.

Prep

> nats stream create x
> nats bench x --js --stream x --pub 1 --purge --msgs 10000000
> nats stream ls

Program.cs

using System.Diagnostics;
using NATS.Client.Core;
using NATS.Client.JetStream;

Console.WriteLine("start");

await using var nats = new NatsConnection();
var js = new NatsJSContext(nats);

var consumer = await js.CreateOrderedConsumerAsync("x");

var count = 0;
double size = 0;
var stopwatch = Stopwatch.StartNew();
await foreach (var msg in consumer.ConsumeAsync<NatsMemoryOwner<byte>>(opts: new NatsJSConsumeOpts
               {
                   MaxMsgs = 5_000,
               }))
{
    using var memory = msg.Data;
    size += memory.Length;
    if (++count == 10_000_000)
        break;
}
stopwatch.Stop();

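// Use TotalSeconds for both rates: Elapsed.Seconds is only the whole-seconds component (0-59).
Console.WriteLine($"{count/stopwatch.Elapsed.TotalSeconds:n0} msgs/sec" +
                  $" ~ {size/stopwatch.Elapsed.TotalSeconds/(1024.0*1024.0):n2} MB/sec");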

Console.WriteLine("bye");

Compare to nats bench

> nats bench x --js --stream x --sub 1 --msgs 10000000
14:32:58 JetStream ephemeral ordered push consumer mode, subscribers will not acknowledge the consumption of messages
14:32:58 Starting JetStream benchmark [subject=x,  multisubject=false, multisubjectmax=100000, js=true, msgs=10,000,000, msgsize=128 B, pubs=0, subs=1, stream=x, maxbytes=1.0 GiB, syncpub=false, pubbatch=100, jstimeout=30s, pull=false, consumerbatch=100, push=false, consumername=natscli-bench, purge=false, pubsleep=0s, subsleep=0s, deduplication=false, dedupwindow=2m0s]
14:32:58 Starting subscriber, expecting 10,000,000 messages
Finished     11s [========================================================================] 100%

Sub stats: 846,452 msgs/sec ~ 103.33 MB/sec
> dotnet run -c release
start
701,490 msgs/sec ~ 87.19 MB/sec
bye
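
As a quick cross-check of the two throughput figures, the sketch below recomputes MB/sec from msgs/sec. It assumes every message is 128 bytes (the msgsize=128 B shown in the nats bench log); the helper name ToMbPerSec is made up for illustration and is not part of any library.

```csharp
using System;
using System.Globalization;

// Assumption: all messages are 128 bytes, as in the nats bench log (msgsize=128 B).
const double msgSize = 128;
const double mib = 1024.0 * 1024.0;

// Hypothetical helper: convert a message rate into MB/sec.
double ToMbPerSec(double msgsPerSec) => msgsPerSec * msgSize / mib;

var inv = CultureInfo.InvariantCulture;

// nats bench figure: 846,452 msgs/sec
Console.WriteLine(ToMbPerSec(846_452).ToString("n2", inv)); // 103.33

// .NET client figure: 701,490 msgs/sec
Console.WriteLine(ToMbPerSec(701_490).ToString("n2", inv)); // 85.63

// Same 10M messages, but divided by whole seconds only.
double elapsed = 10_000_000 / 701_490.0;   // roughly 14.26 s
double wholeSeconds = Math.Floor(elapsed); // 14
double truncatedRate = 10_000_000 * msgSize / wholeSeconds / mib;
Console.WriteLine(truncatedRate.ToString("n2", inv)); // 87.19
```

Under that assumption, the nats bench line is self-consistent (103.33 MB/sec), while 701,490 msgs/sec corresponds to about 85.63 MB/sec. The printed 87.19 MB/sec matches a division by 14 whole seconds, which is what TimeSpan.Seconds yields, rather than by the fractional elapsed time, so the msgs/sec number is the safer one to compare.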

How much is BoundedChannel still in the Ordered consumer path?

In general (worth noting because, although the ordered consumer is the 'standard', performance is a cross-cutting concern), BoundedChannel is noticeably worse for reads (it takes a lock on the parent, which also impacts writes) and slightly worse for writes (an effect amplified by the larger lock shared between reads and writes).

We may want to consider trying to find (or writing) a Channel implementation that takes the properties we need from UnboundedChannel (minimal locking and resize-copying) and BoundedChannel (bounded capacity) and provides the best performance/ease.
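
One rough way to get a feel for the difference is a single-writer/single-reader micro-benchmark over the two built-in channel types. The following is only a sketch using System.Threading.Channels directly; it does not exercise the NATS client's actual receive path, and the item count, capacity, and the RunAsync helper are arbitrary choices made for illustration. A real comparison would be better done with a proper benchmarking harness and warm-up runs.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Channels;
using System.Threading.Tasks;

const int n = 1_000_000; // arbitrary item count for this sketch

// Hypothetical helper: push n items through the channel from one writer
// task to one reader and report how many were read and how long it took.
static async Task<(int Count, TimeSpan Elapsed)> RunAsync(Channel<int> channel, int items)
{
    var sw = Stopwatch.StartNew();

    var writer = Task.Run(async () =>
    {
        for (var i = 0; i < items; i++)
            await channel.Writer.WriteAsync(i); // waits when a bounded channel is full
        channel.Writer.Complete();
    });

    var count = 0;
    await foreach (var _ in channel.Reader.ReadAllAsync())
        count++;

    await writer;
    sw.Stop();
    return (count, sw.Elapsed);
}

// Capacity of 1,000 is an arbitrary assumption, not a library default.
var bounded = Channel.CreateBounded<int>(new BoundedChannelOptions(1_000)
{
    SingleReader = true,
    SingleWriter = true,
});

var unbounded = Channel.CreateUnbounded<int>(new UnboundedChannelOptions
{
    SingleReader = true,
    SingleWriter = true,
});

var boundedResult = await RunAsync(bounded, n);
var unboundedResult = await RunAsync(unbounded, n);

Console.WriteLine($"bounded:   {boundedResult.Count / boundedResult.Elapsed.TotalSeconds:n0} items/sec");
Console.WriteLine($"unbounded: {unboundedResult.Count / unboundedResult.Elapsed.TotalSeconds:n0} items/sec");
```

Results will vary by machine and runtime version, and the bounded run also includes back-pressure waits on the writer, so this shows only the direction of the gap, not its size in the consumer path.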