Ordered consumer performance

Question

mtmk opened this issue 8 months ago · 5 comments

Ordered consumer performance is our gold standard benchmark for consumers. Test and improve ordered consumer performance. This might be in line with the planned efforts to improve receive buffer performance.

Prep

> nats stream create x
> nats bench x --js --stream x --pub 1 --purge --msgs 10000000
> nats stream ls

Program.cs

using System.Diagnostics;
using NATS.Client.Core;
using NATS.Client.JetStream;

Console.WriteLine("start");

await using var nats = new NatsConnection();
var js = new NatsJSContext(nats);

var consumer = await js.CreateOrderedConsumerAsync("x");

var count = 0;
double size = 0;
var stopwatch = Stopwatch.StartNew();
await foreach (var msg in consumer.ConsumeAsync<NatsMemoryOwner<byte>>(opts: new NatsJSConsumeOpts
               {
                   MaxMsgs = 5_000,
               }))
{
    using var memory = msg.Data;
    size += memory.Length;
    if (++count == 10_000_000)
        break;
}
stopwatch.Stop();

// Use TotalSeconds for both rates; Elapsed.Seconds is only the whole-seconds component.
Console.WriteLine($"{count/stopwatch.Elapsed.TotalSeconds:n0} msgs/sec" +
                  $" ~ {size/stopwatch.Elapsed.TotalSeconds/(1024.0*1024.0):n2} MB/sec");

Console.WriteLine("bye");

Compare to nats bench

> nats bench x --js --stream x --sub 1 --msgs 10000000
14:32:58 JetStream ephemeral ordered push consumer mode, subscribers will not acknowledge the consumption of messages
14:32:58 Starting JetStream benchmark [subject=x,  multisubject=false, multisubjectmax=100000, js=true, msgs=10,000,000, msgsize=128 B, pubs=0, subs=1, stream=x, maxbytes=1.0 GiB, syncpub=false, pubbatch=100, jstimeout=30s, pull=false, consumerbatch=100, push=false, consumername=natscli-bench, purge=false, pubsleep=0s, subsleep=0s, deduplication=false, dedupwindow=2m0s]
14:32:58 Starting subscriber, expecting 10,000,000 messages
Finished     11s [========================================================================] 100%

Sub stats: 846,452 msgs/sec ~ 103.33 MB/sec

> dotnet run -c release
start
701,490 msgs/sec ~ 87.19 MB/sec
bye

Answer 1 · 2024-02-13T18:51:47.000Z

How much is BoundedChannel still in the Ordered consumer path?

In general (worth noting because, although the ordered consumer is the 'standard', this perf concern is cross-cutting), BoundedChannel is noticeably worse for reads (it takes a lock on the parent, which also impacts writes) and a tiny bit worse for writes (although that is amplified by the larger lock shared between reads and writes).

We may want to consider finding (or writing) a Channel implementation that takes the properties we need from UnboundedChannel (minimal locking and resize-copying) and BoundedChannel (bounded capacity) and offers the best balance of performance and ease of use.
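
To get a rough feel for the channel overhead in isolation, a minimal sketch along these lines could be used. It is not the NATS client's code path: `PumpAsync`, `MeasureAsync`, the `int` payload and the capacity of 5,000 are all hypothetical choices for illustration. It only uses the standard `System.Threading.Channels` API with one writer and one reader, and the numbers will vary by machine and runtime.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Channels;
using System.Threading.Tasks;

const int Count = 10_000_000; // assumption: same message count as the bench above

// Assumption: a single reader and a single writer, capacity picked arbitrarily.
var bounded = Channel.CreateBounded<int>(new BoundedChannelOptions(5_000)
{
    SingleReader = true,
    SingleWriter = true,
});
var unbounded = Channel.CreateUnbounded<int>(new UnboundedChannelOptions
{
    SingleReader = true,
    SingleWriter = true,
});

await MeasureAsync("bounded", bounded, Count);
await MeasureAsync("unbounded", unbounded, Count);

// Hypothetical helper: times one full pass through the channel and prints the rate.
static async Task MeasureAsync(string name, Channel<int> channel, int count)
{
    var stopwatch = Stopwatch.StartNew();
    var read = await PumpAsync(channel, count);
    stopwatch.Stop();
    Console.WriteLine($"{name}: {read / stopwatch.Elapsed.TotalSeconds:n0} items/sec");
}

// Hypothetical helper: writes `count` items on a background task, reads them
// all on the calling task, and returns how many items were read.
static async Task<int> PumpAsync(Channel<int> channel, int count)
{
    var writer = Task.Run(async () =>
    {
        for (var i = 0; i < count; i++)
        {
            // On a bounded channel this waits while the channel is full.
            await channel.Writer.WriteAsync(i);
        }

        // Completing the writer lets ReadAllAsync finish once the channel drains.
        channel.Writer.Complete();
    });

    var read = 0;
    await foreach (var item in channel.Reader.ReadAllAsync())
    {
        read++;
    }

    await writer;
    return read;
}
```

Run it with `dotnet run -c release`, as for the program above. Any gap between the two lines only hints at the cost of the channel itself; the ordered consumer path would still need to be profiled before attributing the difference from `nats bench` to it.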