svroonland/zio-kinesis

Performance degradation in Producer

Pete1232 opened this issue · 17 comments

After upgrading from zio-kinesis 0.5.0 to 0.11.0 we noticed a significant performance degradation when using the Producer class to publish records to Kinesis.

We have observed batching-like behaviour where response time is very high initially and improves (though still isn't great) with higher throughput. Screenshot attached.

[screenshot]

I've copied the Producer code into our project to try to find a fix. So far, removing .reverse on the lists and swapping foreachPar_ for foreach_ has had no impact (these seemed to be the most significant changes outside of the ZIO version upgrade).

Thanks for reporting and already trying some low-hanging fruit changes.

I will see if I can reproduce the degradation this weekend.

How did you create the graph, using in-code measurements or on the AWS side?

The graph was from Gatling FrontLine; those are response times on an HTTP server during a load test. On each request the server makes some queries to upstream services and then pushes one or more records to Kinesis using the Producer. In 0.5.0 the response times were consistently under 1 second, and we saw the same behaviour today when mocking out the Producer call, so we believe that must be the root of the issue.

However, I can't figure out what could cause this, as the Producer hasn't changed significantly outside of API changes for ZIO 1.0.0. I've taken a copy of the class and am trying a few changes to see if we can get any improvement.

Could you tell me about your usage?

  1. What is the rate of records/s (lower and upper range, if known)?
  2. Are you using produceChunk?

I assume you're only using a single Producer instance?

  1. Ramping linearly from 0 to 200 req/sec over 10 minutes (see screenshot). This translates roughly to the number of records/sec.
  2. Yes, though each chunk will usually only contain a single data record.

[screenshot]

edit: and yes on the single Producer instance

This graph is from setting https://github.com/svroonland/zio-kinesis/blob/master/src/main/scala/nl/vroste/zio/kinesis/client/Producer.scala#L176 to 1. It's not as good as before the upgrade, but much better than the graph I shared before. I suspect the maxBufferTime is not being honoured. Could it be a bug in ZTransducer itself, or maybe in the way it's being used?

[screenshot]

edit: when I increased this value I also saw the following error from the client:

at 'records' failed to satisfy constraint: Member must have length less than or equal to 500 (Service: Kinesis, Status Code: 400, Request ID: e51e24c3-e341-8d21-b2df-fe9dd48ebfad, Extended Request ID: 

The rate we're sending at would never reach 500 records within the configured 500 millis, which to some extent proves the max buffer time isn't being honoured.
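
As a rough back-of-the-envelope check of that claim (the load figures come from earlier in this thread; one record per request is an assumption):

```scala
// Peak load from the ramp-up test, assuming roughly one record per request.
val recordsPerSecond = 200
val maxBufferTimeMs  = 500

// Records that could accumulate in a single buffer window: about 100,
// well short of the 500-record limit from the error message above.
val recordsPerWindow = recordsPerSecond * maxBufferTimeMs / 1000
```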

I'm not very comfortable with the streams API, but I switched out Schedule.spaced for Schedule.fixed and this was the result:
[screenshot]

From the documentation, fixed sounds right for our use case (we don't want buffering to last long, as we need to respond to clients synchronously). Not sure if this would be desired for the library though.
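
For anyone following along, a minimal sketch of the documented difference between the two schedules (illustrative only, not zio-kinesis code; `flush` is a hypothetical effect):

```scala
import zio._
import zio.clock.Clock
import zio.duration._

// With Schedule.spaced(500.millis) the next recurrence starts 500 ms after the previous
// effect *completes*, so a slow effect pushes every later recurrence further out.
// With Schedule.fixed(500.millis) recurrences stay on a fixed 500 ms grid, regardless of
// how long each effect takes.
def repeatedFlush(flush: RIO[Clock, Unit]): RIO[Clock, Unit] =
  flush.repeat(Schedule.fixed(500.millis)).unit
// vs. flush.repeat(Schedule.spaced(500.millis)).unit
```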

Interesting, that was one of my suspects. So this gives you back your expected performance?

Not quite what it was before yet, but much better. I'm trying again with a lower buffer time. Judging by the behaviour, I don't think the 0.5 version was buffering much, if at all.

Worth bearing in mind that for an async workload where response times don't matter, the other schedule may have been the better choice. Maybe this should be configurable? Not sure.

With the buffer time you can make a trade-off between latency and throughput. For a low number of requests/s you're better off with a lower buffer time, but at some point it's advantageous to wait a while and put up to 500 records in a single PutRecords call (which is what the Producer does internally).
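
To make that trade-off concrete, here's a minimal sketch of this kind of time/size-bounded batching using plain ZStream operators (illustrative only; `putRecords`, `maxBufferTime` and the other names are assumptions, not the Producer's actual internals):

```scala
import zio._
import zio.clock.Clock
import zio.duration._
import zio.stream._

// A batch is flushed either when 500 records have accumulated (the PutRecords limit)
// or when maxBufferTime elapses, whichever comes first. A longer maxBufferTime buys
// bigger batches (throughput) at the cost of extra latency per record.
def batched[R, A](
  records: ZStream[R, Throwable, A],
  maxBufferTime: Duration
)(putRecords: Chunk[A] => RIO[R, Unit]): ZStream[R with Clock, Throwable, Unit] =
  records
    .groupedWithin(500, maxBufferTime)
    .mapM(putRecords)
```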

I do think that v0.5 did this kind of buffering as well. There were quite a few changes in ZIO Schedule, so I think the behaviour is slightly different.

I suppose the ideal behavior for your case is that as soon as a record comes in, it gets sent to Kinesis. There's a limit on the maximum number of in-flight requests, and only when that limit is reached do you want to start buffering.

That way you get low latency at low requests/s and high throughput at high requests/s.
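
As a rough sketch of that idea (not the library's implementation; `putRecords` and the names are illustrative assumptions), you can think of it as a fixed number of worker fibers draining a queue:

```scala
import zio._

// With few requests/s a free worker picks up a single record immediately (low latency).
// When all `maxInFlight` workers are busy, records pile up in the queue and the next
// free worker sends them as one big batch (high throughput).
def runWorkers[A](queue: Queue[A], maxInFlight: Int)(
  putRecords: Chunk[A] => Task[Unit]
): Task[Unit] = {
  val worker =
    (for {
      head <- queue.take          // wait for at least one record
      rest <- queue.takeUpTo(499) // plus whatever else has buffered, staying within 500
      _    <- putRecords(Chunk.single(head) ++ Chunk.fromIterable(rest))
    } yield ()).forever

  ZIO.foreachPar_(List.fill(maxInFlight)(()))(_ => worker)
}
```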

.. which is exactly what aggregateAsyncWithin is supposed to be doing apparently :)
https://github.com/zio/zio/blob/db9818e1b6c2807ed251066fbbb3416cc8dd6f26/streams/shared/src/main/scala/zio/stream/ZStream.scala#L165
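
For reference, the rough shape of that in the ZIO 1.x streams API (a hedged sketch, not the Producer's actual code; `maxBufferTime` and `putRecords` are placeholder names):

```scala
import zio._
import zio.clock.Clock
import zio.duration._
import zio.stream._

// aggregateAsyncWithin keeps feeding the transducer while downstream (the PutRecords call)
// is busy, so batches grow under load; the schedule bounds how long a batch can stay open
// when traffic is low.
def producerPipeline[R, A](
  records: ZStream[R, Throwable, A],
  maxBufferTime: Duration
)(putRecords: Chunk[A] => RIO[R, Unit]): ZStream[R with Clock, Throwable, Unit] =
  records
    .aggregateAsyncWithin(ZTransducer.collectAllN(500), Schedule.fixed(maxBufferTime))
    .mapM(putRecords)
```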

In which case you are right that v0.5 did not do buffering for low requests/s. Then it looks like there's a change in behavior / bug for some ZStream or Schedule operators.

Schedule.fixed usage may be affected by zio/zio#4144

I implemented some performance improvements, including usage of Schedule.fixed, with #250.

I'm still investigating some unexpected behavior in aggregateAsyncWithin.

I released a new version, v0.12, with a Producer that only performs buffering when necessary, so at low requests/s the records are immediately produced to Kinesis.

I did a manual 'load ramp-up' test and that resulted in reasonably constant processing times, as you would expect.

Let me know how it works out for you and we can close this issue.

Looks much better for us. All requests under 500ms now. Think this can be closed, thanks for the quick turnaround
[screenshot]

Thanks for the confirmation and happy to help!