Performance tests
- Would the shopping cart sample work for performance tests?
- Should we add HTTP routes to the sample to make it easier to test? (see the route sketch after this list)
- K6 as load client?
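A possible sketch of such routes, assuming the shopping cart sample and Akka HTTP; the paths, port, and behaviour are made up for illustration and are not the sample's actual protocol:

```scala
import akka.actor.typed.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.model.StatusCodes
import akka.http.scaladsl.server.Directives._
import akka.http.scaladsl.server.Route

// Hypothetical routes, just to give a load client such as k6 something simple to hit over HTTP.
object ShoppingCartRoutes {

  def routes: Route =
    pathPrefix("cart" / Segment) { cartId =>
      concat(
        post {
          // e.g. add a fixed item to the cart identified by cartId
          complete(StatusCodes.OK, s"added item to cart $cartId")
        },
        get {
          complete(StatusCodes.OK, s"cart $cartId")
        })
    }

  def start(system: ActorSystem[_]): Unit = {
    implicit val sys: ActorSystem[_] = system
    Http().newServerAt("0.0.0.0", 8080).bind(routes)
  }
}
```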
A bunch of performance testing has been completed now. Everything looks ok. Latencies are predictable, though not as tight as we see with RDS Postgres and the R2DBC plugin. The main limitation is probably that the client is still HTTP/1 based. Would be interesting to try things when an HTTP/2 client is supported for DynamoDB, or to try a pipelining client.
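For reference, a minimal sketch of constructing the AWS SDK v2 async DynamoDB client with a tuned Netty HTTP client; the concurrency and timeout values are illustrative, and how (or whether) the plugin accepts a user-supplied client is not shown here:

```scala
import java.time.Duration

import software.amazon.awssdk.http.nio.netty.NettyNioAsyncHttpClient
import software.amazon.awssdk.regions.Region
import software.amazon.awssdk.services.dynamodb.DynamoDbAsyncClient

object TunedDynamoDbClient {
  // The SDK's Netty client talks HTTP/1.1 to DynamoDB, so raising maxConcurrency
  // (more pooled connections / in-flight requests) is the main available knob.
  val client: DynamoDbAsyncClient =
    DynamoDbAsyncClient
      .builder()
      .region(Region.US_EAST_1)
      .httpClientBuilder(
        NettyNioAsyncHttpClient
          .builder()
          .maxConcurrency(200)
          .connectionAcquisitionTimeout(Duration.ofSeconds(10)))
      .build()
}
```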
Will add some results and screenshots to this issue.
Under-provisioned test, where the offered throughput is higher than the provisioned capacity on the table.
At first DynamoDB will allow the higher throughput, using burst capacity, before throttling the writes.
Throttled writes are retried in the DynamoDB client. In this test, enough progress is made that the journal circuit breaker is not tripped, but requests time out on the ask timeout of 5 seconds.
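For context, a minimal sketch of how those ask timeouts surface on the request side, assuming a sharded shopping cart entity; the command protocol here is hypothetical:

```scala
import scala.concurrent.Future
import scala.concurrent.duration._

import akka.Done
import akka.actor.typed.{ActorRef, ActorSystem}
import akka.cluster.sharding.typed.scaladsl.{ClusterSharding, EntityTypeKey}
import akka.util.Timeout

object ShoppingCartClient {
  // Hypothetical command protocol, for illustration only.
  sealed trait Command
  final case class AddItem(itemId: String, quantity: Int, replyTo: ActorRef[Done]) extends Command

  val TypeKey: EntityTypeKey[Command] = EntityTypeKey[Command]("ShoppingCart")

  def addItem(system: ActorSystem[_], cartId: String, itemId: String): Future[Done] = {
    // Same 5 second ask timeout as in the test: persist calls slowed down by
    // throttling and retries that exceed this fail the ask with a TimeoutException.
    implicit val timeout: Timeout = 5.seconds
    val entityRef = ClusterSharding(system).entityRefFor(TypeKey, cartId)
    entityRef.ask[Done](replyTo => AddItem(itemId, quantity = 1, replyTo))
  }
}
```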
Test with projections. Provided there are enough resources on the application side, things keep up fine. If the deployment is under-provisioned, it can persist events faster than the projections can keep up with (projections do more database work because of the additional backtracking queries). Consumer lag (wait time in the Cinnamon metrics) will then increase, and eventually projections will start failing once they get too far behind the backtracking window.
In this test run there were enough resources to keep the projections up to date. Publishing of events was disabled at this throughput. The projection envelope source is distinguished in the metrics (query, pubsub, or backtracking). The latency spikes part way through should be partition splitting in DynamoDB.
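For reference, a minimal sketch of the projection source side, using eventsBySlices with the DynamoDB query plugin; the event type and entity type name are placeholders, and the projection factory and handler are not shown:

```scala
import akka.actor.typed.ActorSystem
import akka.persistence.query.Offset
import akka.persistence.query.typed.EventEnvelope
import akka.projection.eventsourced.scaladsl.EventSourcedProvider
import akka.projection.scaladsl.SourceProvider

object CartProjectionSource {
  // Placeholder event type for illustration.
  sealed trait Event

  // One source provider per slice range; each underlying BySliceQuery covers a
  // single slice range, and backtracking re-reads recent events so that missed
  // ones can still be delivered (those show up as "backtracking" in the metrics).
  def sourceProvider(system: ActorSystem[_], minSlice: Int, maxSlice: Int)
      : SourceProvider[Offset, EventEnvelope[Event]] =
    EventSourcedProvider.eventsBySlices[Event](
      system,
      "akka.persistence.dynamodb.query", // read journal plugin id
      "ShoppingCart",                    // entity type
      minSlice,
      maxSlice)
}
```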
For a hotspot test, a single hot entity instance will likely not be throttled, given its lower WCU usage. Since events for a single entity instance are persisted sequentially, an average write latency of 5 ms caps its throughput at around 200 writes/s. With larger payloads (consuming more WCUs), or multiple hot entities mapped to the same partition by DynamoDB, the partition throughput limit could also start throttling writes. Throttling errors are retryable and will be retried in the client, so they shouldn't necessarily cause client errors, but they increase latencies and further limit throughput.
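A rough back-of-the-envelope sketch of those numbers; the 5 ms latency and payload size are illustrative, while 1 WCU per 1 KB written and roughly 1000 WCU/s per partition are DynamoDB's documented figures:

```scala
object HotspotMath {
  // Events for one entity instance are persisted sequentially, so its write
  // throughput is bounded by write latency.
  val avgWriteLatencyMillis = 5.0
  val maxWritesPerSecondForOneEntity = 1000.0 / avgWriteLatencyMillis // ≈ 200/s

  // Standard writes cost 1 WCU per 1 KB of item size, rounded up.
  def wcuPerWrite(itemSizeBytes: Int): Int = math.ceil(itemSizeBytes / 1024.0).toInt

  // A single partition allows roughly 1000 WCU/s, so larger payloads or several
  // hot entities mapped to the same partition can hit this limit even when the
  // table-level provisioned capacity is not exceeded.
  val partitionWcuLimitPerSecond = 1000

  def main(args: Array[String]): Unit = {
    val payloadBytes = 4096 // illustrative 4 KB event payload => 4 WCU per write
    val writesBeforePartitionThrottle = partitionWcuLimitPerSecond / wcuPerWrite(payloadBytes)
    println(f"single entity cap: $maxWritesPerSecondForOneEntity%.0f writes/s")
    println(s"partition cap at $payloadBytes byte payloads: $writesBeforePartitionThrottle writes/s")
  }
}
```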
> eventually projections will start failing once they get too far behind the backtracking window
Will it eventually catch up if we stop writing at high throughput?
Is this the `akka.persistence.dynamodb.query.backtracking.window`? Or is it `akka.projection.dynamodb.offset-store.time-window`?
Maybe we can revisit the windows for dynamodb since there are some differences compared to r2dbc, such as lazy loading of offsets and each BySliceQuery covering a single slice.
Yes, that should have said that it's failing on backtracking once it's outside the offset store time window (5 minutes): envelopes from backtracking are rejected because of unexpected sequence numbers. It doesn't recover on its own, but it can be recovered by restarting with a big enough backtracking window. We saw similar behaviour with r2dbc as well.
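For reference, a sketch of overriding both settings via application config; the 10 minute values are illustrative, and the exact defaults should be checked against the plugins' reference.conf (only the 5 minute offset store time window is stated above):

```scala
import com.typesafe.config.{Config, ConfigFactory}

object WindowOverrides {
  // Illustrative overrides only. The failure mode above suggests keeping the
  // offset store time window at least as large as the backtracking window, so
  // that backtracking envelopes can still be validated against stored offsets.
  val config: Config = ConfigFactory
    .parseString("""
      # how far back the query side re-reads events to catch missed ones
      akka.persistence.dynamodb.query.backtracking.window = 10 minutes

      # how long the projection offset store keeps offsets for sequence number validation
      akka.projection.dynamodb.offset-store.time-window = 10 minutes
      """)
    .withFallback(ConfigFactory.load())
}
```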
Issues and fixes for projections falling behind are separated out. Projections are the weakest point in terms of performance, requiring plenty of resources. Otherwise I think performance testing is covered. Closing this issue.