audienceproject/spark-dynamodb

Understanding throughput and rate limiting

zya opened this issue · 1 comments

zya commented

When writing with the throughput parameter set, how is the throughput value enforced?
For example, when setting it to 100, is the limit 100 writes per second or a maximum of 100 concurrent writes? Also, since BatchWriteItem is used by default, does the limit count 100 BatchWriteItem calls or 100 records?

Hello :)
When you set throughput to 100, your cluster will consume a combined total of 100 writes per second. The concurrency is calculated from the number of executor cores available in the cluster.
Under normal circumstances (if your items are <4KB), this results in 100 records being inserted per second, which is 4 BatchWriteItem invocations with the default batch limit of 25 items.
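The arithmetic above can be sketched as follows. This is an illustrative model of how a shared records-per-second budget translates into batch calls and per-task rates; the function names are hypothetical and are not the connector's actual internals, which split the throughput using a rate limiter per writer task.

```python
# Illustrative model (not the connector's real code): a throughput budget
# expressed in records/second, written in batches of up to 25 items.

def batches_per_second(throughput: float, batch_size: int = 25) -> float:
    """Records/sec budget divided by records per BatchWriteItem call."""
    return throughput / batch_size

def per_task_rate(throughput: float, executor_cores: int) -> float:
    """Each concurrent writer task gets an equal share of the budget,
    so the combined cluster rate stays at `throughput` records/sec."""
    return throughput / executor_cores

# throughput=100 with the default batch limit of 25 items:
print(batches_per_second(100))              # 4 BatchWriteItem calls/sec
print(per_task_rate(100, executor_cores=8)) # 12.5 records/sec per task
```

So regardless of how many executor cores are writing concurrently, the per-task shares sum back to the configured 100 records per second.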