NerdWalletOSS/kinesis-python

Producer loop ignores max # of messages per put_records


Looking at the producer's loop, it seems there is no limit on the number of messages per flush (only size is checked).
With many small messages, a single flush could exceed the 500-record limit and the put_records call would fail.
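To make the concern concrete, here is a sketch of a size-only batching loop. It is hypothetical and not the library's code. It assumes a 1 MB (taken as 1 MiB) per-flush size check and 100-byte messages. Because the count is never checked, roughly 20 times the 500-record PutRecords cap fits in one flush:

```python
MAX_RECORDS_PER_PUT = 500      # PutRecords per-request record limit
SIZE_LIMIT = 1024 * 1024       # assumed 1 MiB per-flush size check (hypothetical)


def size_only_batch(messages, size_limit=SIZE_LIMIT):
    """Hypothetical loop: collect messages until the next one would exceed
    size_limit. The number of messages is never checked."""
    batch, size = [], 0
    for msg in messages:
        if size + len(msg) > size_limit:
            break
        batch.append(msg)
        size += len(msg)
    return batch


batch = size_only_batch([b"x" * 100] * 20000)
print(len(batch))  # → 10485, far above MAX_RECORDS_PER_PUT
```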

Have I missed something?

Thanks,
itamar

Hi @itamarla 👋

Thanks for the issue.

I think you're right in that the current implementation is naive and doesn't really match the AWS limits:

> Each shard can support up to 1,000 records per second for writes, up to a maximum total data write rate of 1 MB per second (including partition keys). This write limit applies to operations such as PutRecord and PutRecords.
> -- http://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html

Our producer simply enforces a limit of 1 MB per buffer-time cycle (1 second by default), which isn't actually correct if the stream has more than one shard.

To further complicate things, if the buffer time is changed, the limit has to be scaled to match it. And finally, we might not be the only producer writing to the stream, in which case we're likely to be throttled if we try to use the full limit.
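One way to reason about this is to scale the per-shard limits by shard count and buffer time. The sketch below is a minimal version under stated assumptions: it uses the per-shard figures quoted above, takes 1 MB as 1 MiB, and adds a `share` parameter for leaving headroom for other producers. `cycle_budget`, `CycleBudget`, and `share` are all hypothetical names, not part of kinesis-python:

```python
from dataclasses import dataclass

# Assumed per-shard write limits, from the AWS docs quoted above.
RECORDS_PER_SHARD_PER_SEC = 1000
BYTES_PER_SHARD_PER_SEC = 1024 * 1024  # "1 MB" taken as 1 MiB (assumption)


@dataclass(frozen=True)
class CycleBudget:
    max_records: int
    max_bytes: int


def cycle_budget(shard_count: int, buffer_time: float, share: float = 1.0) -> CycleBudget:
    """Write budget for one buffer-time cycle across the whole stream.

    share: hypothetical fraction of stream capacity this producer claims;
    values below 1.0 leave room for other producers.
    """
    if shard_count < 1 or buffer_time <= 0 or not 0 < share <= 1:
        raise ValueError("invalid shard_count, buffer_time, or share")
    scale = shard_count * buffer_time * share
    # int() truncates, which errs on the conservative side.
    return CycleBudget(
        max_records=int(RECORDS_PER_SHARD_PER_SEC * scale),
        max_bytes=int(BYTES_PER_SHARD_PER_SEC * scale),
    )
```

For example, `cycle_budget(4, 0.5, share=0.5)` yields 1,000 records and 1 MiB per half-second cycle. This budget still has to be split into PutRecords calls that respect the separate 500-record per-request cap.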

For now I'm going to add a simple check that ensures we don't add more than 500 messages (the PutRecords per-request limit) to a put operation, and I'll spend some time thinking about a more robust long-term solution.
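A minimal sketch of such a check follows. It assumes the producer gathers `(data, partition_key)` pairs before calling put_records. It also assumes the documented PutRecords request limits of 500 records and 5 MiB, with partition keys counted toward the size. `batch_records` is a hypothetical helper, not kinesis-python's API:

```python
from typing import Iterable, Iterator, List, Tuple

# Assumed PutRecords per-request limits (AWS docs at time of writing).
MAX_RECORDS_PER_PUT = 500
MAX_BYTES_PER_PUT = 5 * 1024 * 1024

Record = Tuple[bytes, str]  # (data, partition_key)


def record_size(record: Record) -> int:
    """Bytes counted against the request limit: payload plus partition key."""
    data, partition_key = record
    return len(data) + len(partition_key.encode("utf-8"))


def batch_records(records: Iterable[Record],
                  max_count: int = MAX_RECORDS_PER_PUT,
                  max_bytes: int = MAX_BYTES_PER_PUT) -> Iterator[List[Record]]:
    """Yield batches that respect BOTH the record-count and byte-size limits."""
    batch: List[Record] = []
    batch_bytes = 0
    for record in records:
        size = record_size(record)
        if size > max_bytes:
            raise ValueError("a single record exceeds max_bytes")
        # Flush the current batch first if adding this record would break either limit.
        if batch and (len(batch) >= max_count or batch_bytes + size > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(record)
        batch_bytes += size
    if batch:
        yield batch
```

With boto3, each batch could then be sent as `client.put_records(StreamName=..., Records=[{"Data": d, "PartitionKey": k} for d, k in batch])`. For example, 1,200 tiny records come out as batches of 500, 500, and 200.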