awslabs/eventbridge-kafka-connector

Allow to specify a rate limit (quota) for outgoing `PutEvents` calls

Closed this issue · 3 comments

What is your idea?

Add a configuration parameter (or class) to define a rate limit for the outgoing `PutEvents` calls, e.g., based on the account quota.

Would you be willing to make the change?

Yes

Additional context

This can reduce throttling exceptions (and potential connector downtime if retries are exhausted) when there is a large backlog of messages, e.g., after connector downtime, and the downstream EventBridge API is rate limiting the connector.

@baldawar curious about your thoughts here. IIUC (I wrote most of it, but some of the behavior comes from the SDK), our current implementation uses 2 retries, the STANDARD retry policy (which has exponential backoff), and the unmodified THROTTLED_BASE_DELAY of 1 second. This should reduce the likelihood that sustained throttling exhausts the max retries budget and fails the connector (which we want to avoid).
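To make the backoff arithmetic concrete, here is a minimal sketch of capped exponential backoff with full jitter. It is not the SDK's implementation: the class `BackoffSketch`, its method names, and the 20-second cap are assumptions for illustration; only the 1-second throttled base delay and the 2 retries come from the comment above.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Illustrative only: hypothetical helper, not code from the AWS SDK or the connector. */
final class BackoffSketch {

    /** Upper bound of the delay before the next retry: min(base * 2^retries, cap). */
    static long ceilingMillis(long baseDelayMillis, long maxBackoffMillis, int retriesAttempted) {
        if (baseDelayMillis <= 0 || maxBackoffMillis <= 0 || retriesAttempted < 0) {
            throw new IllegalArgumentException("delays must be positive, retries non-negative");
        }
        int shift = Math.min(retriesAttempted, 30); // keep the multiplication well inside long range
        long exponential = baseDelayMillis * (1L << shift);
        return Math.min(exponential, maxBackoffMillis);
    }

    /** Full jitter: a uniformly random delay in [1, ceiling]. */
    static long delayMillis(long baseDelayMillis, long maxBackoffMillis, int retriesAttempted) {
        long ceiling = ceilingMillis(baseDelayMillis, maxBackoffMillis, retriesAttempted);
        return ThreadLocalRandom.current().nextLong(ceiling) + 1;
    }

    public static void main(String[] args) {
        long base = 1_000L;  // throttled base delay of 1 second, as mentioned in the thread
        long cap = 20_000L;  // assumed cap, chosen for illustration
        long worstCaseTotal = 0;
        for (int retry = 0; retry < 2; retry++) { // 2 retries, as mentioned in the thread
            worstCaseTotal += ceilingMillis(base, cap, retry);
        }
        System.out.println("worst-case total wait (ms): " + worstCaseTotal);
    }
}
```

Under these assumptions, two retries wait at most about 3 seconds in total, so a backlog that keeps the API throttled for longer than that could still exhaust the retry budget.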

We could either increase the retry count (which also affects other non-retryable exceptions, since the parameter is used for both the SDK and application-level retries) or add a client-side rate limiter and apply a low quota which suffices for most regions, i.e., not based on us-east-1.
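For illustration, a client-side rate limiter could be as small as a token bucket placed in front of each `PutEvents` call. The sketch below is an assumption about what such a limiter might look like, not the connector's code: the class `TokenBucketSketch`, its methods, and the rate of 5 requests per second are hypothetical.

```java
import java.util.function.LongSupplier;

/** Illustrative only: hypothetical limiter, not part of the connector or the AWS SDK. */
final class TokenBucketSketch {
    private final double ratePerSecond;   // refill rate, i.e. allowed requests per second
    private final double capacity;        // maximum burst size
    private final LongSupplier nanoClock; // injectable clock so the limiter can be tested
    private double tokens;
    private long lastRefillNanos;

    TokenBucketSketch(double ratePerSecond, double capacity, LongSupplier nanoClock) {
        if (ratePerSecond <= 0 || capacity < 1) {
            throw new IllegalArgumentException("rate must be > 0 and capacity >= 1");
        }
        this.ratePerSecond = ratePerSecond;
        this.capacity = capacity;
        this.nanoClock = nanoClock;
        this.tokens = capacity; // start full so an initial burst is allowed
        this.lastRefillNanos = nanoClock.getAsLong();
    }

    /** Takes one token if available; returns false when the caller should wait. */
    synchronized boolean tryAcquire() {
        refill();
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }

    /** Nanoseconds until at least one token is available (0 if one is available now). */
    synchronized long nanosUntilNextToken() {
        refill();
        if (tokens >= 1.0) {
            return 0L;
        }
        return (long) Math.ceil((1.0 - tokens) / ratePerSecond * 1e9);
    }

    /** Adds tokens for the time elapsed since the last refill, up to capacity. */
    private void refill() {
        long now = nanoClock.getAsLong();
        double elapsedSeconds = (now - lastRefillNanos) / 1e9;
        tokens = Math.min(capacity, tokens + elapsedSeconds * ratePerSecond);
        lastRefillNanos = now;
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical quota of 5 requests/second with a burst of 5.
        TokenBucketSketch limiter = new TokenBucketSketch(5.0, 5.0, System::nanoTime);
        for (int i = 0; i < 8; i++) {
            while (!limiter.tryAcquire()) {
                Thread.sleep(Math.max(1L, limiter.nanosUntilNextToken() / 1_000_000L));
            }
            System.out.println("request " + i + " permitted"); // a PutEvents call would go here
        }
    }
}
```

Waiting on the limiter before sending keeps the request rate under the configured quota, so throttling responses (and the retries they consume) should become less frequent during backlog replay.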

Do you think this is overkill?

cc/ @maschnetwork

For most use cases, STANDARD is the way to go. It's not a bad choice. The defaults within https://github.com/aws/aws-sdk-java-v2/blob/master/core/sdk-core/src/main/java/software/amazon/awssdk/core/internal/retry/SdkDefaultRetrySetting.java#L53 look sane to me. We can offer clients a way to override these, but we don't have to tweak them upfront.

One possible tuning: given that we're only sending to a single event bus, ADAPTIVE is a good idea to reduce any overload. It will help maximize the success rate.

Thx for the feedback @baldawar

Will close this issue then for now.

> One possible tuning: given that we're only sending to a single event bus, ADAPTIVE is a good idea to reduce any overload. It will help maximize the success rate.

There's #72 in-flight where we will get batching soon :)