castorm/kafka-connect-http

Repeated API calls upon data stream update exceeding API quota


Describe the bug
I am now getting locked out of the NY Times API with a 429 response code and a "Rate limit quota violation. Quota limit exceeded" error.

To Reproduce
Use the following source connector config
nytimes_connector_quota_problem_config.txt

Expected behavior
To not exceed the API's rate limit.

Kafka Connect:

  • Version 5.5.0

Plugin:

  • Version 0.7.5

Additional context
The NY Times API allows up to 10 requests per minute or 4000 requests per day. I created a new app to get a new API key and waited a full day to make sure I hadn't somehow hit the daily limit. I also increased the throttling interval so the connector polls only once every 5 minutes. I think that when the NY Times updates its data stream, the connector for some reason hits the API endpoint more than 10 times in a minute.
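As a sanity check on why a 5-minute interval should be safe, here is a small Python sketch that compares a fixed polling interval against the quota quoted above (10 requests per minute, 4000 per day). The function names are hypothetical, and the sketch assumes exactly one HTTP request per poll, which may not hold if the connector pages through results.

```python
MINUTE_MS = 60_000
DAY_MS = 24 * 60 * MINUTE_MS


def max_requests(interval_ms: int, window_ms: int) -> int:
    # Most polls that can start inside a half-open window of window_ms,
    # assuming one request per poll (integer ceiling division).
    return -(-window_ms // interval_ms)


def within_quota(interval_ms: int, per_minute: int = 10, per_day: int = 4000) -> bool:
    # True if a steady poll at interval_ms never exceeds either limit.
    return (max_requests(interval_ms, MINUTE_MS) <= per_minute
            and max_requests(interval_ms, DAY_MS) <= per_day)


print(max_requests(300_000, DAY_MS))  # 288
print(within_quota(300_000))          # True
print(within_quota(5_000))            # False (12 requests per minute)
```

Under these assumptions the daily cap, not the per-minute cap, is the binding one: any interval shorter than 86,400,000 / 4000 = 21,600 ms can exhaust the daily quota even though it stays under 10 requests per minute.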

Hi @rumline, thanks for reporting this.

So this still happened after you increased the interval to 5 minutes?

Thanks,
Best regards.

Correct. It's set to 300000 millis.

You were hitting this issue because, when the connector detects that it has fallen behind in consuming data, it switches to a "catch-up" throttler, which is meant to query more often.
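A minimal sketch of the two-mode behaviour described above, assuming the connector simply picks one of two delays depending on whether it believes it is behind. The class `TwoModeThrottler`, the `behind` flag, and the 1000 ms fallback are hypothetical illustrations, not the plugin's actual API or documented defaults. The point is that if the configured catch-up interval is not picked up, a much shorter fallback can blow through a per-minute quota even though the regular interval is 5 minutes.

```python
from dataclasses import dataclass


@dataclass
class TwoModeThrottler:
    # Hypothetical model: one delay for normal polling, another used
    # while the connector believes it has fallen behind.
    interval_ms: int
    catchup_interval_ms: int

    def next_delay_ms(self, behind: bool) -> int:
        return self.catchup_interval_ms if behind else self.interval_ms


def requests_in_minute(throttler: TwoModeThrottler, behind: bool) -> int:
    # Polls that start within one minute if the mode never changes.
    delay = throttler.next_delay_ms(behind)
    return -(-60_000 // delay)


# Intended: 5-minute polling in both modes.
intended = TwoModeThrottler(interval_ms=300_000, catchup_interval_ms=300_000)
# Catch-up setting not applied, so an (illustrative) 1 s fallback is used.
effective = TwoModeThrottler(interval_ms=300_000, catchup_interval_ms=1_000)

print(requests_in_minute(intended, behind=True))   # 1
print(requests_in_minute(effective, behind=True))  # 60
```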

It turns out there was a typo in the configuration property for that scenario: the connector read
http.throttle.catchup.interval.millis instead of the documented http.throttler.catchup.interval.millis.

I've fixed the property name and am about to release v0.7.6 with the fix, so you can either upgrade or use the old (misspelled) property name in the meantime.
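For the interim workaround, one option is to set the catch-up interval under both spellings so the same configuration works on v0.7.5 (which reads http.throttle.catchup.interval.millis) and after upgrading to v0.7.6 (which reads http.throttler.catchup.interval.millis). The Python helper below is a hypothetical sketch; it assumes the connector ignores whichever of the two keys it does not recognise, which is worth verifying on your setup.

```python
import json

# Property names as given in this thread.
CATCHUP_KEY_V075 = "http.throttle.catchup.interval.millis"          # read by 0.7.5
CATCHUP_KEY_DOCUMENTED = "http.throttler.catchup.interval.millis"   # read from 0.7.6


def with_catchup_interval(config: dict, interval_ms: int) -> dict:
    """Return a copy of config with the catch-up interval under both names.

    Values are stored as strings, matching a string-to-string connector
    config map. The input dict is left unchanged.
    """
    updated = dict(config)
    updated[CATCHUP_KEY_V075] = str(interval_ms)
    updated[CATCHUP_KEY_DOCUMENTED] = str(interval_ms)
    return updated


base: dict = {}  # existing source connector settings (not shown in the thread)
print(json.dumps(with_catchup_interval(base, 300_000), indent=2))
```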

I've also increased the default interval values to 60s and 30s, respectively, to be friendlier by default towards restricted APIs such as yours.
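To see how the new defaults compare with the NY Times quota, here is a quick worst-case check in Python. It assumes that 60s is the regular interval and 30s the catch-up interval, that each poll is one request, and that the connector stays in catch-up mode all day; the constant names are illustrative.

```python
# Assumed mapping of the new defaults: regular 60 s, catch-up 30 s.
DEFAULT_INTERVAL_MS = 60_000
DEFAULT_CATCHUP_INTERVAL_MS = 30_000


def polls(window_ms: int, interval_ms: int) -> int:
    # Polls that start within window_ms (integer ceiling division).
    return -(-window_ms // interval_ms)


# Worst case: catch-up mode for an entire day.
worst_per_minute = polls(60_000, DEFAULT_CATCHUP_INTERVAL_MS)
worst_per_day = polls(24 * 60 * 60_000, DEFAULT_CATCHUP_INTERVAL_MS)

print(worst_per_minute, worst_per_day)                # 2 2880
print(worst_per_minute <= 10 and worst_per_day <= 4000)  # True
```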

Thanks again, I hope this helps.
Best regards.

Thank you. You might want to double-check the main README.md file: some of the new config options you added in 0.7.4 didn't make it into the 0.7.5 version of the instructions.