airbnb/binaryalert

Provide low-throughput alternative for dispatcher cron


Background

By default, the dispatcher is invoked every minute. While this interval is configurable, the dispatcher cron just doesn't make much sense for deployments with low throughput. You either pay for lots of wasted dispatcher invocations or you wait a long time before a binary is processed.
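To make the trade-off concrete, here is a back-of-the-envelope sketch of how the cron interval affects both the number of dispatcher invocations and the worst-case wait before a newly queued binary is picked up. It assumes a 30-day month and ignores the dispatcher's own run time. The function name `cron_tradeoff` is hypothetical.

```python
def cron_tradeoff(interval_minutes: int, days: int = 30) -> tuple[int, int]:
    """Return (dispatcher invocations per period, worst-case wait in minutes).

    Assumes the cron fires exactly every `interval_minutes` and that a binary
    arriving just after a run waits a full interval for the next one.
    """
    minutes_in_period = days * 24 * 60
    invocations = minutes_in_period // interval_minutes
    worst_case_wait = interval_minutes
    return invocations, worst_case_wait


for interval in (1, 15, 60):
    runs, wait = cron_tradeoff(interval)
    print(f"every {interval:>2} min: {runs:>6} invocations/month, up to {wait} min wait")
```

With the default one-minute schedule, that is 43,200 invocations a month whether or not anything was uploaded. Stretching the interval cuts the count but raises the worst-case wait proportionally.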

Options

You could let the S3 event notification invoke the analyzer directly (eliding the queue entirely). The problem is that the queue is pretty much required for retroactive analysis.
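For reference, eliding the queue means the analyzer's handler receives the S3 notification payload itself. The following minimal sketch extracts the object locations from such an event. The handler name `analyze_from_s3_event` and the bucket name are hypothetical, and the sketch only returns the keys rather than scanning anything. The `Records[].s3.bucket.name` / `Records[].s3.object.key` layout and the URL-encoding of keys follow the documented S3 event notification format.

```python
from urllib.parse import unquote_plus


def analyze_from_s3_event(event: dict, context: object = None) -> list[tuple[str, str]]:
    """Hypothetical analyzer entry point invoked directly by an S3 notification.

    Returns (bucket, key) pairs; a real analyzer would download and scan each one.
    """
    targets = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # S3 URL-encodes object keys in event notifications (spaces become '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        targets.append((bucket, key))
    return targets


sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "binaryalert-binaries"},
                "object": {"key": "uploads/my+file.exe"}}}
    ]
}
print(analyze_from_s3_event(sample_event))
# [('binaryalert-binaries', 'uploads/my file.exe')]
```

The catch still applies: this path only fires for new uploads, so nothing here lets you re-enqueue existing objects for retroactive analysis, which is what the queue provides.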

Another suggestion was to trigger the dispatcher from a CloudWatch alarm on an SQS metric instead of a cron schedule. For example, when SQS:ApproximateNumberOfMessagesVisible > 0, the alarm could invoke the dispatcher. This might introduce a delay of several minutes, but it would be a reasonable compromise between cost and time-to-analysis.
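The following sketch covers the alarm half of this proposal, assuming boto3's CloudWatch `put_metric_alarm` call. The function below only builds the keyword arguments, so it can run without AWS credentials. The queue name, alarm name, and SNS topic ARN are hypothetical placeholders, and routing the alarm to the dispatcher through an SNS topic subscription is an assumption, not something BinaryAlert ships.

```python
def dispatcher_alarm_params(queue_name: str, topic_arn: str) -> dict:
    """Build kwargs for boto3's cloudwatch.put_metric_alarm (hypothetical wiring).

    The alarm enters ALARM when the queue has at least one visible message;
    the SNS topic is assumed to have the dispatcher Lambda subscribed to it.
    """
    return {
        "AlarmName": f"{queue_name}-has-messages",
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateNumberOfMessagesVisible",
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Statistic": "Maximum",
        "Period": 60,  # seconds; CloudWatch evaluates per period, so expect some delay
        "EvaluationPeriods": 1,
        "Threshold": 0.0,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],
    }


params = dispatcher_alarm_params(
    "binaryalert-analyzer-queue",
    "arn:aws:sns:us-east-1:123456789012:binaryalert-dispatch",
)
# With credentials configured, this would create the alarm:
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**params)
print(params["ComparisonOperator"], params["Threshold"])
```

One caveat with this design: CloudWatch runs alarm actions when the alarm changes state. If the dispatcher leaves messages behind, the alarm stays in ALARM without firing again, so the dispatcher would need to drain the queue (letting the alarm return to OK) before the next batch can trigger it.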

Discussion welcome!

Thanks for opening this issue!

In the S3 event-driven solution, could the 'batcher' invoke the analysis Lambda directly for retroactive analysis (skipping the queue)?

You are limited to 1,000 concurrently executing Lambda functions in a given region. If you have more than 1,000 files in S3, the batcher will likely invoke analyzer Lambdas faster than they can be processed, and AWS will throttle you.

This is somewhat mitigated by the fact that each analyzer could still process multiple files, but I don't know if it would be enough.
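The concurrency argument is easy to check numerically. Assuming the worst case, where the batcher fires all invocations at once and each analyzer invocation handles a fixed number of files, the number of simultaneous invocations is the file count divided by the batch size, rounded up. The helper names below are hypothetical, and the 1,000 figure is the regional limit cited above.

```python
import math

REGIONAL_CONCURRENCY_LIMIT = 1_000  # per-region limit cited in the discussion


def analyzer_invocations(num_files: int, files_per_invocation: int) -> int:
    """Number of analyzer invocations needed if each handles a fixed batch."""
    return math.ceil(num_files / files_per_invocation)


def would_throttle(num_files: int, files_per_invocation: int) -> bool:
    """True if invoking every batch at once would exceed the concurrency limit."""
    return analyzer_invocations(num_files, files_per_invocation) > REGIONAL_CONCURRENCY_LIMIT


print(analyzer_invocations(50_000, 10), would_throttle(50_000, 10))  # 5000 True
print(analyzer_invocations(5_000, 10), would_throttle(5_000, 10))    # 500 False
```

Batching only helps by a constant factor: a bucket with 50,000 objects still needs 5,000 simultaneous invocations at 10 files each, which is why it may not be enough for large backfills.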

If you have fewer than 1,000 files, or if you are willing to add a delay to the batch processing, you could invoke the analyzer directly.
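Adding a delay could look like the following sketch: the batcher invokes analyzers in waves no larger than a chosen cap and sleeps between waves. `invoke` and `sleep` are injected so the logic can be exercised without AWS. In a real deployment, `invoke` might wrap boto3's `lambda_client.invoke(FunctionName=..., InvocationType='Event', Payload=...)`. The function name `invoke_in_waves`, the wave size, and the pause are hypothetical tuning values. Because asynchronous invocations return before the analyzer finishes, the pause has to be long enough for a wave to complete.

```python
import json
import time
from typing import Callable, Sequence


def invoke_in_waves(
    keys: Sequence[str],
    invoke: Callable[[str], None],
    files_per_invocation: int = 10,
    max_per_wave: int = 500,
    pause_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Directly invoke analyzers for `keys`, pausing between waves.

    Returns the number of invocations made. Each payload is a JSON list of keys.
    """
    batches = [keys[i:i + files_per_invocation]
               for i in range(0, len(keys), files_per_invocation)]
    for wave_start in range(0, len(batches), max_per_wave):
        if wave_start:
            sleep(pause_seconds)  # give the previous wave time to drain
        for batch in batches[wave_start:wave_start + max_per_wave]:
            invoke(json.dumps(list(batch)))
    return len(batches)


payloads, pauses = [], []
count = invoke_in_waves([f"obj-{n}" for n in range(25)], payloads.append,
                        files_per_invocation=10, max_per_wave=2,
                        pause_seconds=1.0, sleep=pauses.append)
print(count, len(pauses))  # 3 1
```

The total run time grows roughly with (number of waves - 1) × `pause_seconds`, which is the delay being traded for staying under the concurrency limit.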

This problem would be alleviated if you could invoke Lambda from SQS, but it's not clear when or if AWS will support this.