dismantl/CaseHarvester

Create spider Lambda function


Currently the spider component is only run from the command line. We need a Lambda function for the spider that can be triggered:

  1. By a CloudWatch rule on a cron-style schedule, for example daily or weekly.
  2. Manually, by sending an SNS message.

Each trigger method should accept parameters that set the search criteria (e.g. a specific county and time range). For example, the weekly run could search for cases filed over the last month, while the daily run could search only for cases filed within the last week.
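As a rough sketch of how one handler could serve both triggers: a scheduled rule configured with constant JSON input passes that JSON as the event, while SNS wraps the payload as a JSON string under `Records[0].Sns.Message`. The parameter names `county` and `days_back`, and the function `parse_spider_event`, are made up for illustration and are not part of the existing spider code.

```python
import json
from datetime import date, timedelta


def parse_spider_event(event, today=None):
    """Extract search criteria from a scheduled-rule event or an SNS event.

    `county` and `days_back` are hypothetical parameter names.
    """
    today = today or date.today()
    records = event.get("Records")
    if records and "Sns" in records[0]:
        # SNS delivers the published payload as a string in Records[0].Sns.Message
        params = json.loads(records[0]["Sns"]["Message"])
    else:
        # Assumes the scheduled rule is configured with constant JSON input,
        # so the event is the parameter object itself
        params = event
    days_back = int(params.get("days_back", 7))
    return {
        "county": params.get("county"),  # None means "all counties" in this sketch
        "start_date": today - timedelta(days=days_back),
        "end_date": today,
    }


def lambda_handler(event, context):
    criteria = parse_spider_event(event)
    # The actual spider invocation would go here; omitted because the
    # spider's programmatic interface is not described in this issue.
    return {
        "county": criteria["county"],
        "start_date": criteria["start_date"].isoformat(),
        "end_date": criteria["end_date"].isoformat(),
    }
```

With this shape, the weekly rule could pass `{"days_back": 30}` and the daily rule `{"days_back": 7}`, and a manual SNS message could carry the same JSON to target a single county.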

Another thing to note: the spider component currently prompts the user for confirmation if an existing queue would be overwritten. Lambda runs are non-interactive, so this prompt must be skipped for them.
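One possible way to handle the prompt, sketched under assumptions: detect a non-interactive run (Lambda sets the `AWS_LAMBDA_FUNCTION_NAME` environment variable) and fall back to an explicit flag instead of asking. The function name `should_overwrite_queue` and the `force` flag are hypothetical, and whether a non-interactive run should overwrite or refuse by default is a design decision left open here; this sketch refuses unless `force` is set.

```python
import os
import sys


def should_overwrite_queue(queue_exists, force=False, stdin=None, environ=None):
    """Decide whether to overwrite an existing queue without blocking in Lambda."""
    environ = os.environ if environ is None else environ
    stdin = sys.stdin if stdin is None else stdin

    if not queue_exists or force:
        # Nothing to overwrite, or the caller explicitly opted in
        return True

    in_lambda = "AWS_LAMBDA_FUNCTION_NAME" in environ
    if in_lambda or stdin is None or not stdin.isatty():
        # Non-interactive run: never prompt; refuse unless force was given
        return False

    # Interactive command-line run: keep the confirmation prompt
    answer = input("Existing queue will be overwritten. Continue? [y/N] ")
    return answer.strip().lower() == "y"
```

A manual or scheduled trigger could then pass something like `"force": true` in its parameters when overwriting is intended.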

Completed as of commit 3af2d73. Uses AWS ECS to run containerized spider tasks at scheduled intervals.