aws-crawler

This is a template for a web crawler using AWS Lambda.

Set up the environment

You may set up the development environment via:

conda env create -f environment.yml
npm install -g serverless

Deploy the container image

You may deploy the container image to AWS via:

cd crawler
sls deploy
cd -

Modify crawler/crawler.py to extract the desired data from the crawled URLs.
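
For reference, the handler might look something like the sketch below. This is only an illustration, assuming the event carries a "url" key and that the container image bundles Selenium with headless Chrome (as in docker-selenium-lambda); the handler name, Chrome options, and binary paths in the actual template may differ.

# Minimal sketch of crawler/crawler.py (names and options are assumptions,
# not the template's actual code). Expects an event like {"url": "..."} and
# uses Selenium with headless Chrome, as in docker-selenium-lambda.
import json

from selenium import webdriver
from selenium.webdriver.chrome.options import Options


def handler(event, context):
    url = event["url"]

    options = Options()
    options.add_argument("--headless=new")
    options.add_argument("--no-sandbox")
    options.add_argument("--disable-dev-shm-usage")
    # Depending on the container image, the Chrome binary location and a
    # chromedriver Service path may also need to be configured here.

    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Replace this with the extraction logic you need.
        data = {"url": url, "title": driver.title}
    finally:
        driver.quit()

    return {"statusCode": 200, "body": json.dumps(data)}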

Invoke a Lambda function

You may invoke the deployed Lambda function via:

sls invoke -f crawl --data '{"url": "https://example.com"}'
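
Alternatively, the function can be invoked directly from Python with boto3. The snippet below is only a sketch: the fully qualified function name is a placeholder that depends on the service and stage configured in serverless.yml.

# Hedged example: invoke the deployed function with boto3 instead of the
# serverless CLI. FUNCTION_NAME is a placeholder; the real name depends on
# the service and stage defined in serverless.yml.
import json

import boto3

FUNCTION_NAME = "crawler-dev-crawl"  # placeholder, adjust to your deployment

client = boto3.client("lambda")
response = client.invoke(
    FunctionName=FUNCTION_NAME,
    InvocationType="RequestResponse",
    Payload=json.dumps({"url": "https://example.com"}),
)
print(response["Payload"].read().decode())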

Run the Python script for batch invoking

You may also run the following Python script to crawl a list of URLs:

python crawl.py -i urls.txt
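
The script is expected to read the URLs from the input file and invoke the Lambda function for each of them. As a rough sketch of that flow (not the actual crawl.py; one URL per line and the function name below are assumptions):

# Sketch of a batch driver similar in spirit to crawl.py (assumptions: one
# URL per line in the input file, placeholder Lambda function name).
import argparse
import json
from concurrent.futures import ThreadPoolExecutor

import boto3

FUNCTION_NAME = "crawler-dev-crawl"  # placeholder, adjust to your deployment


def crawl(url):
    client = boto3.client("lambda")
    response = client.invoke(
        FunctionName=FUNCTION_NAME,
        Payload=json.dumps({"url": url}),
    )
    return url, response["Payload"].read().decode()


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("-i", "--input", required=True, help="file with one URL per line")
    args = parser.parse_args()

    with open(args.input) as f:
        urls = [line.strip() for line in f if line.strip()]

    # Invoke the Lambda function for several URLs in parallel.
    with ThreadPoolExecutor(max_workers=8) as pool:
        for url, body in pool.map(crawl, urls):
            print(url, body)


if __name__ == "__main__":
    main()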

Acknowledgment

This work is inspired by the code provided in https://github.com/umihico/docker-selenium-lambda.