This is a template for a web crawler using AWS Lambda.
You may set up the development environment via:
conda env create -f environment.yml
npm install -g serverless
You may deploy the container image to AWS via:
cd crawler
sls deploy
cd -
Modify
crawler/crawler.py
to extract desired data from the URLs.
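As an illustration of the kind of extraction logic that could go into crawler/crawler.py, here is a stdlib-only sketch that pulls the page title out of fetched HTML. The template itself drives a headless browser, so treat this as an assumption-laden example: the class and function names are hypothetical, and real code would feed it page source obtained from the browser.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Only accumulate text while inside the <title> tag.
        if self.in_title:
            self.title += data


def extract_title(html: str) -> str:
    """Return the stripped <title> text of an HTML document."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()
```

In practice you would call a helper like this from the Lambda handler with the page source of each crawled URL, and return the extracted fields in the handler's response.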
You may invoke the deployed Lambda function via:
sls invoke -f crawl --data '{"url": "https://example.com"}'
You may also crawl a list of URLs with the provided Python script:
python crawl.py -i urls.txt
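A script like crawl.py might be structured as follows. This is a minimal sketch, not the template's actual code: the function names are hypothetical, and the Lambda invocation is passed in as a callable so that the AWS client (e.g. a boto3 Lambda client calling the deployed function) can be plugged in or stubbed out.

```python
import argparse
from pathlib import Path


def read_urls(path):
    """Read one URL per line, skipping blank lines and # comments."""
    lines = Path(path).read_text().splitlines()
    return [ln.strip() for ln in lines if ln.strip() and not ln.strip().startswith("#")]


def crawl_urls(urls, invoke):
    """Invoke the crawler once per URL.

    `invoke` is any callable taking an event dict like {"url": ...};
    in real use it would wrap the deployed Lambda function.
    """
    return [invoke({"url": url}) for url in urls]


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Crawl a list of URLs via the deployed Lambda.")
    parser.add_argument("-i", "--input", required=True, help="file with one URL per line")
    args = parser.parse_args()
    # Placeholder invoker: just prints each event instead of calling AWS.
    for result in crawl_urls(read_urls(args.input), print):
        pass
```

Swapping the placeholder invoker for a real one would amount to calling the deployed function with each event payload and collecting the responses.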
This work is inspired by https://github.com/umihico/docker-selenium-lambda.