Automated workflow for data scraping

Process

  • Fetch data from the source URL
  • Clean the data in a Pandas dataframe
  • Convert the dataframe to .csv
  • Store the .csv in an S3 bucket as a text file (see the sketch below)
  • Deploy the scraper to AWS Lambda and run it there every 3 minutes
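
The steps above roughly translate into a pipeline like the sketch below. The source URL, bucket name, object key, and cleaning logic are placeholders; each scraper supplies its own.

    import boto3
    import pandas as pd
    import requests

    # Placeholder source URL and bucket name; real scrapers define their own
    SOURCE_URL = "https://example.com/data.json"
    BUCKET_NAME = "my-scraper-bucket"


    def run() -> None:
        # Get data from the URL
        response = requests.get(SOURCE_URL, timeout=30)
        response.raise_for_status()

        # Clean the data in a Pandas dataframe (the real cleaning is scraper-specific)
        df = pd.DataFrame(response.json())
        df = df.dropna()

        # Convert the dataframe to .csv and store it in the S3 bucket as a text file
        csv_body = df.to_csv(index=False)
        boto3.client("s3").put_object(
            Bucket=BUCKET_NAME,
            Key="scrapername/latest.csv",
            Body=csv_body.encode("utf-8"),
        )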

Creating a new scraper

  • Create a copy of scraper_template.py or of one of the existing scrapers
  • Implement your scraper
  • Add your scraper to the list of scrapers in handler.py (see the sketch below)
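
A new scraper could look roughly like the sketch below; the module name get_data_myscraper.py, the scrape() function, and the SCRAPERS list are illustrative assumptions, so follow scraper_template.py and handler.py for the actual structure.

    # get_data_myscraper.py -- hypothetical new scraper based on scraper_template.py
    import pandas as pd
    import requests


    def scrape() -> pd.DataFrame:
        # Fetch and clean the data for this source, then return the dataframe
        response = requests.get("https://example.com/source", timeout=30)
        response.raise_for_status()
        return pd.DataFrame(response.json())

Registering it in handler.py might then be a one-line change (again, the list name is an assumption):

    # handler.py
    import get_data_myscraper

    SCRAPERS = [
        # ...existing scrapers...
        get_data_myscraper,
    ]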

How to test locally

  • Run your scraper script directly: python get_data_scrapername.py
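
If the scraper module has a small entry point at the bottom, running it locally just prints the result. This pattern mirrors the hypothetical scrape() sketch above and is an assumption, not necessarily the template's exact layout.

    # At the bottom of get_data_scrapername.py
    if __name__ == "__main__":
        df = scrape()
        print(df.head())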

Deploy

  • Push your code to the repo; it will be deployed automatically

Contributing

  • Open a pull request (or ask for repo membership)
  • After review, your new scraper will be deployed automatically

Error reporting

  • Any errors in a scraper are reported to Sentry
  • Use asserts to confirm that the data is valid
  • If non-critical but interesting changes to the data are noticed, report them to Slack via a webhook (see the sketch below)
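
A rough sketch of the validation and Slack reporting described above. The webhook URL, the expected column, and the helper names are placeholders; Sentry reporting is assumed to be configured at the Lambda level, where it picks up unhandled exceptions such as failed asserts.

    import os

    import pandas as pd
    import requests

    # Placeholder webhook URL; in practice it would come from configuration
    SLACK_WEBHOOK_URL = os.environ.get("SLACK_WEBHOOK_URL", "")


    def validate(df: pd.DataFrame) -> None:
        # Asserts confirm the data is valid; a failed assert raises AssertionError,
        # which surfaces as an error in Sentry
        assert not df.empty, "scraper returned no rows"
        assert "date" in df.columns, "expected a 'date' column"


    def report_to_slack(message: str) -> None:
        # Non-critical but interesting changes are posted to Slack via the webhook
        requests.post(SLACK_WEBHOOK_URL, json={"text": message}, timeout=10)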