laws-africa/gazettemachine-scrapers

Gazette Machine Scrapers

These are Scrapy spiders for Gazette Machine. They run on Zyte, and the scraped URLs are written to S3, from where Gazette Machine pulls them in.

Development

To develop locally:

Clone this repo
Set up a virtualenv: python3 -m venv env
Activate it: source env/bin/activate
Install dependencies: pip install -r requirements.txt

Deploying

To deploy:

Install the Scrapinghub command-line client with pip install shub
Run shub deploy
In Zyte, configure the spider's AWS and output settings (listed below), similar to the other spiders.
In gazettemachine, update settings.GM['SCRAPINGHUB_SPIDERS'] to include the new spider, if it should be run daily.

AWS_ACCESS_KEY_ID: from AWS
AWS_SECRET_ACCESS_KEY: from AWS
FEED_FORMAT: csv
FEED_URI: s3://lawsafrica-gazettes-incoming/dropbox/<code>.csv
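
The settings above follow a fixed pattern, so a small helper can make them easier to fill in consistently. This is only a sketch: the function name `spider_feed_settings` is hypothetical, and it assumes `<code>` stands for a short identifier for the spider, which the README does not spell out. The resulting values are meant to be entered in Zyte, not committed to the repo.

```python
# Bucket and prefix as given in the settings list above.
FEED_BUCKET_PREFIX = "s3://lawsafrica-gazettes-incoming/dropbox"


def spider_feed_settings(code, aws_access_key_id, aws_secret_access_key):
    """Build the per-spider settings for a given code (hypothetical helper)."""
    # Reject empty codes and path separators so the feed stays in the dropbox/ prefix.
    if not code or "/" in code:
        raise ValueError(f"invalid spider code: {code!r}")
    return {
        "AWS_ACCESS_KEY_ID": aws_access_key_id,
        "AWS_SECRET_ACCESS_KEY": aws_secret_access_key,
        "FEED_FORMAT": "csv",
        "FEED_URI": f"{FEED_BUCKET_PREFIX}/{code}.csv",
    }


if __name__ == "__main__":
    # "xx" and the key values are placeholders, not real credentials.
    for key, value in spider_feed_settings("xx", "KEY_ID", "SECRET").items():
        print(f"{key}: {value}")
```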