This is a Scrapy project collecting the three scrapers I wrote for various projects at RSParly 2011:
- bills
- committees
- papers
To run the spiders you need Scrapy installed. Check out the project and run Scrapy's 'crawl' command from the root directory, for example:
> scrapy crawl bills
The spiders don't define an explicit output. If you'd like the results written to a file without writing your own item pipeline, you can use Scrapy's built-in feed exports.
The following command produces a file called bills.json with one JSON object per line, suitable for importing into MongoDB.
> scrapy crawl bills --set FEED_URI=bills.json --set FEED_FORMAT=jsonlines
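If you'd rather consume the feed from Python, here is a minimal sketch that reads a JSON Lines file such as bills.json one object at a time using only the standard library. It is not part of this project: the function names `parse_jsonlines` and `read_jsonlines` are hypothetical, and it makes no assumptions about which fields the bills items contain.

```python
import json
from typing import Iterable, Iterator


def parse_jsonlines(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed JSON object per non-blank line (JSON Lines format)."""
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            # Report which line was malformed so a truncated crawl is easy to spot.
            raise ValueError(f"line {line_no}: invalid JSON: {exc.msg}") from exc


def read_jsonlines(path: str) -> Iterator[dict]:
    """Stream objects from a feed file such as bills.json without loading it all at once."""
    with open(path, encoding="utf-8") as fh:
        yield from parse_jsonlines(fh)
```

For example, `items = list(read_jsonlines("bills.json"))` loads a whole crawl. Assuming pymongo is installed and a MongoDB server is running locally, `MongoClient()["rsparly"]["bills"].insert_many(items)` would then store the items, where the database name `rsparly` is only a placeholder; `insert_many` rejects an empty list, so check that the crawl produced items first. Iterating over `read_jsonlines` directly instead of building a list keeps memory use flat for large crawls.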