Active foreign principal scraper
This started as a test task from a job interview: write a scraper for www.fara.gov using Scrapy.
Data format:
[
  {
    "registrant": "International Trade & Development Agency, Inc.",
    "url": "https://efile.fara.gov/pls/apex/f?p=171:200:8832334613063::NO:RP,200:P200_REG_NUMBER,P200_DOC_TYPE,P200_COUNTRY:3690,Exhibit%20AB,TAIWAN",
    "address": "Washington",
    "reg_num": "3690",
    "country": "TAIWAN",
    "foreign_principal": "Taipei Economic & Cultural Representative Office in the U.S.",
    "date": "1995-08-28 00:00:00",
    "exhibit_urls": [
      "http://www.fara.gov/docs/3690-Exhibit-AB-20160614-10.pdf",
      ...
    ],
    "state": "DC"
  },
  ...
]
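A minimal sketch of how the output could be consumed, assuming it has been saved as valid JSON with the elided `...` entries filled in or dropped. The helper names `load_principals` and `group_by_country` are hypothetical and not part of this project; the sample record is copied from the format above.

```python
import json
from collections import defaultdict
from datetime import datetime

# One record copied from the data format above; the elided "..." entries are dropped.
SAMPLE = """
[
  {
    "registrant": "International Trade & Development Agency, Inc.",
    "url": "https://efile.fara.gov/pls/apex/f?p=171:200:8832334613063::NO:RP,200:P200_REG_NUMBER,P200_DOC_TYPE,P200_COUNTRY:3690,Exhibit%20AB,TAIWAN",
    "address": "Washington",
    "reg_num": "3690",
    "country": "TAIWAN",
    "foreign_principal": "Taipei Economic & Cultural Representative Office in the U.S.",
    "date": "1995-08-28 00:00:00",
    "exhibit_urls": [
      "http://www.fara.gov/docs/3690-Exhibit-AB-20160614-10.pdf"
    ],
    "state": "DC"
  }
]
"""


def load_principals(text):
    """Parse scraper output and turn each "date" string into a datetime."""
    records = json.loads(text)
    for record in records:
        # Dates in the output look like "1995-08-28 00:00:00".
        record["date"] = datetime.strptime(record["date"], "%Y-%m-%d %H:%M:%S")
    return records


def group_by_country(records):
    """Map each country to the list of its foreign principals."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["country"]].append(record["foreign_principal"])
    return dict(grouped)


records = load_principals(SAMPLE)
by_country = group_by_country(records)
```

To use this on real output, replace `SAMPLE` with the contents of whatever file the scraper wrote.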
This project is not maintained; use it at your own risk. However, the site's HTML markup does not seem to have changed in the last decade, so there is a good chance the script still works.