Crawling and preserving climate data
- Check each possible document type and ensure it's saved properly
- Find or create a database to store everything; automate saving crawls directly to it (Archivers app?)
- Hash documents for data integrity (see the SHA-256 sketch after this list)
- Use the Box API to auto-upload documents (see the upload sketch after this list) --Zach
- For each docket, start a crawl for Primary documents, Supporting materials, and Comments (see the docket-listing sketch after this list)
- Get all links to document pages
- Automate finding how many documents are in each docket (the listing sketch below reads this from the response metadata)
- Get links for each document and attachment on the document pages
- Download and save each document (see the download sketch after this list)
- Find and download all images embedded in the page (see the image-scraping sketch after this list)
- Get document metadata (ID, tracking number, date posted, RIN, etc.)
- Download information in the "show more details" section
- Log all errors to a file so they can be checked later (see the logging sketch after this list)
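
Sketches for the steps above follow. First, listing a docket's documents: a minimal sketch assuming the regulations.gov v4 API. The endpoint, query parameters, field names (`filter[docketId]`, `meta.totalElements`), and the exact `documentType` values should be verified against the current API docs, and the API key is a placeholder. Each returned record's `attributes` should carry the metadata we want to keep (ID, tracking number, date posted, RIN).

```python
import requests

API_BASE = "https://api.regulations.gov/v4"  # assumed v4 endpoint; verify before use
API_KEY = "YOUR_DATA_GOV_KEY"                # placeholder

def list_docket_documents(docket_id, doc_type=None, page_size=250):
    """Yield every document record in a docket, optionally filtered by type
    (e.g. 'Proposed Rule', 'Supporting & Related Material', 'Public Submission')."""
    page = 1
    while True:
        params = {
            "filter[docketId]": docket_id,
            "page[size]": page_size,
            "page[number]": page,
        }
        if doc_type:
            params["filter[documentType]"] = doc_type
        resp = requests.get(f"{API_BASE}/documents", params=params,
                            headers={"X-Api-Key": API_KEY}, timeout=30)
        resp.raise_for_status()
        body = resp.json()
        yield from body["data"]  # each item: id plus attributes (title, postedDate, ...)
        total = body["meta"]["totalElements"]  # also answers "how many documents?"
        if page * page_size >= total:
            break
        page += 1
```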
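Downloading and saving each document: a generic streaming download with `requests`; the filename heuristic is a crude stand-in, and the destination layout is up to us.

```python
import os
import requests

def download_document(url, dest_dir):
    """Stream one document to disk and return its local path."""
    os.makedirs(dest_dir, exist_ok=True)
    name = url.rstrip("/").rsplit("/", 1)[-1] or "document"  # crude filename guess
    path = os.path.join(dest_dir, name)
    with requests.get(url, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        with open(path, "wb") as f:
            for chunk in resp.iter_content(chunk_size=65536):
                f.write(chunk)
    return path
```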
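Finding embedded images: a sketch using BeautifulSoup to pull every `<img src>` off a document page and resolve it to an absolute URL; each URL can then go through `download_document` above.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

def find_embedded_images(page_url):
    """Return absolute URLs for all <img> tags on a page."""
    resp = requests.get(page_url, timeout=30)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    return [urljoin(page_url, img["src"]) for img in soup.find_all("img", src=True)]
```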
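Hashing for integrity: SHA-256 from the standard library, read in chunks so large PDFs don't need to fit in memory. Storing the hex digest alongside the document record lets later copies be verified against the original crawl.

```python
import hashlib

def sha256_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```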
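Box upload: a sketch assuming the official `boxsdk` Python package with developer-token auth; the credentials here are placeholders only, and a real crawler would handle token refresh properly.

```python
from boxsdk import Client, OAuth2  # official Box Python SDK

def upload_to_box(local_path, folder_id):
    """Upload one file into a Box folder and return the new file object."""
    auth = OAuth2(client_id="CLIENT_ID",          # placeholders; use real app
                  client_secret="CLIENT_SECRET",  # credentials or a dev token
                  access_token="DEVELOPER_TOKEN")
    client = Client(auth)
    return client.folder(folder_id).upload(local_path)
```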
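Error logging: the standard `logging` module writing to a file (the path is an arbitrary choice), with `logging.exception` in the crawl loop so every failure keeps its traceback for later review. The wrapper reuses `download_document` from the earlier sketch.

```python
import logging

logging.basicConfig(
    filename="crawl_errors.log",  # assumed log path
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)

def safe_download(url, dest_dir):
    """Wrap a download so one bad URL never kills the whole crawl."""
    try:
        return download_document(url, dest_dir)
    except Exception:
        logging.exception("failed to download %s", url)
        return None
```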