This repo contains the record scrapers (and associated tooling) to further the goals of the Police Data Accessibility Project. Thank you for your interest in contributing!
- Clone this repo.
- Make a copy of the template folder in the appropriate jurisdiction folder. Read more about structure below.
- Code your scraper.
- Scrape sample data from the source and add a truncated version to the folder so we understand the kind of data your scraper generates.
- Complete the readme to the best of your ability.
- If you know how to use Splunk, complete the config file.
- Submit a Pull Request for approval.
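For the sample-data step above, one way to produce a truncated sample is to keep only the first rows of the scraper's output. This is a minimal sketch: the helper name `truncate_sample` and the 50-row default are assumptions, not project requirements, so pick whatever size shows the shape of your data.

```python
from itertools import islice


def truncate_sample(lines, max_rows=50):
    """Return at most max_rows items from an iterable of lines.

    Hypothetical helper: the row limit is an assumption, not a project rule.
    If the first line is a header, it is kept because order is preserved.
    """
    return list(islice(lines, max_rows))


if __name__ == "__main__":
    # Stand-in for real scraper output: one header line plus 200 data lines.
    full_output = ["CaseNum,LastName"] + [f"{n},NA" for n in range(200)]
    sample = truncate_sample(full_output, max_rows=10)
    print(len(sample))
```

The function accepts any iterable, so an open file handle can be passed directly without loading the whole file into memory.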
Stick to the format of USA/$STATE/$COUNTY/$RECORD_TYPE. If there are state-level records being scraped, use USA/$STATE/_State/$RECORD_TYPE. Use underscores rather than spaces or dashes.
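The folder convention can be sketched in Python as follows. The function name `scraper_path` is hypothetical, and the example state, county, and record-type values are placeholders rather than names taken from this repo; check the existing folders for the spelling actually in use.

```python
import re


def scraper_path(state, record_type, county=None):
    """Build a folder path following USA/$STATE/$COUNTY/$RECORD_TYPE.

    Hypothetical helper. When no county is given, the records are treated
    as state-level and the _State folder is used instead.
    """
    def clean(part):
        # Replace runs of whitespace or dashes with a single underscore.
        return re.sub(r"[\s\-]+", "_", part.strip())

    middle = clean(county) if county else "_State"
    return "/".join(["USA", clean(state), middle, clean(record_type)])


if __name__ == "__main__":
    print(scraper_path("Florida", "court records", county="Palm Beach"))
    print(scraper_path("Florida", "court-records"))
```

The first call yields a county-level path and the second a state-level path under `_State`.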
Only scrapers that comply with our legal guidelines will be merged into this repo.
Python is preferred. If you use another language, please document your work.
Your scraper must comply with our legal guidelines.
Everyone working on this project is using their free time. Please expect some back-and-forth with the people reviewing your PRs, and be patient and respectful with us. The more you test and validate that your scraper meets the contribution guidelines, the quicker we can accept it.
The #scrapers_general Slack channel is the place to start.
This dataset catalogue is how we track potential sources.
Note: the naming convention for these fields may not be consistent across data sources. If a field is not retrievable, fill it with "NA".
- _id
- _state
- _county
- CaseNum
- FirstName
- MiddleName
- LastName
- Suffix
- DOB
- Race
- Sex
- ArrestDate
- FilingDate
- OffenseDate
- DivisionName
- CaseStatus
- DefenseAttorney
- PublicDefender
- Judge
- ChargeCount
- ChargeStatute
- ChargeDescription
- ChargeDisposition
- ChargeDispositionDate
- ChargeOffenseDate
- ChargeCitationNum
- ChargePlea
- ChargePleaDate
- ArrestingOfficer
- ArrestingOfficerBadgeNumber
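As an illustration of the "NA" rule, the sketch below maps whatever a scraper collected onto the field list above and fills anything missing or empty with "NA". The names `FIELDS` and `normalize_record` are hypothetical, and treating empty strings as not retrievable is an assumption; the output format is left open because this document does not specify one.

```python
FIELDS = [
    "_id", "_state", "_county", "CaseNum", "FirstName", "MiddleName",
    "LastName", "Suffix", "DOB", "Race", "Sex", "ArrestDate", "FilingDate",
    "OffenseDate", "DivisionName", "CaseStatus", "DefenseAttorney",
    "PublicDefender", "Judge", "ChargeCount", "ChargeStatute",
    "ChargeDescription", "ChargeDisposition", "ChargeDispositionDate",
    "ChargeOffenseDate", "ChargeCitationNum", "ChargePlea", "ChargePleaDate",
    "ArrestingOfficer", "ArrestingOfficerBadgeNumber",
]


def normalize_record(raw):
    """Return a dict with every listed field, using "NA" where data is missing.

    Hypothetical helper. Keys in `raw` that are not in FIELDS are dropped;
    None and empty strings are treated as not retrievable (an assumption).
    """
    record = {}
    for field in FIELDS:
        value = raw.get(field)
        record[field] = "NA" if value is None or value == "" else value
    return record


if __name__ == "__main__":
    # Placeholder values for illustration only, not real data.
    scraped = {"CaseNum": "EXAMPLE-0001", "LastName": "Doe", "Judge": ""}
    print(normalize_record(scraped))
```

Because source sites name their fields differently, a scraper would typically rename its own keys to these names before calling a helper like this.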