San Francisco court scraper
PDAP is going to take a crack at this one. If anyone else is interested in helping, feel free to comment.
Everything in a quote block is from @zstumgoren, and is therefore more authoritative than anything I have to say.
Goal
Our scraping framework generally requires:
- A way to search case filings by date, in order to backfill cases systematically by compiling an index of cases going back a few years in time.
- A way to search for case filings by case number. This allows us to update case detail information on an ongoing basis.
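A rough sketch of how those two requirements could fit together, in case it helps someone get started. Everything here is hypothetical: `backfill_index`, `refresh`, and the two search callables are illustrative names, not part of court-scraper, and the callables stand in for whatever code ends up querying the site.

```python
from datetime import date, timedelta
from typing import Callable, Dict, Iterator, List, Optional

# Hypothetical signatures: the real scraper would supply these.
SearchByDate = Callable[[date], List[dict]]
SearchByCaseNumber = Callable[[str], Optional[dict]]


def each_day(start: date, end: date) -> Iterator[date]:
    """Yield every date from start through end, inclusive."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


def backfill_index(start: date, end: date, search_by_date: SearchByDate) -> Dict[str, dict]:
    """Build an index keyed by case number, querying one filing date at a time."""
    index: Dict[str, dict] = {}
    for day in each_day(start, end):
        for row in search_by_date(day):
            # Keep the first record seen for each case number.
            index.setdefault(row["case_number"], row)
    return index


def refresh(index: Dict[str, dict], search_by_case_number: SearchByCaseNumber) -> Dict[str, dict]:
    """Re-query every known case number and merge in any details returned."""
    for case_number in list(index):
        details = search_by_case_number(case_number)
        if details:
            index[case_number].update(details)
    return index
```

The date search builds the index once; the case number search is what gets re-run on a schedule to pick up details posted after the initial filing.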
Properties to collect
- case number
- date generated
- case title
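One possible shape for a record holding those three properties, as a minimal sketch. The class name `CaseRecord` and the field names are assumptions made for illustration, and the case number in the usage line is a made-up placeholder. Only the case number is required here, since the other details may not be available at first.

```python
from dataclasses import asdict, dataclass
from datetime import date
from typing import Optional


@dataclass
class CaseRecord:
    """Hypothetical container for the properties listed above."""
    case_number: str
    date_generated: Optional[date] = None  # may be missing on first scrape
    case_title: Optional[str] = None       # may be missing on first scrape


# Placeholder values, not real court data.
record = CaseRecord(case_number="EXAMPLE-0001", date_generated=date(2021, 1, 4))
row = asdict(record)  # plain dict, convenient for writing to CSV or a database
```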
In terms of case details, so far our framework has generally focused on scraping metadata about a case directly available from the web. We are not currently downloading case files, though we wouldn’t be opposed to having that functionality available if you choose to implement it. But our v1 base goal is the ability to compile a docket of all cases and related metadata.
From some basic searching, it seems like the SF court search is somewhat spotty on case details, at least for a handful of recent cases I clicked through. But even if we can get nothing more than the date of filing and a case number, that’s enough for us to check in on a case for details on a regular basis (perhaps those details get posted some window of time after initial filing date).
Hopefully that provides enough detail for you all to get started. Shout back if you have any other questions.
Scraping target
For this court: https://www.sfsuperiorcourt.org/
Case query takes us here: https://webapps.sftc.org/ci/CaseInfo.dll
Search by new filings → date gives us a scrapable paginated table (fig. 1)
Clicking a case number gives us a page with a date (fig. 2)
[fig. 1: paginated table of new filings for a date]
[fig. 2: case page showing a date]
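For the table in fig. 1, here is a minimal sketch of pulling rows out of an HTML table using only the standard library. The `SAMPLE` markup is invented for illustration. I have not confirmed what the real page's HTML looks like, so the column names and structure are assumptions, and this handles a single page only (following the pagination links is left out).

```python
from html.parser import HTMLParser
from typing import List, Optional


class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr>."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: Optional[List[str]] = None
        self._cell: Optional[List[str]] = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_rows(html: str) -> List[List[str]]:
    """Return every table row in the document as a list of cell strings."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


# Invented markup standing in for one page of results.
SAMPLE = """
<table>
  <tr><th>Case Number</th><th>Case Title</th></tr>
  <tr><td><a href="#">EXAMPLE-0001</a></td><td>DOE VS. ROE</td></tr>
</table>
"""
rows = parse_rows(SAMPLE)
header, first_case = rows[0], rows[1]
```

A parser library such as BeautifulSoup would be shorter; the standard library version is used here only to keep the sketch dependency-free.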
Other helpful info
Follow this guide: https://court-scraper.readthedocs.io/en/latest/writing_a_scraper.html
Note that there’s no need for CLI integration (we’re planning to deprecate it).
Anything I can do to help someone get started?
Hello hello,
Happy to get involved where I can.
I have data analytics skills, plus research experience and a background in data governance and visualization. My programming skills are very basic, but I'm happy to learn.