biglocalnews/court-scraper

San Francisco court scraper


PDAP is going to take a crack at this one. If anyone else is interested in helping, feel free to comment.

Everything in a quote block is from @zstumgoren, and is therefore more authoritative than anything I have to say.

Goal

Our scraping framework generally requires:

  • A way to search case filings by date, in order to backfill cases systematically by compiling an index of cases going back a few years in time.
  • A way to search for case filings by case number. This allows us to update case detail information on an ongoing basis.
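As a rough illustration of the date-based backfill, the sketch below turns "a few years back" into a concrete window and then splits it into individual days, each of which could be fed to the site's new-filings date search. The helper names (`backfill_window`, `daily_dates`) and the three-year default are assumptions for illustration only; they are not part of court-scraper.

```python
from datetime import date, timedelta
from typing import Iterator, Tuple


def backfill_window(end: date, years: int = 3) -> Tuple[date, date]:
    """Return (start, end) covering `years` years back from `end`.

    Hypothetical helper; the three-year default is an assumption.
    """
    try:
        start = end.replace(year=end.year - years)
    except ValueError:
        # `end` is Feb 29 and the target year is not a leap year.
        start = end.replace(year=end.year - years, day=28)
    return start, end


def daily_dates(start: date, end: date) -> Iterator[date]:
    """Yield every calendar date from start to end, inclusive."""
    if start > end:
        raise ValueError("start must not be after end")
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


if __name__ == "__main__":
    start, end = backfill_window(date(2022, 12, 15))
    # Each of these dates would become one "new filings" search.
    for day in daily_dates(start, start + timedelta(days=2)):
        print(day.isoformat())
```

Searching one day at a time keeps each result set small at the cost of more requests; whether wider ranges per query work depends on how the site paginates.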

Properties to collect

  • case number
  • date generated
  • case title
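One way to hold these three properties is a small record type, sketched below. The class name `CaseRecord`, its `to_dict` method, and the placeholder case number are hypothetical; the title is optional because, as noted below, the SF site may not show details for recent filings.

```python
from dataclasses import asdict, dataclass
from datetime import date
from typing import Optional


@dataclass
class CaseRecord:
    """One row of the case index (hypothetical structure)."""

    case_number: str
    date_generated: date
    # The title may be missing when the court has not posted details yet.
    case_title: Optional[str] = None

    def to_dict(self) -> dict:
        """Serialize to JSON-friendly types (date becomes an ISO string)."""
        row = asdict(self)
        row["date_generated"] = self.date_generated.isoformat()
        return row


if __name__ == "__main__":
    # "EXAMPLE-0001" is a placeholder, not a real SF case number.
    record = CaseRecord("EXAMPLE-0001", date(2022, 12, 15))
    print(record.to_dict())
```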

In terms of case details, so far our framework has generally focused on scraping metadata about a case directly available from the web. We are not currently downloading case files, though we wouldn’t be opposed to having that functionality available if you chose to implement it. But our v1 base goal is the ability to compile a docket of all cases and related metadata.

From some basic searching, it seems like the SF court search is somewhat spotty on case details, at least for a handful of recent cases I clicked through. But even if we can get nothing more than the date of filing and a case number, that’s enough for us to check in on a case for details on a regular basis (perhaps those details get posted some window of time after initial filing date).
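If details show up some time after the filing date, repeated visits need to fill in gaps without wiping out what was already captured. The sketch below assumes a plain dictionary index keyed by case number; `merge_updates` is a hypothetical helper, and the "empty values never overwrite" rule is one possible policy, not a framework requirement.

```python
from typing import Dict, Iterable, Mapping, Optional

Record = Dict[str, Optional[str]]


def merge_updates(
    index: Dict[str, Record], updates: Iterable[Mapping[str, Optional[str]]]
) -> Dict[str, Record]:
    """Merge freshly scraped rows into an index keyed by case number.

    Non-empty values from an update overwrite stored ones; empty values
    (None or "") never erase details captured on an earlier visit.
    Returns a new index and leaves the input index unchanged.
    """
    merged = {number: dict(row) for number, row in index.items()}
    for update in updates:
        number = update["case_number"]
        current = merged.setdefault(number, {"case_number": number})
        for field, value in update.items():
            if value not in (None, ""):
                current[field] = value
    return merged
```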

Hopefully that provides enough detail for you all to get started. Shout back if you have any other questions.

Scraping target

For this court: https://www.sfsuperiorcourt.org/
The case query link takes us here: https://webapps.sftc.org/ci/CaseInfo.dll
Searching by new filings → date returns a scrapable, paginated table (fig. 1)
Clicking a case number opens a detail page that shows a date (fig. 2)
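A minimal sketch of reading the paginated results table with only the standard library's `html.parser`. The actual markup of the SF results page has not been checked, so the assumptions are that results sit in `<tr>`/`<td>` rows and that an empty page marks the end. `fetch_page` is a hypothetical callable standing in for however the real site exposes page numbers, and `max_pages` guards against a site that keeps returning the same page.

```python
from html.parser import HTMLParser
from typing import Callable, Iterator, List, Optional


class TableRowParser(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: Optional[List[str]] = None
        self._cell: Optional[List[str]] = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None and self._row is not None:
            # Collapse runs of whitespace inside the cell text.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip header rows that only contain <th>
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def parse_rows(html: str) -> List[List[str]]:
    """Return the cell texts of every data row in `html`."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def iter_all_rows(
    fetch_page: Callable[[int], str], max_pages: int = 100
) -> Iterator[List[str]]:
    """Walk result pages 1, 2, ... until a page comes back with no rows."""
    for page in range(1, max_pages + 1):
        rows = parse_rows(fetch_page(page))
        if not rows:
            break
        yield from rows
```

If the site also uses tables for page layout, the parsed rows would need an extra filter (for example, keeping only rows whose first cell looks like a case number).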

fig. 1: screenshot (taken 2022-12-15) of the paginated new-filings results table

fig. 2: screenshot (taken 2022-12-15) of a case detail page showing a date
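Pulling the date off a detail page could be as simple as the sketch below. The MM/DD/YYYY format is an assumption that still needs checking against the live page, and `first_date` is a hypothetical helper; a sturdier version would target the labeled field on the page rather than the first date-like string in its text.

```python
import re
from datetime import date
from typing import Optional

# Assumption: dates on the case detail page look like 12/15/2022 (MM/DD/YYYY).
US_DATE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")


def first_date(text: str) -> Optional[date]:
    """Return the first valid MM/DD/YYYY date found in text, or None."""
    for match in US_DATE.finditer(text):
        month, day, year = (int(part) for part in match.groups())
        try:
            return date(year, month, day)
        except ValueError:
            continue  # e.g. 13/45/2022 matches the pattern but is not a date
    return None
```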

Other helpful info

Follow this guide: https://court-scraper.readthedocs.io/en/latest/writing_a_scraper.html

Note that there’s no need for CLI integration (we’re planning to deprecate that in the future).

Anything I can do to help someone get started?

abnor commented

Hello hello,

Happy to get involved where I can.

I have data analytics skills, plus experience in research and in structuring data governance and visualization work. My programming skills are very basic, but I'm happy to learn.

Hey, @abnor. What sort of help might you need to get started? I've taken a look at the target site and have some notes, but I'm not sure what else might be helpful to you.