freelawproject/reporters-db

Add valid volume number values?

jcushman opened this issue · 3 comments

For citation extraction, it might be useful to track valid volume numbers for each reporter series. For example, in this case CAP detects "1917 F. 680" as a cite, which could be avoided if we knew that F. only goes up to volume 300.

The simplest addition could be a "max_volume" key:

    "F.": [
        {
            ...
            "editions": {
                "F.": {
                    "end": "1924-12-31T00:00:00",
                    "start": "1880-01-01T00:00:00",
                    "max_volume": 300
                },
                "F.2d": {
                    "end": "1993-12-31T00:00:00",
                    "start": "1924-01-01T00:00:00",
                    "max_volume": 999
                },
                "F.3d": {
                    "end": null,
                    "start": "1993-01-01T00:00:00"
                }
            },

(No "max_volume" key for "F.3d" because that series is still open.)
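A rough sketch of how an extractor could use the proposed key, assuming edition data shaped like the JSON above. `max_volume` is the key proposed in this issue, not an existing reporters-db field, and `volume_in_range` is a hypothetical helper:

```python
from typing import Any, Mapping

# Edition data shaped like the proposal above. "max_volume" is the
# proposed key, not an existing reporters-db field.
EDITIONS: dict[str, dict[str, Any]] = {
    "F.": {"start": "1880-01-01T00:00:00", "end": "1924-12-31T00:00:00", "max_volume": 300},
    "F.2d": {"start": "1924-01-01T00:00:00", "end": "1993-12-31T00:00:00", "max_volume": 999},
    "F.3d": {"start": "1993-01-01T00:00:00", "end": None},  # open series: no cap
}


def volume_in_range(edition: Mapping[str, Any], volume: int) -> bool:
    """Return True if `volume` could belong to `edition` (hypothetical helper)."""
    max_volume = edition.get("max_volume")
    if max_volume is None:
        # Open series such as F.3d: there is no upper bound to check against.
        return volume >= 1
    return 1 <= volume <= max_volume


# "1917 F. 680" is rejected because F. stops at volume 300.
print(volume_in_range(EDITIONS["F."], 1917))   # False
print(volume_in_range(EDITIONS["F.2d"], 680))  # True
```

Leaving the key out for open series means a missing `max_volume` simply imposes no upper bound, so the check never rejects a real cite to a reporter that is still publishing.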

Optional keys could be:

  • "min_volume" if the series doesn't start from 1 (not sure if this exists)
  • "skipped_volumes": [4, 8, 17] if the series skipped some volumes (not sure if this exists)
  • "extra_volumes": ["33 Suppl.", "81 1/2", "402A"] for non-numeric volume numbers (these definitely exist)
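A minimal sketch of a check that honors all of the optional keys listed above, treating the volume as a string so that non-numeric volumes like "81 1/2" can be matched. The key names come from this issue; the helper name and the `max_volume` of 500 in the example are made up for illustration, and the other example values are the illustrative ones from the list, not real data for any reporter:

```python
from typing import Any, Mapping


def is_valid_volume(edition: Mapping[str, Any], volume: str) -> bool:
    """Check a volume string against the proposed (hypothetical) keys.

    Any key that is missing imposes no constraint.
    """
    # Non-numeric volumes are valid only if they are explicitly listed.
    if volume in edition.get("extra_volumes", []):
        return True
    if not volume.isdecimal():
        return False
    number = int(volume)
    if number < edition.get("min_volume", 1):
        return False
    max_volume = edition.get("max_volume")
    if max_volume is not None and number > max_volume:
        return False
    return number not in edition.get("skipped_volumes", [])


# Illustrative values only; "max_volume": 500 is an invented cap.
example_edition = {
    "max_volume": 500,
    "skipped_volumes": [4, 8, 17],
    "extra_volumes": ["33 Suppl.", "81 1/2", "402A"],
}

print(is_valid_volume(example_edition, "81 1/2"))  # True
print(is_valid_volume(example_edition, "8"))       # False
```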

A separate question is valid page numbers: it's useful to know that F. is valid up to X pages, Mass. up to Y pages, WL up to Z digits, etc. CAP has a lot of that data for the volumes we cover. It seems tricky enough to warrant its own bug, though.

What's your read on #19, @jcushman? It might take care of this?

That would also work! (With string keys, as you mentioned over there.) I somehow missed that that one included volume numbers as well as dates, but of course it does. Seems like a big step to add a line per volume (~40,000 lines) to the file, but maybe adding volume-level metadata is inevitable for one reason or another.
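If per-volume metadata with string keys does land, the summary keys proposed earlier in this issue could be derived from it instead of maintained by hand. A sketch, assuming a hypothetical mapping from volume string to per-volume metadata (the exact shape in #19 may differ, and `derive_volume_bounds` is an illustrative name):

```python
from typing import Any, Mapping


def derive_volume_bounds(volumes: Mapping[str, Any]) -> dict[str, Any]:
    """Derive min/max/skipped/extra volume keys from a per-volume table.

    `volumes` is a hypothetical mapping of volume string -> metadata
    (e.g. start/end dates for that volume).
    """
    numeric = sorted(int(v) for v in volumes if v.isdecimal())
    extra = sorted(v for v in volumes if not v.isdecimal())
    present = set(numeric)
    bounds: dict[str, Any] = {"extra_volumes": extra}
    if numeric:
        bounds["min_volume"] = numeric[0]
        bounds["max_volume"] = numeric[-1]
        # Any gap between the first and last numeric volume counts as skipped.
        bounds["skipped_volumes"] = [
            n for n in range(numeric[0], numeric[-1] + 1) if n not in present
        ]
    return bounds


print(derive_volume_bounds({"1": {}, "2": {}, "3": {}, "5": {}, "33 Suppl.": {}}))
```

One caveat with deriving rather than declaring: for an open series the derived `max_volume` is only the latest volume in the table, not a true cap, so an extractor would still need to know whether the series has ended.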

Yeah, I don't know if it's inevitable, but we'll probably get to adding per-volume dates when @flooie finishes his current project. With that in mind, I'll close this one so we can focus there.

If you or somebody else wants to pick that one up in the meantime, it might be a good one to collaborate on. The current idea is that we'll use West, Lexis, and your data to get the most accurate dates.