freelawproject/juriscraper

`sc` scraped docket numbers are incorrect

grossir opened this issue · 2 comments

For example, in this case we collect "28236" as the docket number, because that number appears on the HTML results page, where its CSS class labels it as "case number".

However, in the downloaded files themselves, that value is labeled as the "Opinion Number". The actual docket number also appears in the extracted text, as "Appellate Case No. 2021-001296". We could correct this with a second pass of `extract_from_text` over the already extracted content.
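A rough sketch of what that second pass might look like. It is written as a standalone function so it can be run on its own; in the scraper it would presumably be a method on the `sc` Site class. The regex (four-digit year, dash, six digits) is inferred only from the single example above, and the `{"Docket": {"docket_number": ...}}` return shape is an assumption about what the caller expects, so both would need checking against real opinions and the existing scrapers.

```python
import re

# Assumption: docket numbers always follow "Appellate Case No." and have
# the form YYYY-NNNNNN, as in the one example quoted in this issue.
DOCKET_RE = re.compile(r"Appellate\s+Case\s+No\.\s*(\d{4}-\d{6})")


def extract_from_text(scraped_text: str) -> dict:
    """Pull the appellate docket number out of the opinion's extracted text.

    Returns an empty dict when no match is found, so the value scraped
    from the HTML results page is left untouched in that case.
    """
    match = DOCKET_RE.search(scraped_text)
    if not match:
        return {}
    # Assumed return shape; adjust to whatever the caller actually merges.
    return {"Docket": {"docket_number": match.group(1)}}
```

With this approach, the "28236" value from the results page would be treated as the opinion number rather than the docket number, and the docket number would be overwritten only when the text actually contains an "Appellate Case No." line.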