Error parsing offenses for some dockets
Closed this issue · 1 comments
adamrlinder commented
The most common error I got when parsing the 13,000 dockets was the one below. As best I can tell, it happens because the charges on these dockets are not numbered sequentially. Some dockets have only charges 3 and 4. In other cases, charges 1, 2, 3, 5, 8, and 9 were on the docket, but the rest had been dropped along the way. We only seem to encounter this when parsing dockets for older cases, since charges may be dropped or otherwise updated over time.
Here are some samples. I can provide more.
MC-51-CR-0002740-2020.pdf
MC-51-CR-0000353-2020.pdf
```
  File "/Users/Shared/CFP Scraping/pbf-scraping/analyses/full_dockets/one_time_parse.py", line 13, in <module>
    parse = parse_pdf(path_folder+file, text)
  File "/Users/Shared/CFP Scraping/pbf-scraping/analyses/full_dockets/parse_docket.py", line 113, in parse_pdf
    result['offenses'] = get_charges(pdf, pages_charges)
  File "/Users/Shared/CFP Scraping/pbf-scraping/analyses/full_dockets/funcs_parse.py", line 141, in get_charges
    charges = offense(pdf,p,y2_1,y1_0,x1_0,x3_0,charges)
  File "/Users/Shared/CFP Scraping/pbf-scraping/analyses/full_dockets/funcs_parse.py", line 49, in offense
    y_array_bottom[k-1-h] = y
IndexError: index 2 is out of bounds for axis 0 with size 2
```
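For illustration, here is a minimal sketch of how I think this failure arises and one way it might be avoided. It is not the code in `funcs_parse.py`: the function names and the sample y-coordinates are hypothetical, and I am assuming that `k` is the charge's sequence number and that the array is sized by the number of charges found on the docket. It uses plain lists rather than NumPy arrays, so the error message differs from the one above, but the exception type is the same.

```python
def bottoms_by_sequence(seq_numbers, y_values):
    """Hypothetical naive version: uses the charge's sequence number as the index."""
    y_array_bottom = [0.0] * len(seq_numbers)
    for k, y in zip(seq_numbers, y_values):
        # With charges 3 and 4 the list has size 2, but k - 1 is 2 for charge 3.
        y_array_bottom[k - 1] = y
    return y_array_bottom


def bottoms_by_position(seq_numbers, y_values):
    """Hypothetical gap-tolerant version: index by order of appearance."""
    y_array_bottom = [0.0] * len(seq_numbers)
    for pos, y in enumerate(y_values):
        y_array_bottom[pos] = y
    # Keep the original sequence number next to each value instead of
    # relying on it as an index.
    return dict(zip(seq_numbers, y_array_bottom))


if __name__ == "__main__":
    # Sequential charges work either way.
    print(bottoms_by_sequence([1, 2], [10.0, 20.0]))

    # A docket with only charges 3 and 4 breaks the naive version.
    try:
        bottoms_by_sequence([3, 4], [100.0, 200.0])
    except IndexError as err:
        print("IndexError:", err)

    print(bottoms_by_position([3, 4], [100.0, 200.0]))
```

Indexing by position means gaps in the numbering no longer matter, and the sequence numbers are still available as dictionary keys for anything downstream that needs them.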
bertamb commented
Solved