biglocalnews/usc-crime-reports-scraper

Figure out a PDF parsing scheme


We tried pdfplumber, pdftotext and Tabula; none of them could parse the table. The best text-based trick we could think of was to use the date-formatted string that leads each report as the "break" signal while looping through all rows. We should explore other PDF parsing techniques to see whether any of them work better.
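For reference, here is a rough sketch of the date-as-break idea, working on lines of text already extracted from the PDF. The date pattern (`MM/DD/YY` or `MM/DD/YYYY` at the start of a line) and the names `DATE_START` and `group_reports` are assumptions for illustration, not what the scraper currently uses; the pattern would need adjusting to the real report format.

```python
import re

# Assumption: each report starts with a date like 01/15/21 or 01/15/2021.
# Adjust this pattern to whatever the PDF text actually contains.
DATE_START = re.compile(r"^\d{1,2}/\d{1,2}/\d{2,4}\b")


def group_reports(lines):
    """Group extracted text lines into reports.

    A line that begins with a date opens a new report; every following
    line is attached to that report until the next date-led line.
    Lines that appear before the first date (page headers, etc.) are dropped.
    """
    reports = []
    current = None  # None until the first date-led line is seen
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines
        if DATE_START.match(stripped):
            if current is not None:
                reports.append(current)
            current = [stripped]
        elif current is not None:
            current.append(stripped)
    if current is not None:
        reports.append(current)
    return reports
```

Each item in the returned list is one report as a list of its lines, so any field-level parsing can happen per report afterwards. This only handles the grouping step and does nothing about column alignment, which is where the table parsers were failing.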