Primary language: JavaScript. License: MIT.

ocr-cpi-covid19

For more details, follow the series on my blog:

Coletando os dados da CPI - Parte I (Collecting the CPI data, Part I)

Requirements

  • imagemagick
  • tesseract
  • brew (for macOS)
  • Node v14
  • npm or yarn

Install

npm install

Process files

PDF Images

Some PDFs contain scanned images rather than text, so they must be converted to image files before they can be passed to OCR. To make that easier, I created the convert-pdf-images script.

STEP 1 Run node convert-pdf-images to generate PNGs from the PDF, after providing the file location, including the filename.
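
As a rough illustration of what a PDF-to-PNG step like this can look like, here is a minimal sketch that shells out to ImageMagick. It is not the repository's convert-pdf-images script: the helper names buildConvertArgs and convertPdfToPngs, the 300 DPI density, and the output naming pattern are all assumptions made for the example. It also assumes ImageMagick's convert command is on the PATH (newer ImageMagick releases expose it as magick) and that ImageMagick is able to read PDFs on your machine.

```javascript
const path = require('path');
const { execFile } = require('child_process');

// Hypothetical helper: builds the argument list for ImageMagick's `convert`.
// `-density` sets the rasterization resolution; 300 is an assumed default.
function buildConvertArgs(pdfPath, outDir, density = 300) {
  const base = path.basename(pdfPath, path.extname(pdfPath));
  return [
    '-density', String(density),
    pdfPath,
    // %03d is expanded by ImageMagick to the page index: name-000.png, ...
    path.join(outDir, `${base}-%03d.png`),
  ];
}

// Hypothetical helper: runs the conversion and resolves when it finishes.
function convertPdfToPngs(pdfPath, outDir) {
  return new Promise((resolve, reject) => {
    execFile('convert', buildConvertArgs(pdfPath, outDir), (err) => {
      if (err) reject(err);
      else resolve();
    });
  });
}
```

Keeping the argument builder separate from the process call makes it possible to check the command line without having ImageMagick installed.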

STEP 2 Run node ocr.js after updating the JSON file list.
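
The OCR step can be pictured along the following lines. This is only a sketch under stated assumptions, not the contents of ocr.js: the file list is assumed to be a plain array of PNG paths, the helper names buildTesseractArgs and ocrAll are hypothetical, and the language code por (Portuguese) is a guess based on the source documents. It calls the tesseract command-line tool, which writes its result to the given output base name with a .txt extension.

```javascript
const path = require('path');
const { execFileSync } = require('child_process');

// Hypothetical helper: builds the arguments for `tesseract <image> <outputbase> -l <lang>`.
function buildTesseractArgs(imagePath, outDir, lang = 'por') {
  const base = path.basename(imagePath, path.extname(imagePath));
  return [imagePath, path.join(outDir, base), '-l', lang];
}

// Hypothetical helper: runs OCR on every image in the list and returns the
// paths of the text files tesseract is expected to write.
// `run` is injectable so the function can be exercised without tesseract installed.
function ocrAll(fileList, outDir, run = execFileSync) {
  return fileList.map((imagePath) => {
    const args = buildTesseractArgs(imagePath, outDir);
    run('tesseract', args);
    return `${args[1]}.txt`;
  });
}
```

With a file list such as ["png/page-001.png", "png/page-002.png"] loaded from JSON, ocrAll(list, "text") would invoke tesseract once per image, and the output directory is assumed to exist already.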