Madhour/SeemsPhishy

OCR


  • Run OCR on PDF
  • Layout detection (?)
  • Structure text

PyPDF2 did not yield the intended results.
First working implementation uses PDFMiner (a non-OCR approach): 06eca99.
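
For reference, a minimal sketch of what the non-OCR route can look like. It is not the code from 06eca99. It assumes the pdfminer.six package and its `extract_text` helper; the function names `extract_pdf_text` and `structure_text` are hypothetical and chosen for illustration. The second function covers the "Structure text" item in a very simple way by splitting the raw output into paragraphs.

```python
import re


def extract_pdf_text(path):
    """Return the embedded text layer of a PDF (no OCR involved)."""
    # Assumption: pdfminer.six is installed. Imported lazily so that
    # structure_text() below stays usable without it.
    from pdfminer.high_level import extract_text
    return extract_text(path)


def structure_text(raw):
    """Split raw extracted text into whitespace-normalized paragraphs."""
    # Form feeds usually mark page breaks in the extracted text;
    # treat them like blank lines.
    raw = raw.replace("\f", "\n\n")
    paragraphs = []
    for block in re.split(r"\n\s*\n", raw):
        # Collapse line breaks and repeated spaces inside a paragraph.
        text = " ".join(block.split())
        if text:
            paragraphs.append(text)
    return paragraphs
```

Usage would be along the lines of `structure_text(extract_pdf_text("sample.pdf"))`, where `sample.pdf` is a placeholder path. This only works for PDFs that carry a text layer; scanned documents would still need OCR.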

Done with PDFMiner.