ocr_pdfs

This repo contains the following scripts for extracting text from PDFs.

tika_pdfs.py    - for PDFs whose text has already been OCR'd
ocr_pdfs.py     - for PDFs whose text has not yet been OCR'd

Comments in each script mark where to change the paths to the input PDFs and to the results. Both scripts write their results as .txt files.
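
The pattern described above (set an input path and a results path, then write one .txt file per PDF) can be sketched roughly as follows. This is only an illustration, not the code in either script: PDF_DIR, RESULTS_DIR, convert_folder and the extract_text callback are hypothetical names, and the actual extraction step (Tika or OCR) is left to whatever function is passed in.

```python
from pathlib import Path
from typing import Callable, List

# Hypothetical paths; in the real scripts, comments mark where to set these.
PDF_DIR = Path("pdfs")
RESULTS_DIR = Path("results")


def convert_folder(pdf_dir: Path, results_dir: Path,
                   extract_text: Callable[[Path], str]) -> List[Path]:
    """Write one .txt file per PDF found in pdf_dir; return the files written."""
    results_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf_path in sorted(pdf_dir.glob("*.pdf")):
        # extract_text is a placeholder for the Tika or OCR step.
        text = extract_text(pdf_path)
        out_path = results_dir / (pdf_path.stem + ".txt")
        out_path.write_text(text, encoding="utf-8")
        written.append(out_path)
    return written
```

Called as convert_folder(PDF_DIR, RESULTS_DIR, my_extractor), where my_extractor is any function that takes a PDF path and returns its text, this would produce results/report.txt for pdfs/report.pdf.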

elliottash/ocr_pdfs