ocr_pdfs
This repo contains two scripts for extracting text from PDFs:
tika_pdfs.py - for PDFs whose text has already been OCR'ed
ocr_pdfs.py - for PDFs whose text has not yet been OCR'ed
Comments in each script mark where to change the path to the input PDFs and the path for the results. Both scripts write their results in .txt format.
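As a rough illustration of the flow described above (read PDFs from one path, write .txt results to another), here is a minimal sketch. It is not the code from either script. The names extract_folder, tika_extract, pdf_dir and out_dir are hypothetical, and the use of the tika Python package inside tika_extract is an assumption based only on the script's file name.

```python
from pathlib import Path
from typing import Callable, List


def extract_folder(pdf_dir: str, out_dir: str,
                   extract_text: Callable[[Path], str]) -> List[Path]:
    """Run extract_text on every .pdf in pdf_dir and save each result as .txt.

    pdf_dir and out_dir stand in for the two paths that the script comments
    ask you to change (hypothetical parameter names).
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        text = extract_text(pdf) or ""          # treat "no text found" as empty
        target = out / (pdf.stem + ".txt")      # report.pdf -> report.txt
        target.write_text(text, encoding="utf-8")
        written.append(target)
    return written


def tika_extract(pdf: Path) -> str:
    """Assumed extractor for PDFs that already carry text.

    Requires the third-party tika package and a Java runtime; imported
    lazily so the rest of this sketch runs without them.
    """
    from tika import parser
    parsed = parser.from_file(str(pdf))
    return parsed.get("content") or ""
```

Usage might look like extract_folder("pdfs", "results", tika_extract). Passing the extractor in as a function keeps the file handling identical for both cases, so an OCR-based extractor could be swapped in for scanned PDFs.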