- This is a Python script that converts any PDF to text using Tesseract-OCR. I made it to process PDFs in which the text is not selectable.
- Please do not use it on normal PDFs from which you can simply copy the text, as OCR is a heavy and slow task.
- It works best on simple PDFs laid out in a plain book format (results also depend on your Tesseract installation). More updates may come soon.
- It uses the Tesseract-OCR binaries and the pytesseract, PyMuPDF and PIL packages.
- If you cannot install fitz, try "pip install PyMuPDF".
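
The approach described above (render each page to an image, then run OCR on it) can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function names `join_pages` and `pdf_to_text`, the 300 DPI default and the page separator format are all assumptions made for the example. It also assumes a recent PyMuPDF release where `Page.get_pixmap` accepts a `dpi` argument, and a working Tesseract install with the `eng` language data.

```python
from typing import Iterable, List


def join_pages(page_texts: Iterable[str]) -> str:
    """Join per-page OCR output, marking each page boundary (hypothetical helper)."""
    parts: List[str] = []
    for number, text in enumerate(page_texts, start=1):
        parts.append(f"--- page {number} ---\n{text.strip()}")
    return "\n\n".join(parts)


def pdf_to_text(pdf_path: str, dpi: int = 300, lang: str = "eng") -> str:
    """Render every page to an image and OCR it (sketch, assumed defaults)."""
    # Imported lazily so join_pages stays usable without the OCR stack installed.
    import fitz  # provided by the PyMuPDF package
    import pytesseract
    from PIL import Image

    page_texts = []
    doc = fitz.open(pdf_path)
    try:
        for page in doc:
            # Rasterize the page; the default pixmap is RGB without alpha.
            pix = page.get_pixmap(dpi=dpi)
            image = Image.frombytes("RGB", (pix.width, pix.height), pix.samples)
            # Hand the rendered page to Tesseract through pytesseract.
            page_texts.append(pytesseract.image_to_string(image, lang=lang))
    finally:
        doc.close()
    return join_pages(page_texts)
```

With the dependencies installed, calling `pdf_to_text("scanned.pdf")` (a placeholder file name) would return the recognized text of all pages as one string. A higher `dpi` usually helps recognition on small print but makes each page slower to process.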
Kaushal1011/PyPDFtoText
This is a Python script that converts any PDF to text using Tesseract-OCR (for text-locked PDFs).
Python