The Ultimate PDF to Text Converter

An application that can extract editable as well as scanned text from PDF files.

Execution

Run main.py with Python 3.8 or later.

python main.py

Prerequisites and Dependencies

  • PyPI requirements can be installed via requirements.txt.
    pip3 install -r requirements.txt
    
    sudo may be required if the installation needs to modify the PATH environment variable.
  • Tesseract must be installed and added to PATH for text recognition. It can be downloaded and installed here.
  • Poppler must be installed and added to PATH for text recognition to work. Below are instructions for various operating systems.
    • Windows users can download and install it here.
    • macOS users can install it using Homebrew.
      brew install poppler
      
    • Most Linux distributions already ship with Poppler. If it is absent, it can be installed manually, for example on Debian- or Ubuntu-based systems:
      sudo apt install poppler-utils
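
Since both Tesseract and Poppler must be on PATH, it can help to check for them before running main.py. The snippet below is a minimal sketch and not part of this project. It assumes the Tesseract executable is named tesseract and uses pdftoppm, one of Poppler's command-line tools, as a stand-in for a working Poppler install. The function name find_missing_tools is hypothetical.

```python
import shutil

# Assumed executable names: "tesseract" is the Tesseract CLI, and "pdftoppm"
# is one of the command-line tools that ships with Poppler.
REQUIRED_TOOLS = {"Tesseract": "tesseract", "Poppler": "pdftoppm"}


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the display names of tools whose executable is not on PATH."""
    return [name for name, exe in tools.items() if which(exe) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("Tesseract and Poppler were both found on PATH.")
```

If a tool is reported as missing right after installation, opening a new terminal usually picks up the updated PATH.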
      

Behind the idea

Made with ❤ by Manvendra, Harsh, and Param.