/pdf-text-data-extractor

PDF text data extraction web app with OCR for scanned documents

Primary Language: Python

PDF to Text


A PDF text extraction app that takes a PDF document as input and returns either a single txt file containing all pages or a compressed (ZIP) folder of txt files, one per page. OCR can also be enabled for scanned documents.


How does it work?

flowchart LR

A[PDF] --> |text conversion / OCR| B(Text)
B --> |Option 1| D[txt file]
B --> |Option 2| E[ZIP folder of txt files for pages]
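
The two output options in the flowchart can be sketched with the standard library alone. This is a minimal illustration, not the app's actual code: the function name `build_output`, the `stem` parameter, the file naming scheme, and the form-feed page separator are all assumptions made for the example.

```python
import io
import zipfile


def build_output(page_texts, split_pages=False, stem="document"):
    """Return (filename, payload_bytes) for the chosen output option.

    page_texts is a list with one string per PDF page. The names used here
    are hypothetical and chosen only for this sketch.
    """
    if not split_pages:
        # Option 1: one txt file holding every page.
        # Pages are joined with a form feed (an assumed separator).
        combined = "\f".join(page_texts)
        return f"{stem}.txt", combined.encode("utf-8")

    # Option 2: a ZIP archive with one txt file per page, built in memory.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for number, text in enumerate(page_texts, start=1):
            archive.writestr(f"{stem}_page_{number}.txt", text)
    return f"{stem}.zip", buffer.getvalue()
```

Building the archive in memory means the resulting bytes can be handed straight to a download widget without touching the disk, which suits a hosted web app.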

  1. Upload your PDF.
  2. Enable OCR (for scanned documents).
  3. Select the PDF language.
  4. Download your output file (zip/txt).
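
The steps above boil down to a small dispatch: use plain text extraction by default and switch to OCR when it is enabled. The sketch below shows that decision only, assuming the selected language matters just for OCR. `pdf_to_pages`, `LANGUAGE_CODES`, and the two backend callables are hypothetical names; the README does not say which libraries the app uses.

```python
from typing import Callable, List

# Hypothetical mapping from a UI language label to an OCR language code.
# The three-letter codes follow the convention used by Tesseract; the app's
# actual OCR engine and language list are not stated in this README.
LANGUAGE_CODES = {"English": "eng", "French": "fra", "German": "deu"}

TextBackend = Callable[[bytes], List[str]]
OcrBackend = Callable[[bytes, str], List[str]]


def pdf_to_pages(
    pdf_bytes: bytes,
    use_ocr: bool,
    language: str,
    extract_text: TextBackend,
    run_ocr: OcrBackend,
) -> List[str]:
    """Return one string per page, choosing the backend from the OCR toggle."""
    if not use_ocr:
        # Digital PDFs already carry a text layer, so no OCR is needed.
        return extract_text(pdf_bytes)

    # Scanned PDFs are images; OCR needs to know which language to expect.
    try:
        code = LANGUAGE_CODES[language]
    except KeyError:
        raise ValueError(f"unsupported language: {language!r}") from None
    return run_ocr(pdf_bytes, code)
```

In a real deployment the two callables might wrap a PDF parser such as pypdf's `PdfReader` and an OCR binding such as `pytesseract.image_to_string`, but that pairing is an assumption. Passing the backends in as arguments keeps the dispatch easy to test with stubs.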

How to support the project

You can support the project by sharing feedback and/or buying me a coffee.