The project was made as an assignment for an internship.

Primary language: Python

Text Extractor

A Django application that extracts text from single-page PDFs and images.

How it works

[Diagram: how the application works]
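
The source does not describe the pipeline in words, so the following is only a rough sketch of one way the listed tools could fit together: ImageMagick (which delegates PDF rendering to Ghostscript) rasterizes the first page of a PDF, and the Tesseract command-line tool then runs OCR on the image. The function names, the 300 DPI density, and the use of `subprocess` are assumptions made for illustration, not the project's actual code.

```python
import shutil
import subprocess
from pathlib import Path


def rasterize_command(pdf_path: str, png_path: str, density: int = 300) -> list[str]:
    """Build an ImageMagick command that renders page 1 of a PDF to a PNG."""
    # "magick" is the ImageMagick 7 entry point; version 6 ships "convert".
    binary = "magick" if shutil.which("magick") else "convert"
    # "[0]" selects the first page only, matching the single-page limitation.
    return [binary, "-density", str(density), f"{pdf_path}[0]", png_path]


def ocr_command(image_path: str) -> list[str]:
    """Build a Tesseract command that writes the recognized text to stdout."""
    return ["tesseract", image_path, "stdout"]


def extract_text(path: str) -> str:
    """Hypothetical helper: OCR an image, rasterizing it first if it is a PDF."""
    source = Path(path)
    if source.suffix.lower() == ".pdf":
        image = source.with_suffix(".png")
        subprocess.run(rasterize_command(str(source), str(image)), check=True)
    else:
        image = source
    result = subprocess.run(
        ocr_command(str(image)), check=True, capture_output=True, text=True
    )
    return result.stdout
```

Calling `extract_text("scan.pdf")` would require all three tools to be installed and on the PATH. Because the PDF is rasterized before OCR, a sketch like this would treat scanned and text-based PDFs the same way.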

Features and Limitations

  • Also works with scanned PDFs
  • Works only with single-page PDFs

Prerequisites and Installation Guide

  • ImageMagick (must be installed on your system)
  • Ghostscript (must be installed on your system)
  • Tesseract-OCR (must be installed on your system)
  • Make sure the environment variables for all of the above are set correctly.
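
One quick way to confirm the prerequisites are reachable is to look each executable up on the PATH. The sketch below assumes the usual executable names (`magick` or `convert` for ImageMagick, `gs` or `gswin64c`/`gswin32c` for Ghostscript, `tesseract` for Tesseract-OCR); your installation may use different names or rely on other environment variables.

```python
import shutil

# Candidate executable names; these vary by platform and version.
TOOLS = {
    "ImageMagick": ["magick", "convert"],
    "Ghostscript": ["gs", "gswin64c", "gswin32c"],
    "Tesseract-OCR": ["tesseract"],
}


def find_tools(tools=TOOLS):
    """Map each tool name to the first executable found on PATH, or None."""
    found = {}
    for name, candidates in tools.items():
        # shutil.which returns the full path, or None when nothing matches.
        found[name] = next((p for p in map(shutil.which, candidates) if p), None)
    return found


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name}: {path or 'NOT FOUND on PATH'}")
```

On Windows, `convert` can resolve to an unrelated system utility, so a hit for `magick` is the more reliable sign that ImageMagick is installed.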

Steps

  • Create a virtual environment
  • Install the requirements from the requirements.txt file (pip install -r requirements.txt)
  • Set up a PostgreSQL database, or comment out the PostgreSQL settings and uncomment the SQLite settings in the settings.py file
  • Run the server (python manage.py runserver)
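
For the database step, the two alternatives in settings.py might look roughly like the sketch below. The engine strings are Django's standard backends; the database name, user, password, host, and port are placeholders, and the names `POSTGRES` and `SQLITE` are hypothetical, so the project's actual settings.py will differ.

```python
from pathlib import Path

# In a real settings.py, BASE_DIR is usually derived from __file__.
BASE_DIR = Path.cwd()

# PostgreSQL option; name, user, and password are placeholders.
POSTGRES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "text_extractor",
        "USER": "postgres",
        "PASSWORD": "change-me",
        "HOST": "localhost",
        "PORT": "5432",
    }
}

# SQLite option; needs no database server.
SQLITE = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": BASE_DIR / "db.sqlite3",
    }
}

# Keep exactly one of these assignments active.
DATABASES = SQLITE
# DATABASES = POSTGRES
```

SQLite is the quicker choice for trying the project locally, since it avoids setting up a separate database server.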

Screenshots

Home Page

Details Page

Sign-In Page

Demo Video