ankitdevelops/text-extractor

The project was made as an assignment for an internship.

Python

Text Extractor

A Django application that extracts text from single-page PDFs and images.

How it works

Features and Limitations

Also works with scanned PDFs
Works only with single-page PDFs
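
As a rough illustration of how a single-page PDF or an image could be turned into text with the tools this project depends on, here is a minimal sketch. It is not taken from the project's code: the function names are hypothetical, it assumes ImageMagick 7 (the `magick` executable) and the `tesseract` command-line tool are on PATH, and the project itself may well use Python bindings instead of subprocess calls.

```python
import subprocess


def rasterize_command(pdf_path, png_path, dpi=300):
    """Build an ImageMagick command that renders only the first PDF page."""
    # "[0]" selects page index 0, which mirrors the single-page limitation.
    # ImageMagick hands PDF reading to Ghostscript behind the scenes.
    return ["magick", "-density", str(dpi), f"{pdf_path}[0]", png_path]


def ocr_command(image_path, lang="eng"):
    """Build a Tesseract command that prints recognised text to stdout."""
    return ["tesseract", image_path, "stdout", "-l", lang]


def extract_text(path, run=subprocess.run):
    """Return the OCR text for an image or for the first page of a PDF."""
    if path.lower().endswith(".pdf"):
        png_path = path + ".png"  # hypothetical naming for the temporary image
        run(rasterize_command(path, png_path), check=True)
        path = png_path
    result = run(ocr_command(path), check=True, capture_output=True, text=True)
    return result.stdout
```

Because scanned PDFs contain page images rather than embedded text, rasterizing the page and running OCR on it handles them the same way as ordinary image uploads.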

Prerequisites and Installation Guide

ImageMagick (must be installed on your system)
Ghostscript (must be installed on your system)
Tesseract-OCR (must be installed on your system)
Make sure the environment variables for all of the above are set correctly.
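
A quick way to confirm that the three tools are reachable is to look them up on PATH. The sketch below is only a convenience check, not part of the project. The executable names are assumptions: ImageMagick 7 installs `magick` while older releases install `convert`, and Ghostscript is usually `gs` on Unix-like systems and `gswin64c` or `gswin32c` on Windows.

```python
import shutil

# Candidate executable names per tool (assumed, adjust for your platform).
REQUIRED_TOOLS = {
    "ImageMagick": ["magick", "convert"],
    "Ghostscript": ["gs", "gswin64c", "gswin32c"],
    "Tesseract-OCR": ["tesseract"],
}


def find_tools(required=REQUIRED_TOOLS, which=shutil.which):
    """Map each tool to the first matching executable on PATH, or None."""
    found = {}
    for name, candidates in required.items():
        found[name] = next((p for p in map(which, candidates) if p), None)
    return found


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name}: {path or 'NOT FOUND on PATH'}")
```

On Windows, a `convert` match may be the unrelated system utility of the same name, so a `magick` match is the more reliable sign that ImageMagick is installed.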

Steps

Create a virtual environment
Install the requirements from the requirements.txt file
Set up a PostgreSQL database, or comment out the PostgreSQL settings and uncomment the SQLite settings in the settings.py file
Run the server (python manage.py runserver)
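
The database choice in settings.py comes down to which `DATABASES` dictionary Django sees. The sketch below shows the two shapes side by side, assuming a hypothetical helper and hypothetical environment variable names (`DB_NAME`, `DB_USER`, and so on); the project's actual settings.py uses commented and uncommented blocks instead and may use different values.

```python
import os
from pathlib import Path


def database_settings(base_dir, use_postgres, env=os.environ):
    """Return a Django DATABASES dict for PostgreSQL or SQLite."""
    if use_postgres:
        return {
            "default": {
                "ENGINE": "django.db.backends.postgresql",
                # Defaults below are placeholders, not the project's values.
                "NAME": env.get("DB_NAME", "text_extractor"),
                "USER": env.get("DB_USER", "postgres"),
                "PASSWORD": env.get("DB_PASSWORD", ""),
                "HOST": env.get("DB_HOST", "localhost"),
                "PORT": env.get("DB_PORT", "5432"),
            }
        }
    # SQLite needs no server: the database is a single file in the project.
    return {
        "default": {
            "ENGINE": "django.db.backends.sqlite3",
            "NAME": Path(base_dir) / "db.sqlite3",
        }
    }
```

In settings.py this could be used as `DATABASES = database_settings(BASE_DIR, use_postgres=False)`. With either backend, `python manage.py migrate` creates the tables before the server is started for the first time.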

Screenshots

Home Page

Details Page

SignIn Page