
Text extractor for scanned images and documents. It scans a file and extracts its content, saving loads of time and reducing the typographical errors that manual retyping introduces.


Textract

  • The main motivation behind this project was a problem we often faced: having to retype content by hand because it could not be copy-pasted from an existing document or image that is not in typed format.

  • Hence, a text extractor that simply scans a file and extracts its content would save loads of time and also greatly reduce the chance of typographical errors.

Application Link:

Flow of the Application

  • Our system takes the scanned image/document from the user as an input.

  • It then applies image pre-processing techniques such as scaling, binarization, and noise removal.

  • Finally, it runs Optical Character Recognition (OCR) with the Tesseract engine to extract the text.

Usage Guidelines:

1. Desktop Version

  • The link http://18.222.220.89:5000/ opens the upload page, where you can submit the file from which you want to extract text.

  • After you upload and submit a file, the result appears as shown in the image. Click Copy To Clipboard to copy the text and use it as you want.
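
For readers curious how such an upload page could be served, here is a hedged Flask sketch of an upload endpoint. It is not taken from the project: the route `/extract`, the form field name `file`, the JSON response shape, and the `create_app(ocr)` factory are all hypothetical, and `ocr` stands in for whatever function turns the uploaded bytes into text.

```python
from flask import Flask, jsonify, request


def create_app(ocr):
    """Build a Flask app; `ocr` is any callable mapping image bytes to text."""
    app = Flask(__name__)

    @app.route("/extract", methods=["POST"])  # hypothetical route name
    def extract():
        upload = request.files.get("file")  # hypothetical form field name
        if upload is None or upload.filename == "":
            # Reject requests that carry no uploaded file.
            return jsonify(error="no file uploaded"), 400
        # Hand the raw bytes to the OCR callable and return its text as JSON.
        return jsonify(text=ocr(upload.read()))

    return app
```

Passing the OCR step in as a parameter keeps the web layer independent of Tesseract, so the endpoint can be exercised with a stub function.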

2. Mobile Version

  • After downloading the APK package from this link, install it on your device and start the app.

  • Upload an image and the results come out as follows. Then simply copy-paste the text and use it as per your requirement.

Technology Used:

  • Flask
  • Tesseract OCR Engine
  • TensorFlow, OpenCV
  • Flutter
  • AWS (Deployment)