DAO-OCR

A Flask web app that integrates Tesseract OCR to extract text from image files.

DAO OPTICAL CHARACTER RECOGNITION TOOL

About

A Flask web app that integrates Tesseract OCR to extract text from image files.

Flask Web App

Hashnode Article

Medium Article

Presentation Slide

YouTube Demo

Contributors

Repository File Structure

├───static
│   ├───css
│   │   └───images
│   ├───font-awesome
│   │   └───fonts
│   ├───fonts
│   │   ├───font-awesome-4.7.0
│   │   │   ├───css
│   │   │   ├───fonts
│   │   │   ├───less
│   │   │   └───scss
│   │   ├───Linearicons-Free-v1.0.0
│   │   │   └───WebFont
│   │   ├───montserrat
│   │   └───poppins
│   ├───forms
│   ├───img
│   ├───js
│   ├───test-img
│   ├───vendor
│   │   ├───aos
│   │   ├───bootstrap
│   │   │   ├───css
│   │   │   └───js
│   │   ├───bootstrap-icons
│   │   │   └───fonts
│   │   ├───boxicons
│   │   │   ├───css
│   │   │   └───fonts
│   │   ├───glightbox
│   │   │   ├───css
│   │   │   └───js
│   │   ├───isotope-layout
│   │   ├───php-email-form
│   │   ├───purecounter
│   │   └───swiper
│   └───vendorlog
│       ├───animate
│       ├───animsition
│       │   ├───css
│       │   └───js
│       ├───bootstrap
│       │   ├───css
│       │   └───js
│       ├───countdowntime
│       ├───css-hamburgers
│       ├───daterangepicker
│       ├───jquery
│       ├───perfect-scrollbar
│       └───select2
├───templates
├───venv
│   ├───Include
│   ├───Lib
│   │   └───site-packages
│   │       ├─── #Files and Folders
│   └───Scripts
└───__pycache__

Problem Statement

Large amounts of text are locked away in images, PDFs, and scanned documents. Transcribing that text by hand is tedious, slow, and prone to errors, which makes the process unreliable. To be processed and analyzed, these sources first have to be converted into a digital text format. This is where Optical Character Recognition (OCR) comes into play. The goal of this project is to create a Flask web app that integrates Tesseract OCR to extract text from image files accurately.

Proposed Approach

The proposed approach is a Flask web app that accepts image files, extracts text using Tesseract OCR, and displays the extracted text in a readable format. The user uploads an image file, and the web app processes it with Tesseract OCR. The extracted text is then displayed in the web app, where the user has the option to copy it. To ensure accuracy, we will test the web app with various image file formats and compare the extracted text with the original text.
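
The upload-and-extract flow described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the `/extract` route, the `image` form field, the `ALLOWED_EXTENSIONS` set, and the `allowed_file` and `create_app` names are all assumptions made for the example. It also assumes Flask, Pillow, and Pytesseract are installed and that the Tesseract executable is available.

```python
import os

# Hypothetical whitelist of image types; the real app may accept others.
ALLOWED_EXTENSIONS = {"png", "jpg", "jpeg", "gif", "bmp", "tiff"}


def allowed_file(filename: str) -> bool:
    """Return True if the file name ends in one of the allowed image extensions."""
    _, ext = os.path.splitext(filename)
    return ext.lower().lstrip(".") in ALLOWED_EXTENSIONS


def create_app():
    """Build a small Flask app with one OCR endpoint (illustrative only)."""
    # Imported here so allowed_file() stays usable without these packages.
    from flask import Flask, jsonify, request
    from PIL import Image
    import pytesseract

    app = Flask(__name__)

    @app.route("/extract", methods=["POST"])  # hypothetical route name
    def extract():
        upload = request.files.get("image")  # hypothetical form field name
        if upload is None or not allowed_file(upload.filename or ""):
            return jsonify(error="Please upload a supported image file."), 400
        # Open the uploaded stream as an image and run Tesseract on it.
        text = pytesseract.image_to_string(Image.open(upload.stream))
        return jsonify(text=text)

    return app
```

Rejecting unsupported file types before calling Tesseract keeps obviously invalid uploads from reaching the OCR step.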

Tools Used 🔧

  • Python
  • Flask
  • Flask-Scss
  • HTML
  • CSS
  • Javascript
  • Tesseract-OCR
  • Pytesseract
  • Railway
  • Docker

Project Proposal Demo

Group.DAO.-.Project.Proposal.mp4

Web App Demo

DAO.Demo.mp4

OCR In Action

[Animated GIF: OCR in action]

How to run the Application

Running on Local Machine

To run the application on your local system, do the following:

  1. Clone the repository:
git clone https://github.com/JayRalph360/DAO-OCR.git
  2. Change the directory:
cd DAO-OCR
  3. Install the requirements:
pip install -r requirements.txt
  4. Run the application:
python -m flask run

You should be able to view the application by going to http://127.0.0.1:5000/
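
Pytesseract is a wrapper around the Tesseract executable, so `pip install` alone does not provide the OCR engine; Tesseract itself has to be installed on the machine. The following is a small sketch of a pre-flight check you could run before starting the app. The `tesseract_status` helper is a hypothetical name for this example, and the check assumes the executable is called `tesseract` and is expected on the `PATH`.

```python
import shutil
from typing import Callable, Optional


def tesseract_status(which: Callable[[str], Optional[str]] = shutil.which) -> str:
    """Report whether a `tesseract` executable can be found on the PATH."""
    path = which("tesseract")
    if path is None:
        return "tesseract not found on PATH; install Tesseract OCR before starting the app"
    return f"tesseract found at {path}"


if __name__ == "__main__":
    print(tesseract_status())
```

If Tesseract is installed in a location that is not on the `PATH`, Pytesseract can be pointed at it by setting `pytesseract.pytesseract.tesseract_cmd` to the full path of the executable.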

Running on Local Machine with Docker Compose

You can also run the application in a Docker container using Docker Compose (if you have it installed):

  1. Clone the repository:
git clone https://github.com/JayRalph360/DAO-OCR.git
  2. Change the directory:
cd DAO-OCR
  3. Run the Docker Compose command:
docker compose up -d --build

You should be able to view the application by going to http://localhost:5000/

Running in a Gitpod Cloud Environment

Click the button below to start a new development environment:

Open in Gitpod

Tests

Test Flask Web App Functions

To test the Flask web app, do the following:

  1. Clone the repository:
git clone https://github.com/JayRalph360/DAO-OCR.git
  2. Change the working directory and install the requirements and pytest:
cd src && pip install -r requirements.txt && pip install pytest
  3. Move to the tests folder and run the tests:
cd .. && cd tests && pytest
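
The repository's own tests are not reproduced here. As a rough sketch of how a pytest test could compare extracted text with the original text, the example below scores similarity with the standard library's `difflib`. The `normalize` and `similarity` helpers, the sample strings, and the 0.9 threshold are all assumptions chosen for illustration; in a real test the `extracted` string would come from running OCR on a test image.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Collapse runs of whitespace and lowercase, so layout differences are ignored."""
    return " ".join(text.split()).lower()


def similarity(extracted: str, expected: str) -> float:
    """Return a ratio in [0, 1]; 1.0 means the normalized texts are identical."""
    return SequenceMatcher(None, normalize(extracted), normalize(expected)).ratio()


def test_ocr_output_is_close_to_expected():
    expected = "Optical Character Recognition"
    # Stand-in for OCR output: same words, different spacing and line breaks.
    extracted = "Optical  Character\nRecognition\n"
    assert similarity(extracted, expected) >= 0.9  # hypothetical threshold
```

A ratio-based check tolerates the occasional misread character, which an exact string comparison would treat as a failure.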

Deployment

Deploying the Application to Heroku

Assuming you have Git and the Heroku CLI installed, carry out the following steps:

  1. Clone the repository:
git clone https://github.com/JayRalph360/DAO-OCR.git
  2. Change the directory:
cd DAO-OCR
  3. Log in to Heroku:
heroku login
heroku container:login
  4. Create your application:
heroku create your-app-name

Replace your-app-name with a name of your choosing.

  5. Build the image and push it to the Container Registry:
heroku container:push web
  6. Release the image to your app:
heroku container:release web
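
Heroku assigns the port for a web process through the `PORT` environment variable, so the app inside the container has to listen on that port rather than a fixed one. How this repository's Dockerfile handles that is not shown here; the sketch below only illustrates the idea. The `resolve_port` helper and the fallback of 5000 (the local port used earlier in this README) are assumptions for the example.

```python
import os
from typing import Mapping


def resolve_port(environ: Mapping[str, str] = os.environ, default: int = 5000) -> int:
    """Use the platform-provided PORT if set, otherwise fall back to the local default."""
    return int(environ.get("PORT", default))


if __name__ == "__main__":
    print(f"The app would listen on port {resolve_port()}")
```

A Flask entry point could then start the server with something like `app.run(host="0.0.0.0", port=resolve_port())`, where binding to `0.0.0.0` makes the app reachable from outside the container.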

Click the button below to deploy the application.

Deploy

Deploying the Application to Railway

Click the button below to deploy the application to Railway.

Deploy on Railway

License

GNU General Public License v3.0

TODO

  • Research
  • Development
  • Deployment
  • Testing
  • Article writing
  • Presentation