Auto Scan

Text extraction and named-entity recognition (NER) tool built with pytesseract and spaCy. Powered by Python!

Live Demo :

File Structures

.
├── data
│   ├── Images                            # Sample images for testing
│   ├── TrainData.pickle                  # Processed training data dump
│   └── train.spacy                       # Processed training data (spaCy format)
├── Prediction_testing_Single_Image.ipynb # Final prediction notebook
├── output                                # Training output
│   └── model-best                        # Trained spaCy NER model files
│
└── ...

Description
Usecase 1: Business Card Text Recognition
Prerequisites
Installation Guide

Description

AutoScan automatically extracts and labels the text on business cards. BIO tagging is used to prepare the training data and to train the spaCy NER model.
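As a rough illustration of BIO tagging, the sketch below marks the first token of each entity with `B-`, the following tokens of the same entity with `I-`, and everything else with `O`. The function name `bio_tags`, the span format, and the sample card text are assumptions made for this example; the project's actual data preparation may differ.

```python
def bio_tags(tokens, entities):
    """Assign a BIO tag to each token.

    tokens:   list of words in reading order
    entities: list of (start, end_exclusive, label) token spans (hypothetical format)
    """
    tags = ["O"] * len(tokens)            # O = outside any entity
    for start, end, label in entities:
        tags[start] = f"B-{label}"        # B = beginning of an entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"        # I = inside the same entity
    return tags


# Hypothetical business card text, invented for illustration
tokens = ["Jane", "Doe", "Data", "Scientist", "jane@example.com"]
entities = [(0, 2, "NAME"), (2, 4, "DES"), (4, 5, "EMAIL")]

for token, tag in zip(tokens, bio_tags(tokens, entities)):
    print(f"{token}\t{tag}")
```

Each printed line pairs a token with its tag, which is the general shape of token-level training data for an NER model.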

Business cards Text Extraction & Labeling

Automatically detects the text in input business card images using pytesseract.
Detects the entities and displays them with spaCy displaCy, using custom styling.
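The text detection step can be sketched as follows. `pytesseract.image_to_data` returns tab-separated text by default, with one row per detected element and columns that include `left`, `top`, `width`, `height`, `conf`, and `text`. The helper names `parse_ocr_tsv` and `ocr_words` and the sample rows are assumptions for illustration, not code taken from the notebook.

```python
import csv
import io


def parse_ocr_tsv(tsv_text):
    """Turn image_to_data-style TSV into a list of word dicts."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    words = []
    for row in reader:
        text = (row["text"] or "").strip()
        if not text:                      # skip layout rows that carry no word
            continue
        words.append({
            "text": text,
            "left": int(row["left"]),
            "top": int(row["top"]),
            "width": int(row["width"]),
            "height": int(row["height"]),
            "conf": float(row["conf"]),
        })
    return words


def ocr_words(image_path):
    """Run OCR on an image file.

    Needs pytesseract, Pillow, and the Tesseract binary to be installed.
    """
    import pytesseract
    from PIL import Image
    return parse_ocr_tsv(pytesseract.image_to_data(Image.open(image_path)))


# Hand-written sample in the same column layout, invented for illustration
SAMPLE_TSV = (
    "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num\t"
    "left\ttop\twidth\theight\tconf\ttext\n"
    "1\t1\t0\t0\t0\t0\t0\t0\t640\t360\t-1\t\n"
    "5\t1\t1\t1\t1\t1\t40\t30\t90\t28\t96\tJane\n"
    "5\t1\t1\t1\t1\t2\t140\t30\t80\t28\t95\tDoe\n"
)

for word in parse_ocr_tsv(SAMPLE_TSV):
    print(word["text"], word["left"], word["top"], word["width"], word["height"])
```

Keeping the parsing separate from the OCR call means the parsing can be tried on a saved TSV string without Tesseract installed.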

Input Image	BIO Tagging

Draws a bounding box around each detected word using its left, top, width, and height positions.
Provides a grouping utility that combines the tagged words into the NAME, EMAIL, DES, ORG, PHONE, and WEB labels.
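One possible shape for the bounding box and grouping steps is sketched below: consecutive `B-`/`I-` tagged words with the same label are merged into one group, and the group's box is the union of the word boxes computed from left, top, width, and height. The function name `group_entities`, the word dict layout, and the sample values are assumptions for this example rather than the repository's actual utility.

```python
def group_entities(words):
    """Merge consecutive B-/I- tagged words into labeled groups.

    words: list of dicts with keys text, tag, left, top, width, height
    Returns dicts with label, text, and box as (x1, y1, x2, y2).
    """
    groups = []
    current = None
    for w in words:
        prefix, _, label = w["tag"].partition("-")
        x1, y1 = w["left"], w["top"]
        x2, y2 = x1 + w["width"], y1 + w["height"]    # bottom-right corner
        if prefix == "I" and current and current["label"] == label:
            # Extend the open group: append the text and grow the box
            current["text"] += " " + w["text"]
            bx1, by1, bx2, by2 = current["box"]
            current["box"] = (min(bx1, x1), min(by1, y1),
                              max(bx2, x2), max(by2, y2))
        elif prefix in ("B", "I"):
            # Start a new group (a stray I- tag is treated like B-)
            current = {"label": label, "text": w["text"],
                       "box": (x1, y1, x2, y2)}
            groups.append(current)
        else:
            current = None                            # "O" closes any open group
    return groups


# Hypothetical tagged OCR words, invented for illustration
sample_words = [
    {"text": "Jane", "tag": "B-NAME", "left": 40, "top": 30, "width": 90, "height": 28},
    {"text": "Doe", "tag": "I-NAME", "left": 140, "top": 30, "width": 80, "height": 28},
    {"text": "|", "tag": "O", "left": 230, "top": 30, "width": 5, "height": 28},
    {"text": "jane@example.com", "tag": "B-EMAIL", "left": 40, "top": 80, "width": 210, "height": 24},
]

for group in group_entities(sample_words):
    print(group["label"], group["text"], group["box"])
```

The `(x1, y1, x2, y2)` corners can be passed to a drawing call such as OpenCV's `cv2.rectangle(image, (x1, y1), (x2, y2), color, thickness)` to render one box per grouped label.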

BIO Tagging	Grouped Labeling

Built With

The major frameworks and libraries used to build the application:

Python
pytesseract
spaCy
Streamlit

Getting Started 🤖

This section contains a complete installation guide, along with troubleshooting tips.

Prerequisites

You need the following software installed before setting up the project:

Python (version 3.8.5)
Git

Installation

Clone the repo

git clone https://github.com/Sghosh1999/AutoScan.git

Open a command prompt and navigate to the project folder (AutoScan):

cd AutoScan

Create a Virtual Python environment

python -m pip install virtualenv
python -m virtualenv myenv

Activate the Virtual environment
```
myenv\scripts\activate
```

Install necessary packages

python -m pip install -r requirements.txt
python -m pip install streamlit

Check whether Streamlit is installed (optional):
```
python -m streamlit --version
```
If streamlit is not recognized, install it with the following command (optional):

  python -m pip install streamlit
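As an alternative check from inside the activated environment, a small Python sketch using the standard-library `importlib.metadata` module can report which packages are present. The package list below is an assumption based on the tools named in this README; adjust it to match `requirements.txt`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


# Assumed package names; adjust to match requirements.txt
for name, ver in installed_versions(["streamlit", "spacy", "pytesseract"]).items():
    print(f"{name}: {ver or 'not installed'}")
```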

Open the Prediction_testing_Single_Image.ipynb notebook and run it with the desired business card image.
