/AutoScan

Business Card Details Extraction Pipeline

Primary Language: Jupyter Notebook

Auto Scan

Text Extraction & Named Entity Recognition (NER) Tool using pytesseract! Powered by Python!

Live Demo :

Table of Contents

.
├── data
│   ├── Images                            # Sample images for testing
│   ├── TrainData.pickle                  # Processed training data dump
│   └── train.spacy                       # Processed training data (spaCy format)
├── Prediction_testing_Single_Image.ipynb # Final prediction notebook
├── output                                # Model output directory
│   └── model-best                        # Trained spaCy NER model files
│
└── ...

This is an automatic text extraction and labeling pipeline for business cards. BIO tagging is used to prepare the training data and to train the spaCy NER model.
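As a rough illustration of BIO tagging, the sketch below turns (token, label) pairs into B-/I-/O tags. The function name `bio_tag` and the sample tokens are hypothetical and are not taken from the repository; the labels follow the entity names listed in this README.

```python
from typing import List, Tuple


def bio_tag(labeled_tokens: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Convert (token, label) pairs into (token, BIO tag) pairs.

    A label of "O" marks a token outside any entity. The first token of a
    run with the same label gets "B-<label>", the following ones "I-<label>".
    Simplification: two adjacent entities with the same label are treated
    as a single entity.
    """
    tagged = []
    previous = "O"
    for token, label in labeled_tokens:
        if label == "O":
            tag = "O"
        elif label == previous:
            tag = f"I-{label}"  # continues the current entity
        else:
            tag = f"B-{label}"  # starts a new entity
        tagged.append((token, tag))
        previous = label
    return tagged


# Hypothetical tokens from a business card
sample = [("Jane", "NAME"), ("Doe", "NAME"), ("Data", "DES"),
          ("Scientist", "DES"), ("at", "O"), ("Acme", "ORG")]
for token, tag in bio_tag(sample):
    print(f"{token}\t{tag}")
```

The training data in `TrainData.pickle` may be prepared differently; this only shows the tagging scheme itself.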

  • Detects the text in the input business card automatically using pytesseract.
  • Detects the entities and renders them using spaCy displaCy with custom styling.
[Images: Input Image, BIO Tagging]
  • Draws a bounding box around each detected text region using its left, top, width, and height positions.

  • Groups the tagged tokens by entity label with a grouping utility: NAME, EMAIL, DES, ORG, PHONE, WEB.

[Images: BIO Tagging, Grouped Labeling]
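The bounding-box and grouping steps above could be sketched as follows. This is a minimal illustration, not the repository's utility: the names `merge_boxes` and `group_entities`, the token tuples, and the coordinates are all assumptions. Boxes use the same (left, top, width, height) layout mentioned above.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, width, height)


def merge_boxes(a: Box, b: Box) -> Box:
    """Return the smallest box that covers both input boxes."""
    left = min(a[0], b[0])
    top = min(a[1], b[1])
    right = max(a[0] + a[2], b[0] + b[2])
    bottom = max(a[1] + a[3], b[1] + b[3])
    return (left, top, right - left, bottom - top)


def group_entities(tokens: List[Tuple[str, str, Box]]) -> List[Dict]:
    """Group BIO-tagged tokens into entities with one merged box each.

    Each input item is (text, bio_tag, box). Tokens tagged "O" are skipped
    and close the entity that is currently being built.
    """
    groups: List[Dict] = []
    current = None
    for text, tag, box in tokens:
        if tag == "O":
            current = None
            continue
        prefix, label = tag.split("-", 1)
        if prefix == "I" and current is not None and current["label"] == label:
            # Continue the open entity: append the text and grow its box
            current["text"] += " " + text
            current["box"] = merge_boxes(current["box"], box)
        else:
            # "B-" tag (or a stray "I-" tag) starts a new entity
            current = {"label": label, "text": text, "box": box}
            groups.append(current)
    return groups


# Hypothetical OCR tokens with their tags and boxes
tokens = [
    ("Jane", "B-NAME", (10, 10, 40, 12)),
    ("Doe", "I-NAME", (55, 10, 30, 12)),
    ("jane@acme.com", "B-EMAIL", (10, 30, 110, 12)),
]
for entity in group_entities(tokens):
    print(entity["label"], entity["text"], entity["box"])
```

Tokens are joined with a space here, which suits names and designations; entities such as emails or phone numbers that the OCR splits into several tokens may need a different join.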

Built With

The major frameworks and libraries used to build the application:

  • pytesseract
  • spaCy
  • Streamlit


Getting Started 🤖

This section provides a complete installation guide, along with troubleshooting steps.

You need the following to use the software:

  • Python (version 3.8.5)
  • Git
  1. Clone the repo
    git clone https://github.com/Sghosh1999/AutoScan.git
  2. Open Command Prompt and navigate to the project folder
    cd AutoScan
  3. Create a virtual Python environment
    python -m pip install virtualenv
    python -m virtualenv myenv
  4. Activate the virtual environment
    myenv\scripts\activate
  5. Install the necessary packages
    python -m pip install -r requirements.txt
    python -m pip install streamlit
  6. Check that streamlit is installed (optional)
    python -m streamlit --version
    If streamlit is not recognized, install it again:
    python -m pip install streamlit

Open the Prediction_testing_Single_Image.ipynb notebook and run it with the desired business card image.
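For orientation, a prediction on a single image might look roughly like the sketch below. It is not the notebook's code: the helper names `ocr_text`, `load_model`, and `extract_entities` are hypothetical, the model path is taken from the directory tree above, and the image path in the usage comment is a placeholder. It assumes pytesseract (with the Tesseract binary), Pillow, and spaCy are installed.

```python
def ocr_text(image_path):
    """Run OCR on an image file and return the recognized text."""
    # Imported lazily so the other helpers work without these packages
    import pytesseract
    from PIL import Image
    return pytesseract.image_to_string(Image.open(image_path))


def load_model(model_dir="output/model-best"):
    """Load the trained spaCy pipeline from disk (path assumed from the tree)."""
    import spacy
    return spacy.load(model_dir)


def extract_entities(nlp, text):
    """Return (entity text, label) pairs predicted by the pipeline."""
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]


# Usage (requires the trained model and a business card image):
# nlp = load_model()
# text = ocr_text("path/to/card.jpeg")  # placeholder path
# print(extract_entities(nlp, text))
```

Depending on how the model was trained, the labels may come back with BIO prefixes, in which case a grouping step is still needed afterwards.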