Text Extraction & Named Entity Recognition (NER) Tool using pytesseract! Powered by Python!
.
├── data
│   ├── Images                              # Sample images for testing
│   ├── TrainData.pickle                    # Processed training data dump
│   └── train.spacy                         # Processed training data (spaCy format)
├── Prediction_testing_Single_Image.ipynb   # Final prediction notebook
├── output                                  # Model output folder
│   └── model-best                          # Trained spaCy NER model files
└── ...
This project automatically extracts and labels the text on business cards. BIO tagging is used to prepare the training data, which is then used to train the spaCy NER model.
- It automatically detects the text in the input business cards using pytesseract.
- It detects the entities with the spaCy NER model and displays them using spaCy displaCy with custom styling.
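The text-detection step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `words_from_ocr` and `ocr_image` are hypothetical helper names, and the sketch assumes `pytesseract.image_to_data` is called with `Output.DICT`, which returns parallel lists such as `text`, `conf`, `left`, `top`, `width`, and `height`.

```python
def words_from_ocr(data, min_conf=0.0):
    """Turn an image_to_data-style dict into a list of word records.

    `data` is assumed to hold parallel lists under the keys
    "text", "conf", "left", "top", "width" and "height".
    """
    words = []
    for i, raw_text in enumerate(data["text"]):
        text = raw_text.strip()
        # Confidence may arrive as a string or a number depending on the version.
        conf = float(data["conf"][i])
        # Rows with empty text or negative confidence are layout rows, not words.
        if not text or conf < min_conf:
            continue
        words.append({
            "text": text,
            "conf": conf,
            "left": data["left"][i],
            "top": data["top"][i],
            "width": data["width"][i],
            "height": data["height"][i],
        })
    return words


def ocr_image(path):
    """Run Tesseract on an image file and return its word records."""
    # Imported lazily so words_from_ocr works without Tesseract installed.
    from PIL import Image
    import pytesseract
    from pytesseract import Output

    data = pytesseract.image_to_data(Image.open(path), output_type=Output.DICT)
    return words_from_ocr(data)
```

Calling `ocr_image` on one of the sample images in `data/Images` would, under these assumptions, yield one record per detected word, including the position fields used later for bounding boxes.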
Input Image | BIO Tagging |
---|---|
- Drawing bounding boxes using the left, top, width, and height positions.
- Creating a grouping label utility to group the NAME, EMAIL, DES, ORG, PHONE, and WEB labels.
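For the bounding-box step listed above, the left, top, width, and height positions have to be converted into two corner points before a rectangle can be drawn. A minimal sketch, where `box_corners` is a hypothetical helper and not a function from this repository:

```python
def box_corners(left, top, width, height):
    """Return the (top-left, bottom-right) corners of a box.

    The inputs are pixel positions measured from the image's top-left
    corner, as reported by OCR tools for each detected word.
    """
    top_left = (left, top)
    bottom_right = (left + width, top + height)
    return top_left, bottom_right
```

The two points can then be passed to a drawing routine; with OpenCV, for instance, that would be `cv2.rectangle(image, top_left, bottom_right, color, thickness)`. Whether the project draws its boxes with OpenCV is an assumption here.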
BIO Tagging | Grouped Labeling |
---|---|
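The grouping step can be illustrated with a small sketch that merges consecutive BIO-tagged tokens into one entry per entity. `group_entities` is a hypothetical name, and the tag format (`B-NAME`, `I-NAME`, `O`) is the conventional BIO scheme; the repository's own utility may differ in detail.

```python
def group_entities(tokens, tags):
    """Merge BIO-tagged tokens into a list of (label, text) groups."""
    groups = []
    current_label, current_tokens = None, []

    def flush():
        # Store the group collected so far, if there is one.
        if current_tokens:
            groups.append((current_label, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B" or (prefix == "I" and label != current_label):
            # A "B-" tag, or an "I-" tag with a new label, starts a new group.
            flush()
            current_label, current_tokens = label, [token]
        elif prefix == "I":
            # Continuation of the current entity.
            current_tokens.append(token)
        else:
            # "O" (outside any entity) closes the current group.
            flush()
            current_label, current_tokens = None, []
    flush()
    return groups
```

For example, the tokens `John`, `Doe` tagged `B-NAME`, `I-NAME` would be merged into a single `("NAME", "John Doe")` entry.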
The application is built with Python, pytesseract, spaCy, and Streamlit.
This section contains the full installation guide, along with a troubleshooting step.
Make sure the following prerequisites are installed before you begin:
- Python (version 3.8.5)
- Git
- Clone the repo
git clone https://github.com/Sghosh1999/AutoScan.git
- Open Command Prompt and navigate to the cloned folder (AutoScan)
cd AutoScan
- Create a virtual Python environment
python -m pip install virtualenv
python -m virtualenv myenv
- Activate the virtual environment
myenv\scripts\activate
- Install the necessary packages
python -m pip install -r requirements.txt
python -m pip install streamlit
- Check whether Streamlit is installed (optional)
python -m streamlit --version
- If streamlit is not recognized, install it (optional)
python -m pip install streamlit
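The installation check can also be done from inside Python. The sketch below uses only the standard library; `missing_packages` is a hypothetical helper, and the module names `streamlit`, `pytesseract`, and `spacy` are assumed from the tools mentioned in this README rather than read from `requirements.txt`.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be found.

    find_spec only looks a module up; it does not import it.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


# Example call (module names are assumptions, see above):
# missing_packages(["streamlit", "pytesseract", "spacy"])
```

An empty list means every listed module is available in the active virtual environment.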
Open the Prediction_testing_Single_Image.ipynb notebook and run it with the desired business card image.