# Quadrant Perception Network (QPNet)
QPNet is a simple and elegant instance-segmentation-based model for text detection in document and natural images. Built from convolutional and bidirectional long short-term memory (BiLSTM) networks, it focuses on segmenting closely spaced text instances and detecting long text, which improves its practicality in real applications. Input images are encoded by grid locations relative to the four quadrants of an object and the background. BiLSTMs with transposing operations combine the left-right and up-down contexts. Without bounding-box regression, a single output classification branch predicts the accurate location of each pixel, namely quadrant perception, so the network is easy to train. Finally, simple post-processing recovers the text locations.
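To make the pipeline concrete, here is a minimal PyTorch sketch of the quadrant-perception idea: one BiLSTM scans each row of a CNN feature map for left-right context, a transpose lets a second BiLSTM scan each column for up-down context, and a 1x1 convolution classifies every pixel into five classes (four quadrants plus background). All names (`QuadrantHead`, `row_lstm`, `col_lstm`) are illustrative assumptions, not the actual QPNet code:

```python
import torch
import torch.nn as nn

class QuadrantHead(nn.Module):
    """Sketch of a per-pixel quadrant classifier (hypothetical, for illustration)."""

    def __init__(self, in_ch=256, hidden=128, num_classes=5):
        super().__init__()
        self.row_lstm = nn.LSTM(in_ch, hidden, batch_first=True, bidirectional=True)
        self.col_lstm = nn.LSTM(2 * hidden, hidden, batch_first=True, bidirectional=True)
        self.classify = nn.Conv2d(2 * hidden, num_classes, kernel_size=1)

    def forward(self, feat):                                  # feat: (B, C, H, W)
        b, c, h, w = feat.shape
        # Left-right context: treat each row as a sequence of length W.
        x = feat.permute(0, 2, 3, 1).reshape(b * h, w, c)
        x, _ = self.row_lstm(x)
        # Transpose so each column becomes a sequence of length H (up-down context).
        x = x.reshape(b, h, w, -1).permute(0, 2, 1, 3).reshape(b * w, h, -1)
        x, _ = self.col_lstm(x)
        x = x.reshape(b, w, h, -1).permute(0, 3, 2, 1)        # back to (B, C', H, W)
        return self.classify(x)                               # per-pixel quadrant logits

logits = QuadrantHead()(torch.randn(1, 256, 32, 32))
print(logits.shape)  # torch.Size([1, 5, 32, 32])
```

Note that no bounding-box regression branch appears anywhere: locations come purely from the per-pixel classification, which is what makes the model easy to train.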
## Install
- Clone the project

  ```shell
  git clone https://github.com/kakusyun/qpnet
  cd qpnet
  ```
- Create a conda virtual environment and activate it

  ```shell
  conda create -n qpnet python=3.7 -y
  conda activate qpnet
  ```
- Install dependencies

  ```shell
  # If you don't have PyTorch yet
  conda install pytorch torchvision cudatoolkit=10.1 -c pytorch
  pip install -r requirements.txt
  ```
- Build

  ```shell
  cd qpnet
  python setup.py develop
  ```
- Data preparation

  Please use labelme to label your samples and organize the dataset like this:

  ```
  # datasets/dataset_name
  |-- images
  |-- jsons
  ```

  If you don't use labelme, simply arrange your data as shown above (a quick consistency check is sketched after this list). Then run:

  ```shell
  cd datasets/preprocessing
  python one_step_preprocessing.py
  ```
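If you prepare the folders yourself, it may help to verify that images and annotations pair up before preprocessing. The snippet below is a hypothetical check (not part of this repo) and assumes the standard labelme JSON schema, where each file stores a `shapes` list with per-polygon `points`; `dataset_name` is a placeholder:

```python
import json
from pathlib import Path

root = Path("datasets/dataset_name")  # placeholder: point this at your dataset
images = {p.stem for p in (root / "images").iterdir() if p.is_file()}

for js in sorted((root / "jsons").glob("*.json")):
    # Every annotation should have a matching image with the same stem.
    assert js.stem in images, f"no image found for {js.name}"
    shapes = json.loads(js.read_text())["shapes"]  # labelme polygons live here
    assert all(s["points"] for s in shapes), f"empty polygon in {js.name}"

print("image/annotation pairs look consistent")
```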
## Run
- Train:

  ```shell
  python train.py
  ```

  To switch between single-GPU and multi-GPU training, please modify tools/train_qp.py (a generic PyTorch pattern for this switch is sketched after this list).
- Test:

  ```shell
  python test.py
  ```

  To switch between single-GPU and multi-GPU testing, please modify tools/test_qp.py.
- Infer:

  ```shell
  python infer.py
  ```

  To switch between single-GPU and multi-GPU inference, please modify tools/infer_qp.py.
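As a generic, hedged illustration of the single- versus multi-GPU switch mentioned above (the actual logic lives in tools/train_qp.py, tools/test_qp.py, and tools/infer_qp.py and may differ), here is a common PyTorch pattern; `wrap_for_gpus` is a hypothetical helper, not part of this repo:

```python
import torch
import torch.nn as nn

def wrap_for_gpus(model: nn.Module) -> nn.Module:
    """Hypothetical helper: move the model to GPU and, if several are
    visible, replicate it so each batch is split across devices."""
    if torch.cuda.is_available():
        model = model.cuda()
        if torch.cuda.device_count() > 1:
            model = nn.DataParallel(model)  # multi-GPU data parallelism
    return model
```

Restricting the visible devices, e.g. `CUDA_VISIBLE_DEVICES=0 python train.py`, is another common way to force a single-GPU run.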