textboxes.pytorch

A PyTorch re-implementation of the text detection paper TextBoxes: A Fast Text Detector with a Single Deep Neural Network

TextBoxes: A Fast Text Detector with a Single Deep Neural Network, in PyTorch

A PyTorch re-implementation of TextBoxes from the AAAI2017 paper by Minghui Liao, Baoguang Shi, Xiang Bai, Xinggang Wang, Wenyu Liu. The official and original Caffe code can be found here.

Table of Contents

  • Installation
  • Datasets
  • Training TextBoxes
  • Evaluation
  • Performance
  • Demos
  • TODO
  • Authors
  • Acknowledgements

Installation

  • Install PyTorch by selecting your environment on the PyTorch website and running the appropriate command.
  • Clone this repository.
    • Note: We currently only support Python 3+ (tested on Python 3.6.8).
    • PyTorch 0.4.1+ (tested on PyTorch 1.2)
  • Then download the dataset by following the instructions below.
  • We now support tensorboardX for real-time loss visualization during training!
    • To use tensorboardX in the browser:
    # First install Python server and client
    pip install tensorboardX
    # Start the server (probably in a screen or tmux)
    tensorboard --logdir=run/experiments_*
    • Then (during training) navigate to http://localhost:6006/ (see the Train section below for training details). A minimal logging sketch follows this list.
  • Note: For training, we currently support ICDAR2013, ICDAR2015, ICDAR2017, COCO_TEXT and SynthText.
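
As a rough sketch of what the training loop would log (the experiment name and loss value below are placeholders, not the repo's actual code), tensorboardX usage looks roughly like this:

# Minimal tensorboardX logging sketch; 'run/experiments_demo' is a
# placeholder that matches the --logdir pattern above.
from tensorboardX import SummaryWriter

writer = SummaryWriter('run/experiments_demo')
for iteration in range(1000):
    loss = 1.0 / (iteration + 1)  # stand-in for the real training loss
    writer.add_scalar('train/loss', loss, iteration)
writer.close()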

Datasets

To make things easy, we provide bash scripts to handle the dataset downloads and setup for you. We also provide simple dataset loaders that inherit from torch.utils.data.Dataset, making them fully compatible with the torchvision.datasets API.
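
As a sketch of that loader pattern (the class and field names here are illustrative, not the repo's actual API):

# Minimal torch.utils.data.Dataset subclass in the style described above.
import torch.utils.data as data
from PIL import Image

class TextDataset(data.Dataset):
    def __init__(self, image_paths, annotations, transform=None):
        self.image_paths = image_paths    # list of image file paths
        self.annotations = annotations    # list of per-image box arrays
        self.transform = transform

    def __getitem__(self, index):
        img = Image.open(self.image_paths[index]).convert('RGB')
        target = self.annotations[index]
        if self.transform is not None:
            img, target = self.transform(img, target)
        return img, target

    def __len__(self):
        return len(self.image_paths)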

ICDAR

Download the ICDAR datasets from the official ICDAR website and extract the archives with commands like unzip xxx.zip -d train_images; you should end up with the following file structure.

ICDAR
├── ICDAR2013
│   ├── Challenge2_Test_Task12_Images.zip
│   ├── Challenge2_Test_Task1_GT.zip
│   ├── Challenge2_Training_Task12_Images.zip
│   ├── Challenge2_Training_Task1_GT.zip
│   ├── test_annos
│   ├── test_images
│   ├── train_annos
│   └── train_images
├── ICDAR2015
│   ├── Challenge4_Test_Task1_GT.zip
│   ├── ch4_test_images.zip
│   ├── ch4_training_images.zip
│   ├── ch4_training_localization_transcription_gt.zip
│   ├── test_annos
│   ├── test_images
│   ├── train_annos
│   └── train_images
└── ICDAR2017
    ├── ch9_test_images.zip
    ├── ch9_training_images.zip
    ├── ch9_training_localization_transcription_gt.zip
    ├── ch9_validation_images.zip
    ├── ch9_validation_localization_transcription_gt.zip
    ├── test_images
    ├── train_annos
    ├── train_images
    └── val_images
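
For reference, each ICDAR2015-style ground-truth file lists one region per line as x1,y1,x2,y2,x3,y3,x4,y4,transcription, with '###' marking unreadable ("don't care") regions. A minimal parser might look like this (a sketch, not the repo's loader):

# Parse one ICDAR2015-style ground-truth text file.
def parse_icdar15_gt(path):
    boxes, texts = [], []
    with open(path, encoding='utf-8-sig') as f:   # strip the UTF-8 BOM
        for line in f:
            parts = line.strip().split(',')
            if len(parts) < 9:
                continue
            boxes.append([int(p) for p in parts[:8]])
            texts.append(','.join(parts[8:]))     # text itself may contain commas
    return boxes, texts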

COCO_Text

coco2014
├── annotations
│   └── COCO_Text.json
└── images
    └── train2014
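
COCO_Text.json can be browsed directly with the json module. The key names below follow the published COCO-Text format, but verify them against the file you download:

# Peek at a few COCO-Text annotations.
import json

with open('coco2014/annotations/COCO_Text.json') as f:
    ct = json.load(f)

for ann in list(ct['anns'].values())[:5]:
    print(ann['image_id'], ann.get('utf8_string'), ann['bbox'])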

SynthText

This is a synthetically generated dataset, in which word instances are placed in natural scene images, while taking into account the scene layout.

The dataset consists of 800,000 images with approximately 8 million synthetic word instances. Each text instance is annotated with its text string plus word-level and character-level bounding boxes.
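
The annotations ship as a single gt.mat MATLAB file. A loading sketch (field names follow the dataset's published format; the path is assumed):

# Load SynthText annotations with scipy.
import scipy.io as sio

gt = sio.loadmat('SynthText/gt.mat')
imnames = gt['imnames'][0]   # image paths, one entry per image
word_bbs = gt['wordBB'][0]   # 2x4xN word box corners per image
texts = gt['txt'][0]         # text strings per image
print(imnames[0], word_bbs[0].shape)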

Training TextBoxes

mkdir weights
cd weights
wget https://s3.amazonaws.com/amdegroot-models/vgg16_reducedfc.pth
  • To train TextBoxes using the train script, simply specify the parameters listed in train.py as command-line flags or change them in the file.
python train.py
  • Note:
    • For training, an NVIDIA GPU is strongly recommended for speed.
    • You can pick up training from a checkpoint by specifying its path as one of the training parameters (again, see train.py for options); a sketch of the resume pattern follows.
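
A sketch of that resume pattern (the tiny model below is a stand-in; see train.py for the repo's real model and flags):

# Resume-from-checkpoint sketch; the Sequential net stands in for the
# actual TextBoxes model defined in this repo.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1))
torch.save(model.state_dict(), 'weights/checkpoint.pth')   # pretend checkpoint
state = torch.load('weights/checkpoint.pth', map_location='cpu')
model.load_state_dict(state)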

Evaluation

To evaluate a trained network:

python eval.py

You can specify the parameters listed in eval.py as command-line flags or by changing them in the file.
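
Detection evaluation ultimately reduces to matching predicted and ground-truth boxes by intersection-over-union. A minimal axis-aligned IoU looks like this (a sketch only; eval.py defines the actual protocol):

# Axis-aligned IoU between two (xmin, ymin, xmax, ymax) boxes.
def iou(a, b):
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # ~0.143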

Performance

  • TODO

Demos

Use a pre-trained TextBoxes network for detection

Download a pre-trained network

  • We are trying to provide PyTorch state_dicts (dict of weight tensors) of the latest TextBoxes model definitions trained on different datasets.
  • Currently, we provide the following PyTorch models:
    • TextBoxes300 trained on SynthText (newest PyTorch weights)
      • TODO
    • TextBoxes300 trained on COCO_Text (newest PyTorch weights)
      • TODO
  • Our goal is to reproduce the results table from the original paper

Try the demo notebook

  • Make sure you have jupyter notebook installed.
  • Two alternatives for installing jupyter notebook:
    1. If you installed PyTorch with conda (recommended), then you should already have it. (Just navigate to the cloned textboxes.pytorch repo and run): jupyter notebook

    2. If using pip:

# make sure pip is upgraded
pip3 install --upgrade pip
# install jupyter notebook
pip install jupyter
# Run this inside textboxes.pytorch
jupyter notebook

Try the webcam demo

  • Works on CPU (you may have to tweak cv2.waitKey for optimal fps) or on an NVIDIA GPU
  • This demo currently requires OpenCV 2+ with Python bindings and an onboard webcam
    • You can change the default webcam in demo/live.py
  • Install the imutils package to leverage multi-threading on CPU:
    • pip install imutils
  • Running python -m demo.live opens the webcam and begins detecting!
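
For orientation, a bare-bones version of the OpenCV loop such a demo wraps (the detector call is omitted; lowering the cv2.waitKey delay raises the displayed frame rate):

# Minimal webcam capture/display loop.
import cv2

cap = cv2.VideoCapture(0)        # 0 = default webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # ... run the detector on `frame` and draw boxes here ...
    cv2.imshow('TextBoxes demo', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):  # press q to quit
        break
cap.release()
cv2.destroyAllWindows()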

TODO

We have accumulated the following to-do list, which we hope to complete in the near future:

  • Still to come:
    • Support for the ICDAR dataset
    • Support for the COCO_TEXT dataset
    • Support for a text detection evaluation method
    • Support for input size 512 training and testing
    • Support for training on custom datasets
    • Support for MobileNetV2 backbone

Authors

We hope to add a list of contributors here soon.

Acknowledgements

This repo is modified from ssd.pytorch.

Note: Unfortunately, this is just a hobby of ours and not a full-time job, so we'll do our best to keep things up to date, but no guarantees. That said, thanks to everyone for your continued help and feedback; it is really appreciated, and we will try to address everything as soon as possible.