A PyTorch re-implementation of TextBoxes from the AAAI 2017 paper by Minghui Liao, Baoguang Shi, Xiang Bai, Xinggang Wang, and Wenyu Liu. The official, original Caffe implementation can be found here.
- Install PyTorch by selecting your environment on the website and running the appropriate command.
- Clone this repository.
- Note: we currently only support Python 3+ (tested on Python 3.6.8) and PyTorch 0.4.1+ (tested on PyTorch 1.2).
- Then download the dataset by following the instructions below.
- We now support tensorboardX for real-time loss visualization during training!
- To use tensorboardX in the browser:
# First install the Python server and client
pip install tensorboardX
# Start the server (probably in a screen or tmux)
tensorboard --logdir=run/experiments_*
- Then (during training) navigate to http://localhost:6006/ (see the Train section below for training details).
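If you want to log additional scalars yourself, the mechanism boils down to a tensorboardX SummaryWriter. A minimal sketch follows; the log directory and the loss value are placeholders, not the repo's actual logging code in train.py:

```python
from tensorboardX import SummaryWriter

# Placeholder log directory; train.py writes its own run/experiments_* folders.
writer = SummaryWriter('run/experiments_demo')

for iteration in range(1000):
    loss = 1.0 / (iteration + 1)          # stand-in for the real loc + conf loss
    writer.add_scalar('train/loss', loss, iteration)

writer.close()
```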
- Note: For training, we currently support ICDAR2013, ICDAR2015, ICDAR2017, COCO_TEXT and SynthText.
To make things easy, we provide bash scripts to handle the dataset downloads and setup for you. We also provide simple dataset loaders that inherit from torch.utils.data.Dataset, making them fully compatible with the torchvision.datasets API.
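As an illustration of that interface, here is a minimal sketch of an ICDAR2015-style loader inheriting torch.utils.data.Dataset; the class name, directory layout, and annotation parsing are simplifying assumptions, not the exact loaders shipped in this repo:

```python
import os

import cv2
import numpy as np
import torch
from torch.utils.data import Dataset


class ICDARLikeDataset(Dataset):
    """Hypothetical loader: one image per file and one gt_<name>.txt per image
    with lines of the form "x1,y1,...,x4,y4,transcription"."""

    def __init__(self, image_dir, anno_dir, transform=None):
        self.image_dir = image_dir
        self.anno_dir = anno_dir
        self.transform = transform
        self.ids = sorted(os.listdir(image_dir))

    def __len__(self):
        return len(self.ids)

    def __getitem__(self, index):
        name = self.ids[index]
        img = cv2.imread(os.path.join(self.image_dir, name))
        anno = os.path.join(self.anno_dir,
                            'gt_' + os.path.splitext(name)[0] + '.txt')
        boxes = []
        with open(anno, encoding='utf-8-sig') as f:
            for line in f:
                coords = list(map(float, line.strip().split(',')[:8]))
                xs, ys = coords[0::2], coords[1::2]
                # collapse the quadrilateral to an axis-aligned box
                boxes.append([min(xs), min(ys), max(xs), max(ys)])
        target = np.array(boxes, dtype=np.float32)
        if self.transform is not None:
            img, target = self.transform(img, target)
        return torch.from_numpy(img).permute(2, 0, 1).float(), target
```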
Download the ICDAR datasets from the above website and extract the files with commands like unzip xxx.zip -d train_images; you should end up with the following file structure (a sketch of the unzip commands appears below the ICDAR tree).
ICDAR
├── ICDAR2013
│ ├── Challenge2_Test_Task12_Images.zip
│ ├── Challenge2_Test_Task1_GT.zip
│ ├── Challenge2_Training_Task12_Images.zip
│ ├── Challenge2_Training_Task1_GT.zip
│ ├── test_annos
│ ├── test_images
│ ├── train_annos
│ └── train_images
├── ICDAR2015
│ ├── Challenge4_Test_Task1_GT.zip
│ ├── ch4_test_images.zip
│ ├── ch4_training_images.zip
│ ├── ch4_training_localization_transcription_gt.zip
│ ├── test_annos
│ ├── test_images
│ ├── train_annos
│ └── train_images
└── ICDAR2017
├── ch9_test_images.zip
├── ch9_training_images.zip
├── ch9_training_localization_transcription_gt.zip
├── ch9_validation_images.zip
├── ch9_validation_localization_transcription_gt.zip
├── test_images
├── train_annos
├── train_images
└── val_images
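For example, ICDAR2015 can be unpacked into the layout above roughly like this (run inside ICDAR/ICDAR2015/; ICDAR2013 and ICDAR2017 follow the same pattern with their own zip names):

```sh
unzip ch4_training_images.zip -d train_images
unzip ch4_training_localization_transcription_gt.zip -d train_annos
unzip ch4_test_images.zip -d test_images
unzip Challenge4_Test_Task1_GT.zip -d test_annos
```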
coco2014
├── annotations
│ └── COCO_Text.json
└── images
└── train2014
This is a synthetically generated dataset, in which word instances are placed in natural scene images, while taking into account the scene layout.
The dataset consists of 800 thousand images with approximately 8 million synthetic word instances. Each text instance is annotated with its text-string, word-level and character-level bounding-boxes.
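SynthText ships its annotations in a single gt.mat file; a rough sketch of reading the word-level boxes with scipy is shown below (the field names are taken from the SynthText release, so double-check them against your copy):

```python
import numpy as np
import scipy.io as sio

# gt.mat sits next to the SynthText image folders.
gt = sio.loadmat('SynthText/gt.mat')

imnames = gt['imnames'][0]   # image paths
word_bbs = gt['wordBB'][0]   # per-image word boxes, shape (2, 4, num_words)

# Convert the first image's quadrilaterals to axis-aligned [xmin, ymin, xmax, ymax].
bb = word_bbs[0]
if bb.ndim == 2:             # a single word is stored as (2, 4)
    bb = bb[:, :, None]
boxes = np.stack([bb[0].min(0), bb[1].min(0), bb[0].max(0), bb[1].max(0)], axis=1)
print(imnames[0][0], boxes.shape)
```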
- First download the fc-reduced VGG-16 PyTorch base network weights from https://s3.amazonaws.com/amdegroot-models/vgg16_reducedfc.pth.
- By default, we assume you have downloaded the file into the weights dir:
mkdir weights
cd weights
wget https://s3.amazonaws.com/amdegroot-models/vgg16_reducedfc.pth
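The downloaded file is a plain state_dict (a dict of weight tensors), so you can inspect it directly with torch.load; the attribute name in the commented line is illustrative, not necessarily how this repo names its VGG base:

```python
import torch

# Inspect the downloaded base-network weights (a plain dict of tensors).
vgg_weights = torch.load('weights/vgg16_reducedfc.pth')
print(len(vgg_weights), 'tensors, e.g.', list(vgg_weights.keys())[:3])

# train.py then loads them into the VGG base of the detector, roughly:
# net.vgg.load_state_dict(vgg_weights)   # attribute name is illustrative
```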
- To train TextBoxes, use the train script and simply specify the parameters listed in train.py as flags, or change them manually:
python train.py
- Note:
- For training, an NVIDIA GPU is strongly recommended for speed.
- You can resume training from a checkpoint by passing its path as one of the training parameters (again, see train.py for the options), as sketched below.
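For example, resuming from a checkpoint might look like the line below; the flag names are illustrative guesses, so check the argparse options at the top of train.py for the exact spellings:

```sh
# Hypothetical flags; see train.py for the real argument names.
python train.py --batch_size 16 --lr 1e-4 --resume weights/TextBoxes_checkpoint.pth
```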
To evaluate a trained network:
python eval.py
You can specify the parameters listed in eval.py as flags or change them manually.
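A typical invocation might look like the following; again, the flag name is an illustrative guess, so check eval.py for the real option:

```sh
# Hypothetical flag; see eval.py for the real argument names.
python eval.py --trained_model weights/TextBoxes300_SynthText.pth
```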
- TODO
- We are trying to provide PyTorch state_dicts (dicts of weight tensors) of the latest TextBoxes model definitions trained on different datasets.
- Currently, we provide the following PyTorch models:
  - TextBoxes300 trained on SynthText (newest PyTorch weights) - TODO
  - TextBoxes300 trained on COCO_Text (newest PyTorch weights) - TODO
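Once downloaded, such a state_dict can be loaded back into a model roughly as follows; the constructor name and weight filename are placeholders, not the actual identifiers in this repo:

```python
import torch

# "build_textboxes" is a hypothetical stand-in for the model constructor
# defined in this repo; the weight filename is also illustrative.
# net = build_textboxes('test', size=300, num_classes=2)
state_dict = torch.load('weights/TextBoxes300_SynthText.pth', map_location='cpu')
# net.load_state_dict(state_dict)
# net.eval()
```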
- Our goal is to reproduce this table from the original paper
- Make sure you have jupyter notebook installed.
- To install jupyter notebook via pip:
# make sure pip is upgraded
pip3 install --upgrade pip
# install jupyter notebook
pip install jupyter
# Run this inside the repository root
jupyter notebook
- Now navigate to demo/demo.ipynb at http://localhost:8888 (by default) and have at it!
- Works on CPU (you may have to tweak cv2.waitKey for optimal fps) or on an NVIDIA GPU.
- This demo currently requires OpenCV 2+ with Python bindings and an onboard webcam.
  - You can change the default webcam in demo/live.py.
- Install the imutils package to leverage multi-threading on CPU:
pip install imutils
- Running python -m demo.live opens the webcam and begins detecting!
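Under the hood, the live demo is essentially a capture-predict-draw loop. Here is a stripped-down sketch using imutils' threaded VideoStream, with the detection step left as a placeholder for the repo's network:

```python
import cv2
from imutils.video import VideoStream

# Threaded webcam capture (src=0 is the default onboard camera).
stream = VideoStream(src=0).start()

while True:
    frame = stream.read()
    if frame is None:
        break
    # detections = net(preprocess(frame))  # placeholder for the TextBoxes forward pass
    # draw the detected boxes on `frame` here
    cv2.imshow('TextBoxes live demo', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):  # raise the delay on slow CPUs
        break

stream.stop()
cv2.destroyAllWindows()
```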
We have accumulated the following to-do list, which we hope to complete in the near future:
- Still to come:
- Support for the ICDAR dataset
- Support for the COCO_TEXT dataset
- Support for text detection evaluation methods
- Support for input size 512 training and testing
- Support for training on custom datasets
- Support for MobileNetV2 backbone
We hope to get to all of these soon.
This repo is modified from ssd.pytorch.
Note: Unfortunately, this is just a hobby of ours and not a full-time job, so we'll do our best to keep things up to date, but no guarantees. That being said, thanks to everyone for your continued help and feedback as it is really appreciated. We will try to address everything as soon as possible.