Implementation for PixelLink: Detecting Scene Text via Instance Segmentation, by Dan Deng, Haifeng Liu, Xuelong Li, and Deng Cai.
Here we transformed every frame in a video, then combined the results. Click to go to the full demo on YouTube!
git clone git@github.com:luoweimeng/pixel_link.git
- Python 2.7
- Tensorflow-gpu >= 1.1
- opencv2
- setproctitle
- matplotlib
Anaconda is recommended to for an easier installation:
- Install Anaconda
- Create and activate the required virtual environment by:
conda env create --file pixel_link_env.txt
source activate pixel_link
import matplotlib.pyplot as plt
import pixellink
# Load the image
pl = pixellink.pixelLinkDetector("/Users/luoweimeng/Code/data/test/40.jpg")
# output bounding boxes, (x1, y1, x2, y2, x3, y3, x4, y4) top_left、top_right、bottom_left,bottom_right
pl.detect()
# output pixel score
pixel_score = pl.draw_pixel_score()
plt.imshow(pixel_score, cmap='gray')
# output link score 0:top_left 1:top 2:top_right 3:left 4:right 5:bottom_left 6:bottom 7:bottem_right
link_score = pl.draw_link_score(6)
plt.imshow(link_score, cmap='gray')
# drow the bounding boxes on the original image
plt.imshow(pl.draw_bbox())
Scripts for converting ICDAR2017 / ICDAR2015 and SynthText datasets have been provided in the datasets
directory.
It not hard to write a converting script for your own dataset.
python train_pixel_link.py \
--train_dir=/workdir/chengyuming/pixel_link/train/ic17_whole \
--num_gpus=1 \
--learning_rate=1e-3\
--gpu_memory_fraction=-1 \
--train_image_width=512 \
--train_image_height=512 \
--batch_size=16 \
--dataset_dir=/workdir/chengyuming/dataset/pixel_link/ICDAR \
--dataset_name=icdar2017 \
--dataset_split_name=train \
--max_number_of_steps=200000\
--checkpoint_path=/workdir/chengyuming/pixel_link/train/ic17/model.ckpt-151160 \
--using_moving_average=1