
EATEN: Entity-aware Attention for Single Shot Visual Text Extraction

Accepted to ICDAR 2019 (paper available on arXiv)
Authors: He Guo, Xiameng Qin, Jiaming Liu, Junyu Han, Jingtuo Liu and Errui Ding

Abstract

This repository provides an open-source dataset for visual text extraction, covering three scenarios: train ticket, passport, and business card.

Samples

Train ticket

Real images

real1 real2

Synthetic images

Some clean images

synth-easy

Some hard images

synth-hard

Passport

Some clean images

passport-easy

Some hard images

passport-hard

Business card

bc1 bc2

Downloads

The dataset can be downloaded via the following link:
baiduyun (password: e4z1)

Some details:

| scene         | number of images        | size | Google Drive link |
| ------------- | ----------------------- | ---- | ----------------- |
| train ticket  | 300k synthetic + 1.9k real | 13G  | dataset_trainticket.tar |
| passport      | 100k synthetic          | 5.8G | dataset_passport.tar |
| business card | 200k synthetic          | 19G  | dataset_business.tar.0, dataset_business.tar.1, dataset_business.tar.2, dataset_business.tar.3 |
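
The business card set ships as four parts (dataset_business.tar.0 to dataset_business.tar.3). Below is a minimal Python sketch for reassembling and extracting it, under the assumption that the parts are byte-split pieces of a single tar archive; the file names are taken from the table above, and the output paths are placeholders you may want to change.

```python
# Sketch only: assumes dataset_business.tar.0 .. .3 are byte-split pieces of one tar archive.
import shutil
import tarfile

parts = [f"dataset_business.tar.{i}" for i in range(4)]

# Concatenate the pieces back into a single archive.
with open("dataset_business.tar", "wb") as merged:
    for part in parts:
        with open(part, "rb") as chunk:
            shutil.copyfileobj(chunk, merged)

# Extract the merged archive; the single-file archives extract the same way.
with tarfile.open("dataset_business.tar") as tar:
    tar.extractall("dataset_business")
```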

Limitations & Todo

  • [A large amount of training data is required]
    Todo:
    1. Use CycleGAN or domain adaptation on the synthetic data to train EATEN.
    2. Introduce scene text recognition (STR) datasets to EATEN.
  • [Generalization to complex scenes]
    Todo:
    1. Add bounding box annotations of ToIs to EATEN, as in the 2019 ICCV oral "Towards Unconstrained End-to-End Text Spotting".
  • [Engineering]
    1. Merge the several decoders into one.
    2. Parallel decoding (a hypothetical sketch of both items is given after this list).
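
Purely as an illustration of the two engineering items above, here is a minimal PyTorch sketch of one way they could be realized: a single shared decoder conditioned on an entity embedding, so that every entity of an image is decoded in one batched pass. All class names, dimensions, and the <sos> index are assumptions for the sketch, not the authors' implementation.

```python
# Hypothetical sketch (not the EATEN release): one shared decoder conditioned on an
# entity embedding, so all entities of interest are decoded in a single batched pass.
import torch
import torch.nn as nn

class SharedEntityDecoder(nn.Module):
    def __init__(self, num_entities, vocab_size, feat_dim=256, hidden_dim=256, max_len=32):
        super().__init__()
        self.entity_emb = nn.Embedding(num_entities, hidden_dim)  # replaces per-entity decoders
        self.char_emb = nn.Embedding(vocab_size, hidden_dim)
        self.rnn = nn.GRUCell(hidden_dim + feat_dim, hidden_dim)
        self.attn = nn.Linear(hidden_dim + feat_dim, 1)
        self.out = nn.Linear(hidden_dim, vocab_size)
        self.max_len = max_len

    def forward(self, feats):
        # feats: (B, N, feat_dim) image features from the CNN encoder.
        B, N, D = feats.shape
        E = self.entity_emb.num_embeddings
        # Tile the features so each (image, entity) pair becomes one row of the batch:
        # "parallel decoding" then falls out of ordinary batching.
        feats = feats.unsqueeze(1).expand(B, E, N, D).reshape(B * E, N, D)
        h = self.entity_emb.weight.unsqueeze(0).expand(B, E, -1).reshape(B * E, -1)
        tok = torch.zeros(B * E, dtype=torch.long, device=feats.device)  # <sos> assumed at index 0
        outputs = []
        for _ in range(self.max_len):
            # Additive-style attention over the encoder features.
            score = self.attn(torch.cat([h.unsqueeze(1).expand(-1, N, -1), feats], dim=-1))
            ctx = (score.softmax(dim=1) * feats).sum(dim=1)   # (B*E, feat_dim)
            h = self.rnn(torch.cat([self.char_emb(tok), ctx], dim=-1), h)
            logits = self.out(h)
            tok = logits.argmax(dim=-1)                       # greedy decoding for the sketch
            outputs.append(logits)
        # (B, num_entities, max_len, vocab_size)
        return torch.stack(outputs, dim=1).view(B, E, self.max_len, -1)
```

In this form the entities share one set of decoder weights (item 1) and are decoded as independent rows of a larger batch rather than in a Python loop over separate decoders (item 2).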