Attention-based Extraction of Structured Information from Street View Imagery

wanghaisheng opened this issue 7 years ago · 0 comments

paper: https://arxiv.org/pdf/1704.03549.pdf
code: https://github.com/tensorflow/models/tree/master/research/attention_ocr

Abstract: We present a neural network model, based on Convolutional Neural Networks, Recurrent Neural Networks and a novel attention mechanism, which achieves 84.2% accuracy on the challenging French Street Name Signs (FSNS) dataset, significantly outperforming the previous state of the art (Smith’16), which achieved 72.46%. Furthermore, our new method is much simpler and more general than the previous approach. To demonstrate the generality of our model, we show that it also performs well on an even more challenging dataset derived from Google Street View, in which the goal is to extract business names from store fronts. Finally, we study the speed/accuracy tradeoff that results from using CNN feature extractors of different depths. Surprisingly, we find that deeper is not always better (in terms of accuracy, as well as speed). Our resulting model is simple, accurate and fast, allowing it to be used at scale on a variety of challenging real-world text extraction problems.
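
For readers new to the idea, here is a rough sketch of what one attention step over CNN features might look like. It is not the paper's mechanism, which the abstract calls novel and does not describe. It is a generic soft attention with dot-product scoring, and the names `soft_attention`, `features` and `query` are made up for illustration. It assumes the CNN feature map has been flattened into a list of per-location vectors and that the RNN state has the same dimensionality.

```python
import math


def soft_attention(features, query):
    """One generic soft-attention step (illustrative, not the paper's exact method).

    features: list of feature vectors, one per spatial location of a CNN feature map.
    query: current decoder (RNN) state, same length as each feature vector.
    Returns (weights, context).
    """
    # Score each location with a dot product (an assumption made for simplicity).
    scores = [sum(f_d * q_d for f_d, q_d in zip(f, query)) for f in features]

    # Numerically stable softmax turns scores into weights that sum to 1.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # The context vector is the attention-weighted sum of the feature vectors.
    dim = len(query)
    context = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return weights, context


# Toy input: three locations with 2-dimensional features.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]
weights, context = soft_attention(features, query)
```

In a full model the decoder would consume `context` to predict the next character and then update its state, repeating the step for each output position. The linked repository is the place to look for the actual implementation.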