Attention-based Extraction of Structured Information from Street View Imagery
wanghaisheng opened this issue
paper: https://arxiv.org/pdf/1704.03549.pdf
code: https://github.com/tensorflow/models/tree/master/research/attention_ocr
Abstract: We present a neural network model, based on Convolutional Neural Networks, Recurrent Neural Networks and a novel attention mechanism, which achieves 84.2% accuracy on the challenging French Street Name Signs (FSNS) dataset, significantly outperforming the previous state of the art (Smith’16), which achieved 72.46%. Furthermore, our new method is much simpler and more general than the previous approach. To demonstrate the generality of our model, we show that it also performs well on an even more challenging dataset derived from Google Street View, in which the goal is to extract business names from store fronts. Finally, we study the speed/accuracy tradeoff that results from using CNN feature extractors of different depths. Surprisingly, we find that deeper is not always better (in terms of accuracy, as well as speed). Our resulting model is simple, accurate and fast, allowing it to be used at scale on a variety of challenging real-world text extraction problems.
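To make the CNN + RNN + attention pipeline more concrete, here is a minimal pure-Python sketch of one decoding step of soft spatial attention over a CNN feature map. It assumes a generic additive (Bahdanau-style) score, score(i, j) = v_a · tanh(W_s s_t + W_f f_ij), and it appends one-hot row/column indices to each feature vector so that positions can be told apart. The names `add_coordinates`, `spatial_attention`, `w_s`, `w_f` and `v_a` are hypothetical, and this is not claimed to be the paper's exact "novel" mechanism. In a full model the decoder state s_t would come from an RNN (e.g. an LSTM); here it is simply an input list.

```python
import math


def softmax(xs):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(w, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def add_coordinates(grid):
    """Flatten an H x W grid of D-dim feature vectors into H*W vectors,
    each extended with a one-hot row index (length H) and a one-hot
    column index (length W). Hypothetical helper for illustration."""
    h, w = len(grid), len(grid[0])
    out = []
    for i, row in enumerate(grid):
        for j, f in enumerate(row):
            one_hot_i = [1.0 if k == i else 0.0 for k in range(h)]
            one_hot_j = [1.0 if k == j else 0.0 for k in range(w)]
            out.append(list(f) + one_hot_i + one_hot_j)
    return out


def spatial_attention(features, state, w_s, w_f, v_a):
    """One attention step (assumed additive form).

    features: list of N feature vectors, each of length F
    state:    decoder state s_t, length S
    w_s:      A x S matrix, w_f: A x F matrix, v_a: length-A vector
    Returns (weights over the N positions, context vector of length F).
    """
    proj_s = matvec(w_s, state)  # computed once per decoding step
    scores = []
    for f in features:
        proj_f = matvec(w_f, f)
        hidden = [math.tanh(a + b) for a, b in zip(proj_s, proj_f)]
        scores.append(sum(v * h for v, h in zip(v_a, hidden)))
    weights = softmax(scores)
    dim = len(features[0])
    # Context = attention-weighted sum of the feature vectors.
    context = [sum(wt * f[d] for wt, f in zip(weights, features))
               for d in range(dim)]
    return weights, context
```

With all-zero projection weights every position receives the same score, so the weights are uniform and the context is just the average feature vector; trained weights would typically concentrate the mass on the image region relevant to the next character. Because each position carries exactly one row one-hot and one column one-hot, the coordinate slices of the context always sum to 1 and show where the attention is looking.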