kvn219/cluttered-mnist

Use a spatial transformer network to automatically locate a small region in an image

LevinJ opened this issue · 2 comments

Hi Kevin, I was delighted to find your article while googling for spatial transformer networks :)

Currently I am thinking of using a spatial transformer to automatically locate the serial number region of a banknote. An example banknote and its serial number region are attached below.

The ultimate goal is to train a serial number recognition network. The inputs are raw banknote images, and the labels are the serial number digits. The general idea is to use a spatial transformer module to automatically locate the serial number (snr) region, and then feed the cropped/warped region to further CNN layers for recognition. Do you think a spatial transformer network is capable of automatically locating such a small region in the image through backpropagation alone?

[Image: f001z11479]

Hi LevinJ!

I don't see why not. An STN should be able to attend to arbitrary regions and locate the serial number.
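To make "attend to arbitrary regions" concrete, here is a minimal pure-Python sketch of the two fixed parts of a spatial transformer: the affine grid generator and the bilinear sampler. The function names (`affine_grid`, `bilinear_sample`), the toy 5x5 image, and the hand-picked `theta` are all assumptions for illustration, not code from this repo. In a real STN, `theta` would be predicted by a localization network and learned by backpropagation, which works because the bilinear sampler is differentiable with respect to the sampling coordinates.

```python
import math


def affine_grid(theta, out_h, out_w):
    """For each output pixel, compute the source location to sample from.

    theta is a 2x3 affine matrix acting on normalized coordinates in [-1, 1].
    Returns an out_h x out_w grid of (x_s, y_s) pairs, also normalized.
    """
    (a, b, tx), (c, d, ty) = theta
    grid = []
    for i in range(out_h):
        y_t = -1.0 + 2.0 * i / (out_h - 1) if out_h > 1 else 0.0
        row = []
        for j in range(out_w):
            x_t = -1.0 + 2.0 * j / (out_w - 1) if out_w > 1 else 0.0
            row.append((a * x_t + b * y_t + tx, c * x_t + d * y_t + ty))
        grid.append(row)
    return grid


def bilinear_sample(image, grid):
    """Sample a 2D image (list of rows) at the grid locations.

    Locations outside the image contribute zero.
    """
    h, w = len(image), len(image[0])
    out = []
    for row in grid:
        out_row = []
        for x_s, y_s in row:
            # Convert normalized [-1, 1] coordinates to pixel coordinates.
            x = (x_s + 1.0) * (w - 1) / 2.0
            y = (y_s + 1.0) * (h - 1) / 2.0
            x0, y0 = int(math.floor(x)), int(math.floor(y))
            value = 0.0
            # Weighted sum over the four nearest pixels.
            for yy in (y0, y0 + 1):
                for xx in (x0, x0 + 1):
                    if 0 <= xx < w and 0 <= yy < h:
                        weight = (1.0 - abs(x - xx)) * (1.0 - abs(y - yy))
                        value += weight * image[yy][xx]
            out_row.append(value)
        out.append(out_row)
    return out


# Toy 5x5 "banknote" whose pixel value encodes its position (row * 5 + col).
image = [[float(r * 5 + c) for c in range(5)] for r in range(5)]

# Hypothetical theta: scale 0.5 zooms in, translation (0.5, 0.5) shifts the
# window toward the bottom-right corner of the image.
theta = [[0.5, 0.0, 0.5],
         [0.0, 0.5, 0.5]]

crop = bilinear_sample(image, affine_grid(theta, 3, 3))
for row in crop:
    print(row)
```

With this `theta`, the 3x3 output is exactly the bottom-right 3x3 patch of the toy image. A scale below 1 shrinks the attended window and the translation moves it, so a small serial number region corresponds to a small scale plus the right offset.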

Also, given the sequential nature of the serial codes, you might be interested in the RNN-SPN. Check out this paper for the details. Their implementation zooms in on each individual digit. The awesome authors provide source code here.

Hi Kevin, Thank you for the awesome references. I truly appreciate it!

I will be applying the above model architecture to my problem over the next few days to see how it performs. :)