tfRAM

TensorFlow implementation of Recurrent Models of Visual Attention (Mnih et al., 2014), with additional research. Code based on https://github.com/zhongwen/RAM.
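At the core of the model is the multi-scale glimpse sensor from the paper: at each step a 12 x 12 patch is extracted at several scales around the current location, and the larger scales are downsampled back to the base resolution. Below is a minimal sketch of that idea using `tf.image.extract_glimpse`; the function and argument names here are illustrative and may differ from the code in this repo.

```python
import tensorflow as tf

def glimpse_sensor(images, locs, glimpse_size=12, num_scales=3):
    """Extract a multi-scale, retina-like glimpse.

    images: [batch, H, W, 1] float tensor
    locs:   [batch, 2] glimpse centers in [-1, 1] (normalized coordinates)
    Returns a [batch, glimpse_size, glimpse_size, num_scales] tensor.
    """
    patches = []
    for s in range(num_scales):
        size = glimpse_size * (2 ** s)  # 12, 24, 48, ...
        patch = tf.image.extract_glimpse(
            images, size=[size, size], offsets=locs,
            centered=True, normalized=True)
        # Downsample larger scales back to the base resolution.
        patch = tf.image.resize(patch, [glimpse_size, glimpse_size])
        patches.append(patch)
    return tf.concat(patches, axis=-1)

# Example: 3-scale glimpses on a batch of 60x60 images at random locations.
# imgs = tf.random.uniform([32, 60, 60, 1])
# locs = tf.random.uniform([32, 2], -1.0, 1.0)
# g = glimpse_sensor(imgs, locs)  # -> shape [32, 12, 12, 3]
```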

Results

60 by 60 Translated MNIST

| Model | Error |
| --- | --- |
| FC, 2 layers (64 hidden units each) | 6.78% |
| FC, 2 layers (256 hidden units each) | 2.65% |
| Convolutional, 2 layers | 1.57% |
| RAM, 4 glimpses, 12 x 12, 3 scales | 1.54% |
| RAM, 6 glimpses, 12 x 12, 3 scales | 1.08% |
| RAM, 8 glimpses, 12 x 12, 3 scales | 0.94% |

60 by 60 Cluttered Translated MNIST

| Model | Error |
| --- | --- |
| FC, 2 layers (64 hidden units each) | 29.13% |
| FC, 2 layers (256 hidden units each) | 11.36% |
| Convolutional, 2 layers | 8.37% |
| RAM, 4 glimpses, 12 x 12, 3 scales | 5.15% |
| RAM, 6 glimpses, 12 x 12, 3 scales | 3.33% |
| RAM, 8 glimpses, 12 x 12, 3 scales | 2.63% |

100 by 100 Cluttered Translated MNIST

| Model | Error |
| --- | --- |
| Convolutional, 2 layers | 16.22% |
| RAM, 4 glimpses, 12 x 12, 3 scales | 14.86% |
| RAM, 6 glimpses, 12 x 12, 3 scales | 8.3% |
| RAM, 8 glimpses, 12 x 12, 3 scales | 5.9% |
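The Translated and Cluttered Translated MNIST tasks above follow the setup in Mnih et al. (2014): each 28 x 28 digit is placed at a random position on a larger blank canvas, and the cluttered variants additionally paste small crops of other digits as distractors. A rough sketch of that data generation is below; the clutter count and patch size used here are assumptions, not necessarily what this repo uses.

```python
import numpy as np

def make_cluttered_translated(digit, mnist_pool, canvas=60, n_clutter=4,
                              clutter_size=8, rng=np.random):
    """Build one cluttered, translated example from a 28x28 MNIST digit.

    digit:      [28, 28] float array in [0, 1]
    mnist_pool: array of [28, 28] digits used as a clutter source
    """
    img = np.zeros((canvas, canvas), dtype=np.float32)
    # Translate: place the digit at a random position on the canvas.
    y, x = rng.randint(0, canvas - 28, size=2)
    img[y:y + 28, x:x + 28] = np.maximum(img[y:y + 28, x:x + 28], digit)
    # Clutter: paste random crops of other digits at random positions.
    for _ in range(n_clutter):
        src = mnist_pool[rng.randint(len(mnist_pool))]
        sy, sx = rng.randint(0, 28 - clutter_size, size=2)
        patch = src[sy:sy + clutter_size, sx:sx + clutter_size]
        py, px = rng.randint(0, canvas - clutter_size, size=2)
        img[py:py + clutter_size, px:px + clutter_size] = np.maximum(
            img[py:py + clutter_size, px:px + clutter_size], patch)
    return np.clip(img, 0.0, 1.0)
```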

Examples: 60 by 60 Cluttered MNIST, 6 glimpses

The solid square marks the first glimpse, the line traces the path of attention, and the circle marks the last glimpse.
(Image grid: five examples, with the mean location output on the left and the sampled location output on the right.)
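The two columns reflect how the next glimpse location is chosen: the location network outputs a mean location, which is either followed directly ("mean output") or perturbed with Gaussian noise ("sampled output"), as in the REINFORCE-style training of the location policy. A minimal sketch of that policy head follows; the layer name and noise scale are illustrative, not taken from this repo.

```python
import tensorflow as tf

loc_head = tf.keras.layers.Dense(2, activation='tanh')  # illustrative name

def next_location(rnn_state, loc_std=0.1, sample=True):
    """Map the core RNN state to the next glimpse location in [-1, 1]."""
    mean = loc_head(rnn_state)
    if sample:
        # "Sampled output": add Gaussian exploration noise around the mean.
        loc = tf.clip_by_value(
            mean + tf.random.normal(tf.shape(mean), stddev=loc_std), -1.0, 1.0)
    else:
        # "Mean output": follow the mean location deterministically.
        loc = mean
    # Locations are treated as non-differentiable actions (trained via REINFORCE).
    return tf.stop_gradient(loc), mean
```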