Paper Collection - A List of Computer Vision Papers and Notes

Image Classification

Network in Network [Paper] [Note] [Torch Code]

  • Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." arXiv preprint arXiv:1312.4400 (2013).

VGG [Paper] [Note] [Torch Code]

  • Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).

GoogLeNet [Paper] [Note] [Torch Code]

  • Szegedy, Christian, et al. "Going deeper with convolutions." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.

ResNet [Paper] [Note] [Torch Code]

  • He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.

Popular Modules

Dropout [Paper] [Note]

  • Srivastava, Nitish, et al. "Dropout: a simple way to prevent neural networks from overfitting." Journal of Machine Learning Research 15.1 (2014): 1929-1958.

Batch Normalization [Paper] [Note]

  • Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift[J]. arXiv preprint arXiv:1502.03167, 2015.

Object Detection in Image

RCNN [Paper] [Note] [Code]

  • Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, CVPR 2014.

Spatial pyramid pooling in deep convolutional networks for visual recognition [Paper] (http://arxiv.org/abs/1406.4729) [Note] [Code]

  • He K, Zhang X, Ren S, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[J]. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 2015, 37(9): 1904-1916.

Fast R-CNN [Paper] (http://arxiv.org/pdf/1504.08083) [Note] [Code]

  • Ross Girshick, Fast R-CNN, arXiv:1504.08083.

Faster R-CNN, Microsoft Research [Paper] (http://arxiv.org/pdf/1506.01497) [Note] [Code] [Python Code]

  • Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun, Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, arXiv:1506.01497.

End-to-end people detection in crowded scenes [Paper] (http://arxiv.org/abs/1506.04878) [Note] [Code]

  • Russell Stewart, Mykhaylo Andriluka, End-to-end people detection in crowded scenes, arXiv:1506.04878.

You Only Look Once: Unified, Real-Time Object Detection [Paper] (http://arxiv.org/abs/1506.02640) [Note] [Code]

  • Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi, You Only Look Once: Unified, Real-Time Object Detection, arXiv:1506.02640

Adaptive Object Detection Using Adjacency and Zoom Prediction [Paper] (http://arxiv.org/abs/1512.07711) [Note]

  • Lu Y, Javidi T, Lazebnik S. Adaptive Object Detection Using Adjacency and Zoom Prediction[J]. arXiv:1512.07711, 2015.

Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks [Paper] [Note]

  • Sean Bell, C. Lawrence Zitnick, Kavita Bala, Ross Girshick. arXiv:1512.04143, 2015.

G-CNN: an Iterative Grid Based Object Detector [Paper]

  • Mahyar Najibi, Mohammad Rastegari, Larry S. Davis. arXiv:1512.07729, 2015.

Seq-NMS for Video Object Detection [Paper] [Note]

  • Wei Han, Pooya Khorrami, Tom Le Paine, Prajit Ramachandran, Mohammad Babaeizadeh, Honghui Shi, Jianan Li, Shuicheng Yan, Thomas S. Huang. Seq-NMS for Video Object Detection. arXiv preprint arXiv:1602.08465, 2016

Image Captioning

Exploring Nearest Neighbor Approaches for Image Captioning [Paper]

  • Devlin J, Gupta S, Girshick R, et al. Exploring Nearest Neighbor Approaches for Image Captioning[J]. arXiv preprint arXiv:1505.04467, 2015.

Show and Tell: A Neural Image Caption Generator [Paper] [Note]

  • Vinyals, Oriol, et al. "Show and tell: A neural image caption generator." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.

Image Generation

Pixel Recurrent Neural Networks [Paper] [Note]

  • van den Oord A, Kalchbrenner N, Kavukcuoglu K. Pixel Recurrent Neural Networks[J]. arXiv preprint arXiv:1601.06759, 2016.

Variational Autoencoder [Paper] [Note]

  • Kingma D P, Welling M. Auto-encoding variational bayes[J]. arXiv preprint arXiv:1312.6114, 2013.

DRAW: A recurrent neural network for image generation [Paper] [Torch Code] [Tensorflow Code] [Note]

  • Gregor K, Danihelka I, Graves A, et al. DRAW: A recurrent neural network for image generation[J]. arXiv preprint arXiv:1502.04623, 2015.

Scribbler: Controlling Deep Image Synthesis with Sketch and Color [Paper] [Note]

  • Patsorn Sangkloy, Jingwan Lu, et al. Scribbler: Controlling Deep Image Synthesis with Sketch and Color. arXiv preprint arXiv:1612.00835, 2016.

Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks [Paper]

  • Radford A, Metz L, Chintala S. Unsupervised representation learning with deep convolutional generative adversarial networks[J]. arXiv preprint arXiv:1511.06434, 2015.

Improved Techniques for Training GANs [Paper]

  • Salimans T, Goodfellow I, Zaremba W, et al. Improved Techniques for Training GANs[J]. arXiv preprint arXiv:1606.03498, 2016.

InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets [Paper]

  • Chen X, Duan Y, Houthooft R, et al. InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets[J]. arXiv preprint arXiv:1606.03657, 2016.

Image-to-Image Translation with Conditional Adversarial Networks [Paper] [Note] [Torch Code] [Tensorflow Code]

  • Isola P, Zhu J Y, Zhou T, et al. Image-to-Image Translation with Conditional Adversarial Networks[J]. arXiv preprint arXiv:1611.07004, 2016.

Learning to Generate Images of Outdoor Scenes from Attributes and Semantic Layouts [Paper] [Note]

  • Levent Karacan, Zeynep Akata, Aykut Erdem, Erkut Erdem. Learning to Generate Images of Outdoor Scenes from Attributes and Semantic Layouts [J]. arXiv preprint arXiv:1612.00215, 2016.

Learning to Discover Cross-Domain Relations with Generative Adversarial Networks [Paper] [Note]

  • Kim, Taeksoo, et al. "Learning to Discover Cross-Domain Relations with Generative Adversarial Networks." arXiv preprint arXiv:1703.05192 (2017).

Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks [Paper] [Note]

  • Zhu J Y, Park T, Isola P, et al. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks[J]. arXiv preprint arXiv:1703.10593, 2017.

BEGAN: Boundary Equilibrium Generative Adversarial Networks [Paper] [Note]

  • Berthelot, David, Tom Schumm, and Luke Metz. "BEGAN: Boundary Equilibrium Generative Adversarial Networks." arXiv preprint arXiv:1703.10717 (2017).

StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks [Paper] [Note] [Tensorflow Code]

  • Zhang, Han, et al. "StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks." arXiv preprint arXiv:1612.03242 (2016).

Invertible Conditional GANs for image editing [Paper] [Note]

  • Perarnau G, van de Weijer J, Raducanu B, et al. Invertible Conditional GANs for image editing[J]. arXiv preprint arXiv:1611.06355, 2016.

Stacked Generative Adversarial Networks [Paper] [Note]

  • Huang X, Li Y, Poursaeed O, et al. Stacked generative adversarial networks[J]. arXiv preprint arXiv:1612.04357, 2016.

Rotating Your Face Using Multi-task Deep Neural Network [Paper] [Note]

  • Yim J, Jung H, Yoo B I, et al. Rotating your face using multi-task deep neural network[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 676-684.

Image and Language

Learning Deep Representations of Fine-Grained Visual Descriptions [Paper] [Note]

  • Reed, Scott, et al. "Learning deep representations of fine-grained visual descriptions." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.

Activation Maximization

Synthesizing the preferred inputs for neurons in neural networks via deep generator networks [Paper] [Note]

  • Nguyen A, Dosovitskiy A, Yosinski J, et al. Synthesizing the preferred inputs for neurons in neural networks via deep generator networks[J]. arXiv preprint arXiv:1605.09304, 2016.

Style Transfer

A neural algorithm of artistic style [Paper] [Note]

  • Gatys L A, Ecker A S, Bethge M. A neural algorithm of artistic style[J]. arXiv preprint arXiv:1508.06576, 2015.

Perceptual losses for real-time style transfer and super-resolution [Paper] [Note]

  • Johnson J, Alahi A, Fei-Fei L. Perceptual losses for real-time style transfer and super-resolution[J]. arXiv preprint arXiv:1603.08155, 2016.

Preserving Color in Neural Artistic Style Transfer [Paper] [Note] [Pytorch Code]

  • Gatys, Leon A., et al. "Preserving color in neural artistic style transfer." arXiv preprint arXiv:1606.05897 (2016).

A Learned Representation For Artistic Style [Paper] [Note] [Tensorflow Code] [Lasagne Code]

  • Dumoulin, Vincent, Jonathon Shlens, and Manjunath Kudlur. "A learned representation for artistic style." (2017).

Demystifying Neural Style Transfer [Paper]

  • Li, Yanghao, et al. "Demystifying Neural Style Transfer." arXiv preprint arXiv:1701.01036 (2017).

Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization [Paper]

  • Huang, Xun, and Serge Belongie. "Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization." arXiv preprint arXiv:1703.06868 (2017).

Fast Patch-based Style Transfer of Arbitrary Style [Paper]

  • Chen, Tian Qi, and Mark Schmidt. "Fast Patch-based Style Transfer of Arbitrary Style." arXiv preprint arXiv:1612.04337 (2016).

Low-level vision

Texture Enhancement via High-Resolution Style Transfer for Single-Image Super-Resolution [Paper] [Note]

  • Il Jun Ahn, Woo Hyun Nam. Texture Enhancement via High-Resolution Style Transfer for Single-Image Super-Resolution [J]. arXiv preprint arXiv:1612.00085, 2016.

Deep Joint Image Filtering [Paper] [Note]

  • Li Y, Huang J B, Ahuja N, et al. Deep joint image filtering[C]//European Conference on Computer Vision. Springer International Publishing, 2016: 154-169.

Image Segmentation

Fully convolutional networks for semantic segmentation [Paper] [Note]

  • Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 3431-3440.

Video Editing

Deep Video Color Propagation [Paper] [Note]

  • Meyer S, Cornillère V, Djelouah A, et al. Deep Video Color Propagation. BMVC 2018.

Deep Matching

AnchorNet: A Weakly Supervised Network to Learn Geometry-sensitive Features For Semantic Matching [Paper] [Note]

  • Novotný D, Larlus D, Vedaldi A. AnchorNet: A Weakly Supervised Network to Learn Geometry-Sensitive Features for Semantic Matching. CVPR 2017.

Open Courses

  • CS231n: Convolutional Neural Networks for Visual Recognition [Course Page]
  • CS224d: Deep Learning for Natural Language Processing [Course Page]

Online Books

  • Deep Learning by Ian Goodfellow, Yoshua Bengio and Aaron Courville

Mathematics

  • Introduction to Probability Models, Sheldon M. Ross

Misc

k-means++: The advantages of careful seeding [Paper] [Note]

  • Arthur D, Vassilvitskii S. k-means++: The advantages of careful seeding[C]//Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms. Society for Industrial and Applied Mathematics, 2007: 1027-1035.