- Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik, Rich feature hierarchies for accurate object detection and semantic segmentation
✅ Spatial pyramid pooling in deep convolutional networks for visual recognition [[Paper]](http://arxiv.org/abs/1406.4729) [Note] [Code]
- He K, Zhang X, Ren S, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[J]. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 2015, 37(9): 1904-1916.
✅ Fast R-CNN [[Paper]](http://arxiv.org/pdf/1504.08083) [Note] [Code]
- Ross Girshick, Fast R-CNN, arXiv:1504.08083.
✅ Faster R-CNN, Microsoft Research [[Paper]](http://arxiv.org/pdf/1506.01497) [Note] [Code] [Python Code]
- Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun, Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, arXiv:1506.01497.
✅ End-to-end people detection in crowded scenes [[Paper]](http://arxiv.org/abs/1506.04878) [Note] [Code]
- Russell Stewart, Mykhaylo Andriluka, End-to-end people detection in crowded scenes, arXiv:1506.04878.
✅ You Only Look Once: Unified, Real-Time Object Detection [[Paper]](http://arxiv.org/abs/1506.02640) [Note] [Code]
- Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi, You Only Look Once: Unified, Real-Time Object Detection, arXiv:1506.02640
✅ Adaptive Object Detection Using Adjacency and Zoom Prediction [[Paper]](http://arxiv.org/abs/1512.07711) [Note]
- Lu Y, Javidi T, Lazebnik S. Adaptive Object Detection Using Adjacency and Zoom Prediction[J]. arXiv:1512.07711, 2015.
✅ Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks [Paper] [Note]
- Sean Bell, C. Lawrence Zitnick, Kavita Bala, Ross Girshick. arXiv:1512.04143, 2015.
✅ G-CNN: an Iterative Grid Based Object Detector [Paper]
- Mahyar Najibi, Mohammad Rastegari, Larry S. Davis. arXiv:1512.07729, 2015.
- SSD [Paper]
- Liu W, Anguelov D, Erhan D, et al. SSD: Single Shot MultiBox Detector[J]. arXiv preprint arXiv:1512.02325, 2015.
- Deep Residual Learning for Image Recognition [Paper]
- Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, Deep Residual Learning for Image Recognition. arXiv:1512.03385, 2015
- Diagnosing error in object detectors [Paper]
- Hoiem D, Chodpathumwan Y, Dai Q. Diagnosing error in object detectors[M]//Computer Vision–ECCV 2012. Springer Berlin Heidelberg, 2012: 340-353.
✅ Seq-NMS for Video Object Detection [Paper] [Note]
- Wei Han, Pooya Khorrami, Tom Le Paine, Prajit Ramachandran, Mohammad Babaeizadeh, Honghui Shi, Jianan Li, Shuicheng Yan, Thomas S. Huang. Seq-NMS for Video Object Detection. arXiv preprint arXiv:1602.08465, 2016
✅ Exploring Nearest Neighbor Approaches for Image Captioning [Paper]
- Devlin J, Gupta S, Girshick R, et al. Exploring Nearest Neighbor Approaches for Image Captioning[J]. arXiv preprint arXiv:1505.04467, 2015.
✅ Variational Autoencoder [Paper] [Note]
- Kingma D P, Welling M. Auto-encoding variational bayes[J]. arXiv preprint arXiv:1312.6114, 2013.
✅ DRAW: A recurrent neural network for image generation [Paper] [Torch Code] [Tensorflow Code] [Note]
- Gregor K, Danihelka I, Graves A, et al. DRAW: A recurrent neural network for image generation[J]. arXiv preprint arXiv:1502.04623, 2015.
- Improving Variational Inference with Inverse Autoregressive Flow [Paper]
- Kingma D P, Salimans T, Welling M. Improving Variational Inference with Inverse Autoregressive Flow[J]. arXiv preprint arXiv:1606.04934, 2016.
- Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks [Paper]
- Radford A, Metz L, Chintala S. Unsupervised representation learning with deep convolutional generative adversarial networks[J]. arXiv preprint arXiv:1511.06434, 2015.
- Attend, Infer, Repeat: Fast Scene Understanding with Generative Models [Paper]
- Eslami S M, Heess N, Weber T, et al. Attend, Infer, Repeat: Fast Scene Understanding with Generative Models[J]. arXiv preprint arXiv:1603.08575, 2016.
- Improved Techniques for Training GANs [Paper]
- Salimans T, Goodfellow I, Zaremba W, et al. Improved Techniques for Training GANs[J]. arXiv preprint arXiv:1606.03498, 2016.
- Variational Inference with Normalizing Flows [Paper]
- Rezende D J, Mohamed S. Variational inference with normalizing flows[J]. arXiv preprint arXiv:1505.05770, 2015.
- InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets [Paper]
- Chen X, Duan Y, Houthooft R, et al. InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets[J]. arXiv preprint arXiv:1606.03657, 2016.
- Deep Convolutional Inverse Graphics Network [Paper]
- Kulkarni T D, Whitney W F, Kohli P, et al. Deep convolutional inverse graphics network[C]//Advances in Neural Information Processing Systems. 2015: 2539-2547.
- Efficient Back Prop [Paper]
- LeCun Y A, Bottou L, Orr G B, et al. Efficient backprop[M]//Neural networks: Tricks of the trade. Springer Berlin Heidelberg, 2012: 9-48.
- Batch Normalization [Paper]
- Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift[J]. arXiv preprint arXiv:1502.03167, 2015.
- A guide to convolution arithmetic for deep learning [Paper]
- Dumoulin V, Visin F. A guide to convolution arithmetic for deep learning[J]. arXiv preprint arXiv:1603.07285, 2016.
- Decoupled Neural Interfaces using Synthetic Gradients [Paper]
- Max Jaderberg, Wojciech Marian Czarnecki, Simon Osindero, et al. Decoupled Neural Interfaces using Synthetic Gradients. arXiv preprint arXiv:1608.05343, 2016.
- A neural algorithm of artistic style [Paper] [Note]
- Gatys L A, Ecker A S, Bethge M. A neural algorithm of artistic style[J]. arXiv preprint arXiv:1508.06576, 2015.
- Perceptual losses for real-time style transfer and super-resolution [Paper] [Note]
- Johnson J, Alahi A, Fei-Fei L. Perceptual losses for real-time style transfer and super-resolution[J]. arXiv preprint arXiv:1603.08155, 2016.
- Practical recommendations for gradient-based training of deep architectures [Paper]
- Bengio Y. Practical recommendations for gradient-based training of deep architectures[M]//Neural Networks: Tricks of the Trade. Springer Berlin Heidelberg, 2012: 437-478.
✅ Fully convolutional networks for semantic segmentation [Paper] [Note]
- Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 3431-3440.
- CS231n: Convolutional Neural Networks for Visual Recognition [Course Page]
- CS224d: Deep Learning for Natural Language Processing [Course Page]
- Deep Learning by Ian Goodfellow, Yoshua Bengio and Aaron Courville
- Introduction to Probability Models, Sheldon M. Ross