• reading-list

Basic Network

  • AlexNet Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012. ⭐️⭐️⭐️⭐️⭐️ extensive experiment thinking

  • G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. R. Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580, 2012.

  • Dropout Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, Ruslan Salakhutdinov. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. 2014. ⭐️⭐️⭐️⭐️ ensemble architecture adding noise

  • GoogLeNet Christian Szegedy et al. "Going deeper with convolutions" [InceptionV1] CVPR 2015.  ⭐️⭐️⭐️⭐️⭐️ efficient multi-scale 1x1 conv

  • VGG Karen Simonyan & Andrew Zisserman. "Very Deep Convolutional Networks for Large-Scale Image Recognition".

  • PReLU & MSRA Initialization: He, Kaiming, et al. "Delving deep into rectifiers: Surpassing human-level performance on imagenet classification." Proceedings of the IEEE international conference on computer vision. 2015. ⭐️⭐️⭐️⭐️⭐️

  • Batch Normalization & Inception V2 Sergey Ioffe et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. ICML 2015. ⭐️⭐️⭐️⭐️⭐️ stable fast convergence

  • InceptionV3 Christian Szegedy et al. Rethinking the Inception Architecture for Computer Vision. CVPR 2016. ⭐️⭐️⭐️⭐️⭐️ design principles label smoothing reduce

  • Warmup & LR Priya Goyal, Piotr Dollár, Ross Girshick et al. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour. 2018.

  • Identity ResNet He, Kaiming, et al. "Identity mappings in deep residual networks." European Conference on Computer Vision. Springer International Publishing, 2016. ⭐️⭐️⭐️⭐️ pre-activation

  • ResNet: He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. ⭐️⭐️⭐️⭐️⭐️ ##CVPR 2016 Best Paper

  • ResNeXt Xie, Saining, et al. "Aggregated residual transformations for deep neural networks." arXiv preprint arXiv:1611.05431 (2016). ⭐️⭐️⭐️⭐️ cardinality

  • InceptionV4 & Inception-ResNet: Szegedy, Christian, et al. "Inception-v4, inception-resnet and the impact of residual connections on learning." arXiv preprint arXiv:1602.07261 (2016). ⭐️⭐️⭐️⭐️ Inception & residual

  • PolyNet: Zhang, Xingcheng, et al. "Polynet: A pursuit of structural diversity in very deep networks." arXiv preprint arXiv:1611.05725 (2016). Slides ⭐️⭐️⭐️⭐️ diversity

  • Xception: Chollet, François. "Xception: Deep Learning with Depthwise Separable Convolutions." arXiv preprint arXiv:1610.02357 (2016). ⭐️⭐️⭐️ channel correlation decoupled

  • SENet: Hu, Jie, Li Shen, and Gang Sun. "Squeeze-and-excitation networks." In CVPR (2018). ⭐️⭐️⭐️⭐️⭐️ decouple channel excited

  • DenseNet: Huang, Gao, et al. "Densely connected convolutional networks." arXiv preprint arXiv:1608.06993 (2016). ⭐️⭐️⭐️⭐️⭐️ features reuse ##CVPR 2017 Best Paper

  • Rethinking ImageNet Pre-training: He, Kaiming, Ross Girshick, and Piotr Dollár. "Rethinking ImageNet Pre-training." arXiv preprint arXiv:1811.08883 (2018). ⭐️⭐️⭐️

  • Non-local Neural Network: Wang, Xiaolong, Ross Girshick, Abhinav Gupta, and Kaiming He. "Non-local Neural Networks." arXiv preprint arXiv:1711.07971 (2017). ⭐️⭐️⭐️⭐️

3D vision

Point cloud

  • PointNet Charles R Qi, Hao Su et al. "PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation". arXiv: 1612.00593v2 (CVPR 2017). ⭐️⭐️⭐️
  • PointNet++: C. R. Qi et al. "Deep Hierarchical Feature Learning on Point Sets in a Metric Space". (NeurIPS 2017) [pdf] [Github] ⭐️⭐️⭐️⭐️
  • PointCNN: Y. Li et al. "Convolution On X-Transformed Points" (NeurIPS 2018). [pdf] [Github] ⭐️⭐️⭐️
  • RS-CNN: Y. Liu et al. "Relation-Shape Convolutional Neural Network for Point Cloud Analysis" ⭐️⭐️⭐️⭐️

Object detection

  • SPP: He, Kaiming, et al. "Spatial pyramid pooling in deep convolutional networks for visual recognition." European Conference on Computer Vision. Springer International Publishing, 2014. ⭐️⭐️⭐️⭐️⭐️
  • Fast RCNN: Girshick, Ross. "Fast r-cnn." Proceedings of the IEEE International Conference on Computer Vision. 2015. ⭐️⭐️⭐️⭐️⭐️
  • Faster RCNN: Ren, Shaoqing, et al. "Faster r-cnn: Towards real-time object detection with region proposal networks." Advances in neural information processing systems. 2015. ⭐️⭐️⭐️⭐️⭐️
  • YOLO: You Only Look Once: Unified, Real-Time Object Detection. ⭐️⭐️⭐️⭐️⭐️

Vision & Language

Visual Grounding / Referring Expressions (Images):

  • Karpathy, Andrej, Armand Joulin, and Li F. Fei-Fei. Deep fragment embeddings for bidirectional image sentence mapping. Advances in neural information processing systems. 2014. [Paper]
  • Karpathy, Andrej, and Li Fei-Fei. Deep visual-semantic alignments for generating image descriptions. Proceedings of the IEEE conference on computer vision and pattern recognition. 2015. Method name: Neural Talk. [Paper] [Code] [Torch Code] [Website]
  • Hu, Ronghang, et al. Natural language object retrieval. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. Method name: Spatial Context Recurrent ConvNet (SCRC) [Paper] [Code] [Website]
  • Mao, Junhua, et al. Generation and comprehension of unambiguous object descriptions. Proceedings of the IEEE conference on computer vision and pattern recognition. 2016. [Paper] [Code]
  • Wang, Liwei, Yin Li, and Svetlana Lazebnik. Learning deep structure-preserving image-text embeddings. Proceedings of the IEEE conference on computer vision and pattern recognition. 2016. [Paper] [Code]
  • Yu, Licheng, et al. Modeling context in referring expressions. European Conference on Computer Vision. Springer, Cham, 2016. [Paper] [Code]
  • Nagaraja, Varun K., Vlad I. Morariu, and Larry S. Davis. Modeling context between objects for referring expression understanding. European Conference on Computer Vision. Springer, Cham, 2016. [Paper] [Code]
  • Rohrbach, Anna, et al. Grounding of textual phrases in images by reconstruction. European Conference on Computer Vision. Springer, Cham, 2016. Method Name: GroundR [Paper] [Tensorflow Code] [Torch Code]
  • Wang, Mingzhe, et al. Structured matching for phrase localization. European Conference on Computer Vision. Springer, Cham, 2016. Method name: Structured Matching [Paper] [Code]
  • Hu, Ronghang, Marcus Rohrbach, and Trevor Darrell. Segmentation from natural language expressions. European Conference on Computer Vision. Springer, Cham, 2016. [Paper] [Code] [Website]
  • Fukui, Akira et al. Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding. EMNLP (2016). Method name: MCB [Paper] [Code]
  • Endo, Ko, et al. An attention-based regression model for grounding textual phrases in images. Proc. IJCAI. 2017. [Paper]
  • Chen, Kan, et al. MSRC: Multimodal spatial regression with semantic context for phrase grounding. International Journal of Multimedia Information Retrieval 7.1 (2018): 17-28. [Paper -Springer Link]
  • Wu, Fan et al. An End-to-End Approach to Natural Language Object Retrieval via Context-Aware Deep Reinforcement Learning. CoRR abs/1703.07579 (2017): n. pag. [Paper] [Code]
  • Yu, Licheng, et al. A joint speaker-listener-reinforcer model for referring expressions. Computer Vision and Pattern Recognition (CVPR). Vol. 2. 2017. [Paper] [Code] [Website]
  • Hu, Ronghang, et al. Modeling relationships in referential expressions with compositional modular networks. Computer Vision and Pattern Recognition (CVPR), 2017 IEEE Conference on. IEEE, 2017. [Paper] [Code]
  • Luo, Ruotian, and Gregory Shakhnarovich. Comprehension-guided referring expressions. Computer Vision and Pattern Recognition (CVPR). Vol. 2. 2017. [Paper] [Code]
  • Liu, Jingyu, Liang Wang, and Ming-Hsuan Yang. Referring expression generation and comprehension via attributes. Proceedings of CVPR. 2017. [Paper]
  • Xiao, Fanyi, Leonid Sigal, and Yong Jae Lee. Weakly-supervised visual grounding of phrases with linguistic structures. arXiv preprint arXiv:1705.01371 (2017). [Paper]
  • Plummer, Bryan A., et al. Phrase localization and visual relationship detection with comprehensive image-language cues. Proc. ICCV. 2017. [Paper] [Code]
  • Yu, Licheng, et al. Mattnet: Modular attention network for referring expression comprehension. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2018. [Paper] [Code] [Website]
  • Chen, Yen-Chun, et al. UNITER: Learning UNiversal Image-TExt Representations. arXiv preprint arXiv:1909.11740 (2019). [Paper]

Person ReID

  • AlignedReID Xuan Zhang, Hao Luo et al. AlignedReID: Surpassing Human-Level Performance in Person Re-Identification. 2017. ⭐️⭐️⭐️⭐️ stripe-based local distance

  • Mahdi M. Kalayeh et al. Human Semantic Parsing for Person Re-identification. CVPR (2018). ⭐️ semantic transfer

  • Hou et al. Interaction-and-Aggregation Network for Person Re-identification. CVPR (2019). ⭐️⭐️⭐️⭐️ adaptively localize parts by modeling spatial feature self-attention

  • Luo et al. Bag of Tricks and A Strong Baseline for Deep Person Re-identification. CVPR (2019). ⭐️⭐️⭐️⭐️ tricks BNNeck inter-intra-class

  • Zheng et al. Joint Discriminative and Generative Learning for Person Re-identification. CVPR (2019).

  • Wang et al. Spatial-Temporal Person Re-identification. CVPR (2019).

NIPS

  • Max Jaderberg, Karen Simonyan, Andrew Zisserman, Koray Kavukcuoglu. Spatial Transformer Networks. NIPS (2015) ⭐️⭐️⭐️⭐️ affine transform learnable layer