reading-list: A repository from hbb1

reading-list

Basic Network

AlexNet Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012. ⭐️⭐️⭐️⭐️⭐️ extensive experiment thinking
G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. R. Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580, 2012.
Dropout Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, Ruslan Salakhutdinov. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. 2014. ⭐️⭐️⭐️⭐️ ensemble architecture adding noise
GoogLeNet Christian Szegedy et al. "Going deeper with convolutions" [InceptionV1] CVPR 2015. ⭐️⭐️⭐️⭐️⭐️ efficient multi-scale 1x1 conv
VGG Karen Simonyan & Andrew Zisserman. "Very Deep Convolutional Networks for Large-Scale Image Recognition".
PReLU & msra Initialization: He, Kaiming, et al. "Delving deep into rectifiers: Surpassing human-level performance on imagenet classification." Proceedings of the IEEE international conference on computer vision. 2015. ⭐️⭐️⭐️⭐️⭐️
Batch Normalization & Inception V2 Sergey Ioffe et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. ICML 2015. ⭐️⭐️⭐️⭐️⭐️ stable fast convergence

InceptionV3 Christian Szegedy et al. Rethinking the Inception Architecture for Computer Vision. CVPR 2016. ⭐️⭐️⭐️⭐️⭐️ design principles label smoothing reduce
Warmup & LR Priya Goyal, Piotr Dollár, Ross Girshick et al. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour. 2018.
Identity ResNet He, Kaiming, et al. "Identity mappings in deep residual networks." European Conference on Computer Vision. Springer International Publishing, 2016. ⭐️⭐️⭐️⭐️ pre-activation
ResNet: He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. ⭐️⭐️⭐️⭐️⭐️ CVPR 2016 Best Paper
ResNeXt Xie, Saining, et al. "Aggregated residual transformations for deep neural networks." arXiv preprint arXiv:1611.05431 (2016). ⭐️⭐️⭐️⭐️ cardinality
InceptionV4 & Inception-ResNet: Szegedy, Christian, et al. "Inception-v4, inception-resnet and the impact of residual connections on learning." arXiv preprint arXiv:1602.07261 (2016). ⭐️⭐️⭐️⭐️ Inception & residual
PolyNet: Zhang, Xingcheng, et al. "Polynet: A pursuit of structural diversity in very deep networks." arXiv preprint arXiv:1611.05725 (2016). Slides ⭐️⭐️⭐️⭐️ diversity
Xception: Chollet, François. "Xception: Deep Learning with Depthwise Separable Convolutions." arXiv preprint arXiv:1610.02357 (2016). ⭐️⭐️⭐️ channel correlation decoupled
SENet: Hu, Jie, Li Shen, and Gang Sun. "Squeeze-and-excitation networks." In CVPR (2018). ⭐️⭐️⭐️⭐️⭐️ decouple channel excited
DenseNet: Huang, Gao, et al. "Densely connected convolutional networks." arXiv preprint arXiv:1608.06993 (2016). ⭐️⭐️⭐️⭐️⭐️ features reuse CVPR 2017 Best Paper
Rethinking ImageNet Pre-training: He, Kaiming, Ross Girshick, and Piotr Dollár. "Rethinking ImageNet Pre-training." arXiv preprint arXiv:1811.08883 (2018). ⭐️⭐️⭐️
Non-local Neural Network: Wang, Xiaolong, Ross Girshick, Abhinav Gupta, and Kaiming He. "Non-local Neural Networks." arXiv preprint arXiv:1711.07971 (2017). ⭐️⭐️⭐️⭐️

3D vision

Point cloud

PointNet Charles R Qi, Hao Su et al. "PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation". arXiv:1612.00593v2 (CVPR 2017). ⭐️⭐️⭐️
PointNet++: C. R. Qi et al. "Deep Hierarchical Feature Learning on Point Sets in a Metric Space". (NeurIPS 2017) [pdf] [Github] ⭐️⭐️⭐️⭐️
PointCNN: Y. Li et al. "Convolution On X-Transformed Points" (NeurIPS 2018). [pdf] [Github] ⭐️⭐️⭐️
RS-CNN: Y. Liu et al. "Relation-Shape Convolutional Neural Network for Point Cloud Analysis" ⭐️⭐️⭐️⭐️

Object detection

SPP: He, Kaiming, et al. "Spatial pyramid pooling in deep convolutional networks for visual recognition." European Conference on Computer Vision. Springer International Publishing, 2014. ⭐️⭐️⭐️⭐️⭐️
Fast RCNN: Girshick, Ross. "Fast r-cnn." Proceedings of the IEEE International Conference on Computer Vision. 2015. ⭐️⭐️⭐️⭐️⭐️
Faster RCNN: Ren, Shaoqing, et al. "Faster r-cnn: Towards real-time object detection with region proposal networks." Advances in neural information processing systems. 2015. ⭐️⭐️⭐️⭐️⭐️
YOLO: You Only Look Once: Unified, Real-Time Object Detection. ⭐️⭐️⭐️⭐️⭐️

Vision & Language

Visual Grounding / Referring Expressions (Images):

Karpathy, Andrej, Armand Joulin, and Li F. Fei-Fei. Deep fragment embeddings for bidirectional image sentence mapping. Advances in neural information processing systems. 2014. [Paper]
Karpathy, Andrej, and Li Fei-Fei. Deep visual-semantic alignments for generating image descriptions. Proceedings of the IEEE conference on computer vision and pattern recognition. 2015. Method name: Neural Talk. [Paper] [Code] [Torch Code] [Website]
Hu, Ronghang, et al. Natural language object retrieval. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. Method name: Spatial Context Recurrent ConvNet (SCRC) [Paper] [Code] [Website]
Mao, Junhua, et al. Generation and comprehension of unambiguous object descriptions. Proceedings of the IEEE conference on computer vision and pattern recognition. 2016. [Paper] [Code]
Wang, Liwei, Yin Li, and Svetlana Lazebnik. Learning deep structure-preserving image-text embeddings. Proceedings of the IEEE conference on computer vision and pattern recognition. 2016. [Paper] [Code]
Yu, Licheng, et al. Modeling context in referring expressions. European Conference on Computer Vision. Springer, Cham, 2016. [Paper][Code]
Nagaraja, Varun K., Vlad I. Morariu, and Larry S. Davis. Modeling context between objects for referring expression understanding. European Conference on Computer Vision. Springer, Cham, 2016. [Paper] [Code]
Rohrbach, Anna, et al. Grounding of textual phrases in images by reconstruction. European Conference on Computer Vision. Springer, Cham, 2016. Method Name: GroundR [Paper] [Tensorflow Code] [Torch Code]
Wang, Mingzhe, et al. Structured matching for phrase localization. European Conference on Computer Vision. Springer, Cham, 2016. Method name: Structured Matching [Paper] [Code]
Hu, Ronghang, Marcus Rohrbach, and Trevor Darrell. Segmentation from natural language expressions. European Conference on Computer Vision. Springer, Cham, 2016. [Paper] [Code] [Website]
Fukui, Akira et al. Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding. EMNLP (2016). Method name: MCB [Paper][Code]
Endo, Ko, et al. An attention-based regression model for grounding textual phrases in images. Proc. IJCAI. 2017. [Paper]
Chen, Kan, et al. MSRC: Multimodal spatial regression with semantic context for phrase grounding. International Journal of Multimedia Information Retrieval 7.1 (2018): 17-28. [Paper -Springer Link]
Wu, Fan et al. An End-to-End Approach to Natural Language Object Retrieval via Context-Aware Deep Reinforcement Learning. CoRR abs/1703.07579 (2017): n. pag. [Paper] [Code]
Yu, Licheng, et al. A joint speaker-listener-reinforcer model for referring expressions. Computer Vision and Pattern Recognition (CVPR). Vol. 2. 2017. [Paper] [Code][Website]
Hu, Ronghang, et al. Modeling relationships in referential expressions with compositional modular networks. Computer Vision and Pattern Recognition (CVPR), 2017 IEEE Conference on. IEEE, 2017. [Paper] [Code]
Luo, Ruotian, and Gregory Shakhnarovich. Comprehension-guided referring expressions. Computer Vision and Pattern Recognition (CVPR). Vol. 2. 2017. [Paper] [Code]
Liu, Jingyu, Liang Wang, and Ming-Hsuan Yang. Referring expression generation and comprehension via attributes. Proceedings of CVPR. 2017. [Paper]
Xiao, Fanyi, Leonid Sigal, and Yong Jae Lee. Weakly-supervised visual grounding of phrases with linguistic structures. arXiv preprint arXiv:1705.01371 (2017). [Paper]
Plummer, Bryan A., et al. Phrase localization and visual relationship detection with comprehensive image-language cues. Proc. ICCV. 2017. [Paper] [Code]
Yu, Licheng, et al. Mattnet: Modular attention network for referring expression comprehension. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2018. [Paper] [Code] [Website]
Chen, Yen-Chun, et al. UNITER: Learning UNiversal Image-TExt Representations. arXiv preprint arXiv:1909.11740 (2019). [Paper]

Person ReID

AlignedReID Xuan Zhang, Hao Luo et al. AlignedReID: Surpassing Human-Level Performance in Person Re-Identification. 2017. ⭐️⭐️⭐️⭐️ stripe-based local distance
Mahdi M. Kalayeh et al. Human Semantic Parsing for Person Re-identification. CVPR (2018). ⭐️ semantic transfer
Hou et al. Interaction-and-Aggregation Network for Person Re-identification. CVPR (2019). ⭐️⭐️⭐️⭐️ adaptively localize parts by modeling spatial feature self-attention
Luo et al. Bag of Tricks and A Strong Baseline for Deep Person Re-identification. CVPR (2019). ⭐️⭐️⭐️⭐️ tricks bnneck inter-intra-class
Zheng et al. Joint Discriminative and Generative Learning for Person Re-identification. CVPR (2019).
Wang et al. Spatial-Temporal Person Re-identification. CVPR (2019).

NIPS

Max Jaderberg, Karen Simonyan, Andrew Zisserman, Koray Kavukcuoglu. Spatial Transformer Networks. NIPS (2015). ⭐️⭐️⭐️⭐️ affine transform learnable layer
