Good deep-learning papers from the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
✅ [Feedback Networks]
✅ [Comparative Evaluation of Hand-Crafted and Learned Local Features]
✅ [Understanding deep learning requires rethinking generalization]
✅ [Local Binary Convolutional Neural Networks]
✅ [Deep Roots: Improving CNN Efficiency With Hierarchical Filter Groups]
✅ [Graph-Structured Representations for Visual Question Answering]
✅ [Unsupervised Video Summarization With Adversarial LSTM Networks]
✅ [A Hierarchical Approach for Generating Descriptive Image Paragraphs]
✅ [Efficient Multiple Instance Metric Learning Using Weakly Supervised Data]
✅ [Neural Scene De-rendering]
✅ [Deep Variation-structured Reinforcement Learning for Visual Relationship and Attribute Detection]
✅ [Attend to You: Personalized Image Captioning with Context Sequence Memory Networks]
✅ [Modeling Relationships in Referential Expressions with Compositional Modular Networks]
✅ [The VQA-Machine: Learning How to Use Existing Vision Algorithms to Answer New Questions]
✅ [ViP-CNN: Visual Phrase Guided Convolutional Neural Network]
✅ [SCC: Semantic Context Cascade for Efficient Action Detection]
✅ [Hierarchical Boundary-Aware Neural Encoder for Video Captioning]
✅ [Emotion Recognition in Context]
✅ [Automatic Understanding of Image and Video Advertisements]
✅ [Person Search with Natural Language Description]
✅ [Unsupervised Visual-Linguistic Reference Resolution in Instructional Videos]
✅ [Dense Captioning With Joint Inference and Visual Context]
✅ [Instance-Aware Image and Sentence Matching With Selective Multimodal LSTM]
✅ [Face Normals "In-The-Wild" Using Fully Convolutional Networks]
✅ [3D Face Morphable Models "In-The-Wild"]
✅ [Generating Holistic 3D Scene Abstractions for Text-Based Image Retrieval]
✅ [Unsupervised Monocular Depth Estimation With Left-Right Consistency]
✅ [Exploiting 2D Floorplan for Building-Scale Panorama RGBD Alignment]
✅ [A Point Set Generation Network for 3D Object Reconstruction From a Single Image]
✅ [Recurrent 3D Pose Sequence Machines]
✅ [Learning Detailed Face Reconstruction From a Single Image]
✅ [NID-SLAM: Robust Monocular SLAM using Normalised Information Distance]
✅ [Synthesizing 3D Shapes via Modeling Multi-View Depth Maps and Silhouettes With Deep Generative Networks]
✅ [End-To-End Training of Hybrid CNN-CRF Models for Stereo]
✅ [Position Tracking for Virtual Reality Using Commodity WiFi]
✅ [Learning by Association -- A Versatile Semi-Supervised Training Method for Neural Networks]
✅ [Weakly Supervised Cascaded Convolutional Networks]
✅ [WILDCAT: Weakly Supervised Learning of Deep ConvNets for Image Classification, Pointwise Localization and Segmentation]
✅ [Weakly Supervised Action Learning with RNN based Fine-to-coarse Modeling]
✅ [Simple Does It: Weakly Supervised Instance and Semantic Segmentation]
✅ [Few-Shot Object Recognition from Machine-Labeled Web Images]
✅ [A Graph Regularized Deep Neural Network for Unsupervised Image Representation Learning]
✅ [Deep Self-Taught Learning for Weakly Supervised Object Localization]
✅ [From Zero-Shot Learning to Conventional Supervised Classification: Unseen Visual Data Synthesis]
✅ [Unsupervised Learning of Depth and Ego-Motion From Video]
✅ [Attend in Groups: A Weakly-Supervised Deep Learning Framework for Learning From Web Data]
✅ [Weakly Supervised Dense Video Captioning]
✅ [Learning a Deep Embedding Model for Zero-Shot Learning]
✅ [Unsupervised Learning of Long-Term Motion Dynamics for Videos]
✅ [Learning From Synthetic Humans]
✅ [Learning From Noisy Large-Scale Datasets With Minimal Supervision]
✅ [Making Deep Neural Networks Robust to Label Noise: A Loss Correction Approach]
✅ [Learning to Detect Salient Objects With Image-Level Supervision]
✅ [Dual Attention Networks for Multimodal Reasoning and Matching]
✅ [Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning]
✅ [Supervising Neural Attention Models for Video Captioning by Human Gaze Data]
✅ [Deep Level Sets for Salient Object Detection]
✅ [Temporal Convolutional Networks for Action Segmentation and Detection]
✅ [One-Shot Video Object Segmentation]
✅ [Polyhedral Conic Classifiers for Visual Object Detection and Classification]
✅ [Mining Object Parts From CNNs via Active Question-Answering]
✅ [Learning Deep Context-aware Features over Body and Latent Parts for Person Re-identification]
✅ [Beyond triplet loss: a deep quadruplet network for person re-identification]
✅ [Surveillance Video Parsing with Single Frame Supervision]
✅ [Semantically Coherent Co-Segmentation and Reconstruction of Dynamic Scenes]
✅ [Pixelwise Instance Segmentation With a Dynamically Instantiated Network]
✅ [Video Propagation Networks]
✅ [Modeling Temporal Dynamics and Spatial Configurations of Actions Using Two-Stream Recurrent Neural Networks]
✅ [Self-Learning Scene-Specific Pedestrian Detectors Using a Progressive Latent Model]
✅ [IRINA: Iris Recognition (Even) in Inaccurately Segmented Data]
✅ [Forecasting Human Dynamics from Static Images]
✅ [Scene Flow to Action Map: A New Representation for RGB-D Based Action Recognition With Convolutional Neural Networks]
✅ [PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation]
✅ [Real-Time 3D Model Tracking in Color and Depth on a Single CPU Core]
✅ [Object Detection in Videos With Tubelet Proposal Networks]
✅ [Forecasting Interactive Dynamics of Pedestrians with Fictitious Play]
✅ [Convolutional Random Walk Networks for Semantic Image Segmentation]
✅ [Look into Person: Self-supervised Structure-sensitive Learning and A New Benchmark for Human Parsing]
✅ [Finding Tiny Faces]
✅ [Visual-Inertial-Semantic Scene Representation for 3D Object Detection]
✅ [Predictive-Corrective Networks for Action Detection]
✅ [FastMask: Segment Multi-Scale Object Candidates in One Shot]
✅ [ActionVLAD: Learning spatio-temporal aggregation for action classification]
✅ [Interpretable Structure-Evolving LSTM]
✅ [Budget-Aware Deep Semantic Video Segmentation]
✅ [Spindle Net: Person Re-Identification With Human Body Region Guided Feature Decomposition and Fusion]
✅ [Hand Keypoint Detection in Single Images using Multiview Bootstrapping]
✅ [Perceptual Generative Adversarial Networks for Small Object Detection]
✅ [Weakly Supervised Actor-Action Segmentation via Robust Multi-Task Ranking]
✅ [Sequential Person Recognition in Photo Albums With a Recurrent Network]
✅ [Person Re-Identification in the Wild]
✅ [Semantic Amodal Segmentation]
✅ [Deep Sequential Context Networks for Action Prediction]
✅ [Predicting Behaviors of Basketball Players From First Person Videos]
✅ [Spatiotemporal Pyramid Network for Video Action Recognition]
✅ [Object Region Mining With Adversarial Erasing: A Simple Classification to Semantic Segmentation Approach]
✅ [MIML-FCN+: Multi-Instance Multi-Label Learning via Fully Convolutional Networks With Privileged Information]
✅ [Global Context-Aware Attention LSTM Networks for 3D Action Recognition]
✅ [Semantic Scene Completion from a Single Depth Image]
✅ [Multi-Context Attention for Human Pose Estimation]
✅ [Action Unit Detection with Region Adaptation, Multi-labeling Learning and Optimal Temporal Fusing]
✅ [RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation]
✅ [Deep Matching Prior Network: Toward Tighter Multi-Oriented Text Detection]
✅ [Deep MANTA: A Coarse-to-fine Many-Task Network for joint 2D and 3D vehicle analysis from monocular image]
✅ [SyncSpecCNN: Synchronized Spectral CNN for 3D Shape Segmentation]
✅ [Fully Convolutional Instance-Aware Semantic Segmentation]
✅ [DeLiGAN: Generative Adversarial Networks for Diverse and Limited Data]
✅ [Crossing Nets: Combining GANs and VAEs With a Shared Latent Space for Hand Pose Estimation]
✅ [Generating the Future with Adversarial Transformers]
✅ [Image-to-Image Translation with Conditional Adversarial Networks]
✅ [Disentangled Representation Learning GAN for Pose-Invariant Face Recognition]
✅ [3D Convolutional Neural Networks for Efficient and Robust Hand Pose Estimation from Single Depth Images]
✅ [Learning From Simulated and Unsupervised Images Through Adversarial Training]
✅ [Expecting the Unexpected: Training Detectors for Unusual Pedestrians With Adversarial Imposters]
✅ [Deep Reinforcement Learning-Based Image Captioning With Embedding Reward]
✅ [Collaborative Deep Reinforcement Learning for Joint Object Search]
✅ [Borrowing Treasures From the Wealthy: Deep Transfer Learning Through Selective Joint Fine-Tuning]
✅ [Visual Dialog]
✅ [Scene Parsing Through ADE20K Dataset]
✅ [Analyzing Computer Vision Data - The Good, the Bad and the Ugly]
✅ [Multi-Way Multi-Level Kernel Modeling for Neuroimaging Classification]
✅ [Dilated Residual Networks]
✅ [Oriented Response Networks]
✅ [PolyNet: A Pursuit of Structural Diversity in Very Deep Networks]
✅ [Spatially Adaptive Computation Time for Residual Networks]
✅ [Xception: Deep Learning With Depthwise Separable Convolutions]
✅ [Aggregated Residual Transformations for Deep Neural Networks]
✅ [Loss Max-Pooling for Semantic Image Segmentation]
✅ [Mind the Class Weight Bias: Weighted Maximum Mean Discrepancy for Unsupervised Domain Adaptation]
✅ [Deep Temporal Linear Encoding Networks]
✅ [Deep Feature Flow for Video Recognition]
✅ [Turning an Urban Scene Video Into a Cinemagraph]
✅ [Real-Time Neural Style Transfer for Videos]
✅ [Predicting Ground-Level Scene Layout from Aerial Imagery]
✅ [Deep Joint Rain Detection and Removal from a Single Image]
✅ [From Red Wine to Red Tomato: Composition with Context]
✅ [StyleBank: An Explicit Representation for Neural Image Style Transfer]
✅ [Deep View Morphing]
✅ [CLKN: Cascaded Lucas-Kanade Networks for Image Alignment]
✅ [Unrolling the Shutter: CNN to Correct Motion Distortions]
✅ [Transition Forests: Learning Discriminative Temporal Transitions for Action Recognition and Detection]
✅ [Split-Brain Autoencoders: Unsupervised Learning by Cross-Channel Prediction]
✅ [Unified Embedding and Metric Learning for Zero-Exemplar Event Detection]
✅ [Joint Discriminative Bayesian Dictionary and Classifier Learning]
✅ [Superpixel-based Tracking-by-Segmentation using Markov Chains]
✅ [iCaRL: Incremental Classifier and Representation Learning]
✅ [ShapeOdds: Variational Bayesian Learning of Generative Shape Models]
✅ [Outlier-Robust Tensor PCA]
✅ [Deep Laplacian Pyramid Networks for Fast and Accurate Super-Resolution]
✅ [Attention-Aware Face Hallucination via Deep Reinforcement Learning]
✅ [Deep Video Deblurring for Hand-held Cameras]
✅ [DeblurGAN: Blind Motion Deblurring Using Conditional Adversarial Networks]
✅ [From Motion Blur to Motion Flow: A Deep Learning Solution for Removing Heterogeneous Motion Blur]
✅ [Deep Learning of Human Visual Sensitivity in Image Quality Assessment Framework]
✅ [Video Frame Interpolation via Adaptive Convolution]
✅ [Multi-View 3D Object Detection Network for Autonomous Driving]
✅ [End-To-End Learning of Driving Models From Large-Scale Video Datasets]
✅ [Conditional Similarity Networks]
✅ [Memory-Augmented Attribute Manipulation Networks for Interactive Fashion Search]
✅ [3DMatch: Learning Local Geometric Descriptors from RGB-D Reconstructions]
✅ [Quad-networks: unsupervised learning to rank for interest point detection]
✅ [A Unified Approach of Multi-Scale Deep and Hand-Crafted Features for Defocus Estimation]
✅ [SRN: Side-output Residual Network for Object Symmetry Detection in the Wild]
✅ [Learning Deep Binary Descriptor With Multi-Quantization]
✅ [Learning Non-Lambertian Object Intrinsics Across ShapeNet Categories]
✅ [Efficient Diffusion on Region Manifolds: Recovering Small Objects With Compact CNN Representations]
✅ [Learned Contextual Feature Reweighting for Image Geo-Localization]
✅ [ER3: A Unified Framework for Event Retrieval, Recognition and Recounting]
✅ [Fast Fourier Color Constancy]
✅ [A Practical Method for Fully Automatic Intrinsic Camera Calibration Using Directionally Encoded Light]
✅ [Designing illuminant spectral power distributions for surface classification]
✅ [Accurate Optical Flow via Direct Cost Volume Processing]
✅ [Asymmetric Feature Maps With Application to Sketch Based Retrieval]
✅ [KillingFusion: Non-rigid 3D Reconstruction without Correspondences]
✅ [Are Large-Scale 3D Models Really Necessary for Accurate Visual Localization?]
✅ [On the Effectiveness of Visible Watermarks]