CVPR 2021 Papers and Open-Source Projects (Papers with Code)
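CVPR 2021 accepted paper list: http://cvpr2021.thecvf.com/sites/default/files/2021-03/accepted_paper_ids.txt
Note 1: Contributions are welcome. Please open an issue to share CVPR 2021 papers and open-source projects!
Note 2: For papers from previous top CV conferences, other high-quality CV papers, and comprehensive roundups, see: https://github.com/amusi/daily-paper-computer-vision
A group for authors of accepted CVPR 2021 papers has been set up! If your paper has been accepted, add WeChat: CVer9999 with the note: CVPR2021 accepted + name + school/company. Requests must follow this format; you will then be invited to the group to discuss conference attendance and related matters.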
[CVPR 2021 Open-Source Paper Index]
- Backbone
- NAS
- GAN
- VAE
- Visual Transformer
- Regularization
- Unsupervised/Self-Supervised Learning
- Semi-Supervised Learning
- 2D Object Detection
- Single/Multiple Object Tracking
- Semantic Segmentation
- Instance Segmentation
- Panoptic Segmentation
- Medical Image Segmentation
- Interactive Video Object Segmentation
- Saliency Detection
- Person Search
- Video Understanding/Action Recognition
- Face Recognition
- Face Detection
- Face Anti-Spoofing
- Deepfake Detection
- Face Age Estimation
- Facial Expression Recognition
- Human Parsing
- 2D/3D Human Pose Estimation
- Scene Text Recognition
- Model Compression/Pruning/Quantization
- Super-Resolution
- Image Restoration
- Image Inpainting
- Image Editing
- Reflection Removal
- 3D Object Detection
- 3D Semantic Segmentation
- 3D Object Tracking
- 3D Point Cloud Registration
- 3D Point Cloud Completion
- 6D Pose Estimation
- Camera Pose Estimation
- Depth Estimation
- Adversarial Examples
- Image Retrieval
- Video Retrieval
- Cross-modal Retrieval
- Zero-Shot Learning
- Federated Learning
- Video Frame Interpolation
- Visual Reasoning
- View Synthesis
- Domain Generalization
- Human-Object Interaction (HOI) Detection
- Shadow Removal
- Virtual Try-On
- Datasets
- Others
- TODO
- Acceptance Unconfirmed (Not Sure)
Backbone
Diverse Branch Block: Building a Convolution as an Inception-like Unit
Scaling Local Self-Attention For Parameter Efficient Visual Backbones
- Paper(Oral): https://arxiv.org/abs/2103.12731
- Code: None
ReXNet: Diminishing Representational Bottleneck on Convolutional Neural Network
Involution: Inverting the Inherence of Convolution for Visual Recognition
Coordinate Attention for Efficient Mobile Network Design
Inception Convolution with Efficient Dilation Search
RepVGG: Making VGG-style ConvNets Great Again
NAS
HR-NAS: Searching Efficient High-Resolution Neural Architectures with Transformers
- Paper(Oral): None
- Code: https://github.com/dingmyu/HR-NAS
Neural Architecture Search with Random Labels
- Paper: https://arxiv.org/abs/2101.11834
- Code: None
Towards Improving the Consistency, Efficiency, and Flexibility of Differentiable Neural Architecture Search
- Paper: https://arxiv.org/abs/2101.11342
- Code: None
Joint-DetNAS: Upgrade Your Detector with NAS, Pruning and Dynamic Distillation
- Paper: None
- Code: None
Prioritized Architecture Sampling with Monto-Carlo Tree Search
Contrastive Neural Architecture Search with Neural Architecture Comparators
AttentiveNAS: Improving Neural Architecture Search via Attentive Sampling
- Paper: https://arxiv.org/abs/2011.09011
- Code: None
ReNAS: Relativistic Evaluation of Neural Architecture Search
- Paper: https://arxiv.org/abs/1910.01523
- Code: None
HourNAS: Extremely Fast Neural Architecture Search Through an Hourglass Lens
- Paper: https://arxiv.org/abs/2005.14446
- Code: None
Searching by Generating: Flexible and Efficient One-Shot NAS with Architecture Generator
OPANAS: One-Shot Path Aggregation Network Architecture Search for Object Detection
Inception Convolution with Efficient Dilation Search
- Paper: https://arxiv.org/abs/2012.13587
- Code: None
GAN
TediGAN: Text-Guided Diverse Image Generation and Manipulation
- Homepage: https://xiaweihao.com/projects/tedigan/
Generative Hierarchical Features from Synthesizing Images
- Homepage: https://genforce.github.io/ghfeat/
- Paper(Oral): https://arxiv.org/abs/2007.10379
Teachers Do More Than Teach: Compressing Image-to-Image Models
HistoGAN: Controlling Colors of GAN-Generated and Real Images via Color Histograms
pi-GAN: Periodic Implicit Generative Adversarial Networks for 3D-Aware Image Synthesis
- Paper(Oral): https://arxiv.org/abs/2012.00926
- Code: None
DivCo: Diverse Conditional Image Synthesis via Contrastive Generative Adversarial Network
- Paper: https://arxiv.org/abs/2103.07893
- Code: None
Diverse Semantic Image Synthesis via Probability Distribution Modeling
LOHO: Latent Optimization of Hairstyles via Orthogonalization
- Paper: https://arxiv.org/abs/2103.03891
- Code: None
PISE: Person Image Synthesis and Editing with Decoupled GAN
DeFLOCNet: Deep Image Editing via Flexible Low-level Controls
- Paper: http://raywzy.com/
- Code: http://raywzy.com/
PD-GAN: Probabilistic Diverse GAN for Image Inpainting
- Paper: http://raywzy.com/
- Code: http://raywzy.com/
Efficient Conditional GAN Transfer with Knowledge Propagation across Classes
- Paper: https://www.researchgate.net/publication/349309756_Efficient_Conditional_GAN_Transfer_with_Knowledge_Propagation_across_Classes
- Code: http://github.com/mshahbazi72/cGANTransfer
Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing
- Paper: None
- Code: None
Hijack-GAN: Unintended-Use of Pretrained, Black-Box GANs
- Paper: https://arxiv.org/abs/2011.14107
- Code: None
Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation
- Homepage: https://eladrich.github.io/pixel2style2pixel/
- Paper: https://arxiv.org/abs/2008.00951
- Code: https://github.com/eladrich/pixel2style2pixel
A 3D GAN for Improved Large-pose Facial Recognition
- Paper: https://arxiv.org/abs/2012.10545
- Code: None
HumanGAN: A Generative Model of Humans Images
- Paper: https://arxiv.org/abs/2103.06902
- Code: None
ID-Unet: Iterative Soft and Hard Deformation for View Synthesis
CoMoGAN: continuous model-guided image-to-image translation
- Paper(Oral): https://arxiv.org/abs/2103.06879
- Code: https://github.com/cv-rits/CoMoGAN
Training Generative Adversarial Networks in One Stage
- Paper: https://arxiv.org/abs/2103.00430
- Code: None
Closed-Form Factorization of Latent Semantics in GANs
- Homepage: https://genforce.github.io/sefa/
- Paper(Oral): https://arxiv.org/abs/2007.06600
- Code: https://github.com/genforce/sefa
Anycost GANs for Interactive Image Synthesis and Editing
Image-to-image Translation via Hierarchical Style Disentanglement
VAE
Soft-IntroVAE: Analyzing and Improving Introspective Variational Autoencoders
Visual Transformer
HR-NAS: Searching Efficient High-Resolution Neural Architectures with Transformers
- Paper(Oral): None
- Code: https://github.com/dingmyu/HR-NAS
MIST: Multiple Instance Spatial Transformer Network
- Paper: https://arxiv.org/abs/1811.10725
- Code: None
Multimodal Motion Prediction with Stacked Transformers
Revamping cross-modal recipe retrieval with hierarchical Transformers and self-supervised learning
Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking
- Paper(Oral): https://arxiv.org/abs/2103.11681
Pre-Trained Image Processing Transformer
- Paper: https://arxiv.org/abs/2012.00364
- Code: None
End-to-End Video Instance Segmentation with Transformers
- Paper(Oral): https://arxiv.org/abs/2011.14503
- Code: https://github.com/Epiphqny/VisTR
UP-DETR: Unsupervised Pre-training for Object Detection with Transformers
- Paper(Oral): https://arxiv.org/abs/2011.09094
- Code: https://github.com/dddzg/up-detr
End-to-End Human Object Interaction Detection with HOI Transformer
Transformer Interpretability Beyond Attention Visualization
- Paper: https://arxiv.org/abs/2012.09838
- Code: https://github.com/hila-chefer/Transformer-Explainability
Regularization
Regularizing Neural Networks via Adversarial Model Perturbation
Unsupervised/Self-Supervised Learning
Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning
- Homepage: https://fingerrec.github.io/index_files/jinpeng/papers/CVPR2021/project_website.html
- Paper: https://arxiv.org/abs/2009.05769
- Code: https://github.com/FingerRec/BE
Spatially Consistent Representation Learning
- Paper: https://arxiv.org/abs/2103.06122
- Code: None
VideoMoCo: Contrastive Video Representation Learning with Temporally Adversarial Examples
Exploring Simple Siamese Representation Learning
- Paper(Oral): https://arxiv.org/abs/2011.10566
- Code: None
Dense Contrastive Learning for Self-Supervised Visual Pre-Training
- Paper(Oral): https://arxiv.org/abs/2011.09157
- Code: https://github.com/WXinlong/DenseCL
Semi-Supervised Learning
Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework
- Paper: https://arxiv.org/abs/2103.11402
- Code: None
Adaptive Consistency Regularization for Semi-Supervised Transfer Learning
- Paper: https://arxiv.org/abs/2103.02193
- Code: https://github.com/SHI-Labs/Semi-Supervised-Transfer-Learning
2D Object Detection
2D Object Detection (General)
Sparse R-CNN: End-to-End Object Detection with Learnable Proposals
Positive-Unlabeled Data Purification in the Wild for Object Detection
- Paper: None
- Code: None
Instance Localization for Self-supervised Detection Pretraining
MeGA-CDA: Memory Guided Attention for Category-Aware Unsupervised Domain Adaptive Object Detection
- Paper: https://arxiv.org/abs/2103.04224
- Code: None
End-to-End Object Detection with Fully Convolutional Network
Robust and Accurate Object Detection via Adversarial Learning
- Code: None
I^3Net: Implicit Instance-Invariant Network for Adapting One-Stage Object Detectors
- Paper: https://arxiv.org/abs/2103.13757
- Code: None
Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework
- Paper: https://arxiv.org/abs/2103.11402
- Code: None
OPANAS: One-Shot Path Aggregation Network Architecture Search for Object Detection
YOLOF: You Only Look One-level Feature
UP-DETR: Unsupervised Pre-training for Object Detection with Transformers
- Paper(Oral): https://arxiv.org/abs/2011.09094
- Code: https://github.com/dddzg/up-detr
General Instance Distillation for Object Detection
- Paper: https://arxiv.org/abs/2103.02340
- Code: None
There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge
- Homepage: http://rl.uni-freiburg.de/research/multimodal-distill
- Paper: https://arxiv.org/abs/2103.01353
- Code: http://rl.uni-freiburg.de/research/multimodal-distill
Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection
Multiple Instance Active Learning for Object Detection
Towards Open World Object Detection
- Paper(Oral): https://arxiv.org/abs/2103.02603
- Code: https://github.com/JosephKJ/OWOD
Few-Shot Object Detection
Semantic Relation Reasoning for Shot-Stable Few-Shot Object Detection
- Paper: https://arxiv.org/abs/2103.01903
- Code: None
Few-Shot Object Detection via Contrastive Proposal Encoding
Rotated Object Detection
ReDet: A Rotation-equivariant Detector for Aerial Object Detection
Single/Multiple Object Tracking
Single Object Tracking
Graph Attention Tracking
Rotation Equivariant Siamese Networks for Tracking
- Paper: https://arxiv.org/abs/2012.13078
- Code: None
Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking
- Paper(Oral): https://arxiv.org/abs/2103.11681
TransT - Transformer Tracking
- Paper: None
- Code: https://github.com/chenxin-dlut/TransT
Multiple Object Tracking
Probabilistic Tracklet Scoring and Inpainting for Multiple Object Tracking
- Paper: https://arxiv.org/abs/2012.02337
- Code: None
Learning a Proposal Classifier for Multiple Object Tracking
Track to Detect and Segment: An Online Multi-Object Tracker
- Homepage: https://jialianwu.com/projects/TraDeS.html
- Paper: https://arxiv.org/abs/2103.08808
- Code: https://github.com/JialianW/TraDeS
Semantic Segmentation
Cross-Dataset Collaborative Learning for Semantic Segmentation
- Paper: https://arxiv.org/abs/2103.11351
- Code: None
Continual Semantic Segmentation via Repulsion-Attraction of Sparse and Disentangled Latent Representations
- Paper: https://arxiv.org/abs/2103.06342
- Code: None
Capturing Omni-Range Context for Omnidirectional Segmentation
- Paper: https://arxiv.org/abs/2103.05687
- Code: None
Learning Statistical Texture for Semantic Segmentation
- Paper: https://arxiv.org/abs/2103.04133
- Code: None
PLOP: Learning without Forgetting for Continual Semantic Segmentation
- Paper: https://arxiv.org/abs/2011.11390
- Code: None
Weakly Supervised Semantic Segmentation
BBAM: Bounding Box Attribution Map for Weakly Supervised Semantic and Instance Segmentation
- Paper: https://arxiv.org/abs/2103.08907
- Code: None
Semi-Supervised Semantic Segmentation
Semi-supervised Domain Adaptation based on Dual-level Domain Mixing for Semantic Segmentation
Domain Adaptive Semantic Segmentation
Coarse-to-Fine Domain Adaptive Semantic Segmentation with Photometric Alignment and Category-Center Regularization
- Paper: https://arxiv.org/abs/2103.13041
- Code: None
MetaCorrection: Domain-aware Meta Loss Correction for Unsupervised Domain Adaptation in Semantic Segmentation
- Paper: https://arxiv.org/abs/2103.05254
- Code: None
Multi-Source Domain Adaptation with Collaborative Learning for Semantic Segmentation
- Paper: https://arxiv.org/abs/2103.04717
- Code: None
Prototypical Pseudo Label Denoising and Target Structure Learning for Domain Adaptive Semantic Segmentation
Instance Segmentation
Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers
End-to-End Video Instance Segmentation with Transformers
- Paper(Oral): https://arxiv.org/abs/2011.14503
- Code: https://github.com/Epiphqny/VisTR
Zero-shot instance segmentation (Not Sure)
- Paper: None
- Code: https://github.com/CVPR2021-pape-id-1395/CVPR2021-paper-id-1395
Panoptic Segmentation
Fully Convolutional Networks for Panoptic Segmentation
Cross-View Regularization for Domain Adaptive Panoptic Segmentation
- Paper: https://arxiv.org/abs/2103.02584
- Code: None
Medical Image Segmentation
FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space
Interactive Video Object Segmentation
Learning to Recommend Frame for Interactive Video Object Segmentation in the Wild
Saliency Detection
Deep RGB-D Saliency Detection with Depth-Sensitive Attention and Automatic Multi-Modal Fusion
- Paper(Oral): https://arxiv.org/abs/2103.11832
- Code: https://github.com/sunpeng1996/DSA2F
Person Search
Anchor-Free Person Search
- Paper: https://arxiv.org/abs/2103.11617
- Code: https://github.com/daodaofr/AlignPS
- Interpretation: The First Anchor-Free Person Search Framework | CVPR 2021
Video Understanding/Action Recognition
Learning Salient Boundary Feature for Anchor-free Temporal Action Localization
- Paper: https://arxiv.org/abs/2103.13137
- Code: None
Temporal Context Aggregation Network for Temporal Action Proposal Refinement
- Paper: https://arxiv.org/abs/2103.13141
- Code: None
- Interpretation: CVPR 2021 | TCANet: The Strongest Temporal Action Proposal Refinement Network
ACTION-Net: Multipath Excitation for Action Recognition
Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning
- Homepage: https://fingerrec.github.io/index_files/jinpeng/papers/CVPR2021/project_website.html
- Paper: https://arxiv.org/abs/2009.05769
- Code: https://github.com/FingerRec/BE
TDN: Temporal Difference Networks for Efficient Action Recognition
Face Recognition
A 3D GAN for Improved Large-pose Facial Recognition
- Paper: https://arxiv.org/abs/2012.10545
- Code: None
MagFace: A Universal Representation for Face Recognition and Quality Assessment
- Paper(Oral): https://arxiv.org/abs/2103.06627
- Code: https://github.com/IrvingMeng/MagFace
WebFace260M: A Benchmark Unveiling the Power of Million-Scale Deep Face Recognition
- Homepage: https://www.face-benchmark.org/
- Paper: https://arxiv.org/abs/2103.04098
- Dataset: https://www.face-benchmark.org/
When Age-Invariant Face Recognition Meets Face Age Synthesis: A Multi-Task Learning Framework
- Paper(Oral): https://arxiv.org/abs/2103.01520
- Code: https://github.com/Hzzone/MTLFace
- Dataset: https://github.com/Hzzone/MTLFace
Face Detection
CRFace: Confidence Ranker for Model-Agnostic Face Detection Refinement
- Paper: https://arxiv.org/abs/2103.07017
- Code: None
Face Anti-Spoofing
Cross Modal Focal Loss for RGBD Face Anti-Spoofing
- Paper: https://arxiv.org/abs/2103.00948
- Code: None
Deepfake Detection
Spatial-Phase Shallow Learning: Rethinking Face Forgery Detection in Frequency Domain
- Paper: https://arxiv.org/abs/2103.01856
- Code: None
Multi-attentional Deepfake Detection
- Paper: https://arxiv.org/abs/2103.02406
- Code: None
Face Age Estimation
PML: Progressive Margin Loss for Long-tailed Age Classification
- Paper: https://arxiv.org/abs/2103.02140
- Code: None
Facial Expression Recognition
Affective Processes: stochastic modelling of temporal context for emotion and facial expression recognition
- Paper: https://arxiv.org/abs/2103.13372
- Code: None
Human Parsing
Differentiable Multi-Granularity Human Representation Learning for Instance-Aware Human Semantic Parsing
2D/3D Human Pose Estimation
2D Human Pose Estimation
DCPose: Deep Dual Consecutive Network for Human Pose Estimation
3D Human Pose Estimation
HybrIK: A Hybrid Analytical-Neural Inverse Kinematics Solution for 3D Human Pose and Shape Estimation
- Homepage: https://jeffli.site/HybrIK/
- Paper: https://arxiv.org/abs/2011.14672
- Code: https://github.com/Jeff-sjtu/HybrIK
Scene Text Recognition
Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition
Model Compression/Pruning/Quantization
Teachers Do More Than Teach: Compressing Image-to-Image Models
Model Pruning
Dynamic Slimmable Network
Model Quantization
Learnable Companding Quantization for Accurate Low-bit Neural Networks
- Paper: https://arxiv.org/abs/2103.07156
- Code: None
Super-Resolution
ClassSR: A General Framework to Accelerate Super-Resolution Networks by Data Characteristic
AdderSR: Towards Energy Efficient Image Super-Resolution
- Paper: https://arxiv.org/abs/2009.08891
- Code: None
Video Super-Resolution
Temporal Modulation Network for Controllable Space-Time Video Super-Resolution
- Paper: None
- Code: https://github.com/CS-GangXu/TMNet
Image Restoration
Multi-Stage Progressive Image Restoration
Image Inpainting
PD-GAN: Probabilistic Diverse GAN for Image Inpainting
- Paper: http://raywzy.com/
- Code: http://raywzy.com/
Image Editing
Anycost GANs for Interactive Image Synthesis and Editing
PISE: Person Image Synthesis and Editing with Decoupled GAN
DeFLOCNet: Deep Image Editing via Flexible Low-level Controls
- Paper: http://raywzy.com/
- Code: http://raywzy.com/
Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing
- Paper: None
- Code: None
Reflection Removal
Robust Reflection Removal with Reflection-free Flash-only Cues
- Paper: https://arxiv.org/abs/2103.04273
- Code: https://github.com/ChenyangLEI/flash-reflection-removal
3D Object Detection
M3DSSD: Monocular 3D Single Stage Object Detector
SE-SSD: Self-Ensembling Single-Stage Object Detector From Point Cloud
- Paper: None
- Code: https://github.com/Vegeta2020/SE-SSD
Center-based 3D Object Detection and Tracking
Categorical Depth Distribution Network for Monocular 3D Object Detection
- Paper: https://arxiv.org/abs/2103.01100
- Code: None
3D Semantic Segmentation
Semantic Segmentation for Real Point Cloud Scenes via Bilateral Augmentation and Adaptive Fusion
Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR Segmentation
Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges
- Homepage: https://github.com/QingyongHu/SensatUrban
- Paper: http://arxiv.org/abs/2009.03137
- Code: https://github.com/QingyongHu/SensatUrban
- Dataset: https://github.com/QingyongHu/SensatUrban
3D Object Tracking
Center-based 3D Object Detection and Tracking
3D Point Cloud Registration
PointDSC: Robust Point Cloud Registration using Deep Spatial Consistency
PREDATOR: Registration of 3D Point Clouds with Low Overlap
3D Point Cloud Completion
Style-based Point Generator with Adversarial Rendering for Point Cloud Completion
- Paper: https://arxiv.org/abs/2103.02535
- Code: None
6D Pose Estimation
FS-Net: Fast Shape-based Network for Category-Level 6D Object Pose Estimation with Decoupled Rotation Mechanism
- Paper(Oral): https://arxiv.org/abs/2103.07054
- Code: https://github.com/DC1991/FS-Net
GDR-Net: Geometry-Guided Direct Regression Network for Monocular 6D Object Pose Estimation
- Paper: http://arxiv.org/abs/2102.12145
- Code: https://git.io/GDR-Net
FFB6D: A Full Flow Bidirectional Fusion Network for 6D Pose Estimation
Camera Pose Estimation
Back to the Feature: Learning Robust Camera Localization from Pixels to Pose
Depth Estimation
Beyond Image to Depth: Improving Depth Prediction using Echoes
- Homepage: https://krantiparida.github.io/projects/bimgdepth.html
- Paper: https://arxiv.org/abs/2103.08468
- Code: https://github.com/krantiparida/beyond-image-to-depth
S3: Learnable Sparse Signal Superdensity for Guided Depth Estimation
- Paper: https://arxiv.org/abs/2103.02396
- Code: None
Depth from Camera Motion and Object Detection
- Paper: https://arxiv.org/abs/2103.01468
- Code: https://github.com/griffbr/ODMD
- Dataset: https://github.com/griffbr/ODMD
Adversarial Examples
Natural Adversarial Examples
Image Retrieval
QAIR: Practical Query-efficient Black-Box Attacks for Image Retrieval
- Paper: https://arxiv.org/abs/2103.02927
- Code: None
Video Retrieval
On Semantic Similarity in Video Retrieval
- Homepage: https://mwray.github.io/SSVR/
Cross-modal Retrieval
Revamping cross-modal recipe retrieval with hierarchical Transformers and self-supervised learning
Zero-Shot Learning
Counterfactual Zero-Shot and Open-Set Visual Recognition
Federated Learning
FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space
Video Frame Interpolation
CDFI: Compression-Driven Network Design for Frame Interpolation
- Paper: None
- Code: https://github.com/tding1/CDFI
FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation
- Homepage: https://tarun005.github.io/FLAVR/
Visual Reasoning
Transformation Driven Visual Reasoning
- Homepage: https://hongxin2019.github.io/TVR/
- Paper: https://arxiv.org/abs/2011.13160
- Code: https://github.com/hughplay/TVR
View Synthesis
NeX: Real-time View Synthesis with Neural Basis Expansion
- Homepage: https://nex-mpi.github.io/
- Paper(Oral): https://arxiv.org/abs/2103.05606
Domain Generalization
FSDR: Frequency Space Domain Randomization for Domain Generalization
- Paper: https://arxiv.org/abs/2103.02370
- Code: None
Human-Object Interaction (HOI) Detection
Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information
Reformulating HOI Detection as Adaptive Set Prediction
Detecting Human-Object Interaction via Fabricated Compositional Learning
End-to-End Human Object Interaction Detection with HOI Transformer
Shadow Removal
Auto-Exposure Fusion for Single-Image Shadow Removal
- Paper: https://arxiv.org/abs/2103.01255
- Code: https://github.com/tsingqguo/exposure-fusion-shadow-removal
Virtual Try-On
Parser-Free Virtual Try-on via Distilling Appearance Flows
- Interpretation: Virtual try-on without human parsing, based on appearance-flow distillation
Datasets
Sewer-ML: A Multi-Label Sewer Defect Classification Dataset and Benchmark
- Homepage: https://vap.aau.dk/sewer-ml/
- Paper: https://arxiv.org/abs/2103.10619
Nutrition5k: Towards Automatic Nutritional Understanding of Generic Food
- Paper: https://arxiv.org/abs/2103.03375
- Dataset: None
Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges
- Homepage: https://github.com/QingyongHu/SensatUrban
- Paper: http://arxiv.org/abs/2009.03137
- Code: https://github.com/QingyongHu/SensatUrban
- Dataset: https://github.com/QingyongHu/SensatUrban
When Age-Invariant Face Recognition Meets Face Age Synthesis: A Multi-Task Learning Framework
- Paper(Oral): https://arxiv.org/abs/2103.01520
- Code: https://github.com/Hzzone/MTLFace
- Dataset: https://github.com/Hzzone/MTLFace
Depth from Camera Motion and Object Detection
- Paper: https://arxiv.org/abs/2103.01468
- Code: https://github.com/griffbr/ODMD
- Dataset: https://github.com/griffbr/ODMD
There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge
- Homepage: http://rl.uni-freiburg.de/research/multimodal-distill
- Paper: https://arxiv.org/abs/2103.01353
- Code: http://rl.uni-freiburg.de/research/multimodal-distill
- Dataset: http://rl.uni-freiburg.de/research/multimodal-distill
Scan2Cap: Context-aware Dense Captioning in RGB-D Scans
Others
Neural Parts: Learning Expressive 3D Shape Abstractions with Invertible Neural Networks
- Homepage: https://paschalidoud.github.io/neural_parts
- Paper: None
- Code: https://github.com/paschalidoud/neural_parts
Knowledge Evolution in Neural Networks
- Paper(Oral): https://arxiv.org/abs/2103.05152
- Code: https://github.com/ahmdtaha/knowledge_evolution
Multi-institutional Collaborations for Improving Deep Learning-based Magnetic Resonance Image Reconstruction Using Federated Learning
SGP: Self-supervised Geometric Perception
- Oral
Diffusion Probabilistic Models for 3D Point Cloud Generation
Scan2Cap: Context-aware Dense Captioning in RGB-D Scans
There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge
TODO
Acceptance Unconfirmed (Not Sure)
CT Film Recovery via Disentangling Geometric Deformation and Photometric Degradation: Simulated Datasets and Deep Models
- Paper: None
- Code: https://github.com/transcendentsky/Film-Recovery
Toward Explainable Reflection Removal with Distilling and Model Uncertainty
- Paper: None
- Code: https://github.com/ytpeng-aimlab/CVPR-2021-Toward-Explainable-Reflection-Removal-with-Distilling-and-Model-Uncertainty
DeepOIS: Gyroscope-Guided Deep Optical Image Stabilizer Compensation
- Paper: None
- Code: https://github.com/lhaippp/DeepOIS
Exploring Adversarial Fake Images on Face Manifold
- Paper: None
- Code: https://github.com/ldz666666/Style-atk
Uncertainty-Aware Semi-Supervised Crowd Counting via Consistency-Regularized Surrogate Task
- Paper: None
- Code: https://github.com/yandamengdanai/Uncertainty-Aware-Semi-Supervised-Crowd-Counting-via-Consistency-Regularized-Surrogate-Task
Temporal Contrastive Graph for Self-supervised Video Representation Learning
- Paper: None
- Code: https://github.com/YangLiu9208/TCG
Boosting Monocular Depth Estimation Models to High-Resolution via Context-Aware Patching
- Paper: None
- Code: https://github.com/ouranonymouscvpr/cvpr2021_ouranonymouscvpr
Fast and Memory-Efficient Compact Bilinear Pooling
- Paper: None
- Code: https://github.com/cvpr2021kp2/cvpr2021kp2
Identification of Empty Shelves in Supermarkets using Domain-inspired Features with Structural Support Vector Machine
- Paper: None
- Code: https://github.com/gapDetection/cvpr2021
Estimating A Child's Growth Potential From Cephalometric X-Ray Image via Morphology-Aware Interactive Keypoint Estimation
- Paper: None
- Code: https://github.com/interactivekeypoint2020/Morph
https://github.com/ShaoQiangShen/CVPR2021
https://github.com/gillesflash/CVPR2021
https://github.com/anonymous-submission1991/BaLeNAS
https://github.com/cvpr2021dcb/cvpr2021dcb
https://github.com/anonymousauthorCV/CVPR2021_PaperID_8578
https://github.com/AldrichZeng/FreqPrune