CVPR 2022 Papers and Open-Source Projects Collection (Papers with Code)

CVPR 2022 accepted paper ID list: https://drive.google.com/file/d/15JFhfPboKdUcIH9LdbCMUFmGq_JhaxhC/view

Note 1: You are welcome to open an issue to share CVPR 2022 papers and open-source projects!

Note 2: For papers from previous years' top CV conferences, other high-quality CV papers, and roundups, see: https://github.com/amusi/daily-paper-computer-vision

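If you want to keep up with the latest high-quality CV papers, open-source projects, and learning materials, you are welcome to scan the QR code and join the CVer academic exchange group! Let's learn from each other and improve together.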

[CVPR 2022 Open-Source Papers: Table of Contents]

Backbone

A ConvNet for the 2020s

Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs

MPViT: Multi-Path Vision Transformer for Dense Prediction

Mobile-Former: Bridging MobileNet and Transformer

MetaFormer is Actually What You Need for Vision

Shunted Self-Attention via Multi-Scale Token Aggregation

TVConv: Efficient Translation Variant Convolution for Layout-aware Visual Processing

Learned Queries for Efficient Local Attention

RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality

CLIP

HairCLIP: Design Your Hair by Text and Reference Image

PointCLIP: Point Cloud Understanding by CLIP

Blended Diffusion for Text-driven Editing of Natural Images

GAN

SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing

Style Transformer for Image Inversion and Editing

Unsupervised Image-to-Image Translation with Generative Prior

StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2

OSSGAN: Open-set Semi-supervised Image Generation

Neural Texture Extraction and Distribution for Controllable Person Image Synthesis

GNN

OrphicX: A Causality-Inspired Latent Variable Model for Interpreting Graph Neural Networks

MLP

RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality

NAS

β-DARTS: Beta-Decay Regularization for Differentiable Architecture Search

ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior

OCR

SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition

NeRF

Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields

Point-NeRF: Point-based Neural Radiance Fields

NeRF in the Dark: High Dynamic Range View Synthesis from Noisy Raw Images

Urban Radiance Fields

Pix2NeRF: Unsupervised Conditional π-GAN for Single Image to Neural Radiance Fields Translation

HumanNeRF: Free-viewpoint Rendering of Moving People from Monocular Video

3D Face

ImFace: A Nonlinear 3D Morphable Face Model with Implicit Neural Representations

Long-Tail

Retrieval Augmented Classification for Long-Tail Visual Recognition

Visual Transformer

Backbone

MPViT: Multi-Path Vision Transformer for Dense Prediction

MetaFormer is Actually What You Need for Vision

Mobile-Former: Bridging MobileNet and Transformer

Shunted Self-Attention via Multi-Scale Token Aggregation

Learned Queries for Efficient Local Attention

Application

Language-based Video Editing via Multi-Modal Multi-Level Transformer

MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video

Embracing Single Stride 3D Object Detector with Sparse Transformer

Multi-class Token Transformer for Weakly Supervised Semantic Segmentation

Spatio-temporal Relation Modeling for Few-shot Action Recognition

Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction

Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling

GroupViT: Semantic Segmentation Emerges from Text Supervision

Restormer: Efficient Transformer for High-Resolution Image Restoration

Splicing ViT Features for Semantic Appearance Transfer

Self-supervised Video Transformer

Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers

Accelerating DETR Convergence via Semantic-Aligned Matching

DN-DETR: Accelerate DETR Training by Introducing Query DeNoising

Style Transformer for Image Inversion and Editing

MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer

Mask Transfiner for High-Quality Instance Segmentation

Language as Queries for Referring Video Object Segmentation

X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning

AdaMixer: A Fast-Converging Query-Based Object Detector

Omni-DETR: Omni-Supervised Object Detection with Transformers

SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition

TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting

Collaborative Transformers for Grounded Situation Recognition

NFormer: Robust Person Re-identification with Neighbor Transformer

Boosting Robustness of Image Matting with Context Assembling and Strong Data Augmentation

Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer

A New Dataset and Transformer for Stereoscopic Video Super-Resolution

Safe Self-Refinement for Transformer-based Domain Adaptation

Fast Point Transformer

Transformer Decoders with MultiModal Regularization for Cross-Modal Food Retrieval

DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation

Stratified Transformer for 3D Point Cloud Segmentation

Vision-Language

Conditional Prompt Learning for Vision-Language Models

Bridging Video-text Retrieval with Multiple Choice Question

Visual Abductive Reasoning

Self-supervised Learning

UniVIP: A Unified Framework for Self-Supervised Visual Pre-training

Crafting Better Contrastive Views for Siamese Representation Learning

HCSC: Hierarchical Contrastive Selective Coding

DiRA: Discriminative, Restorative, and Adversarial Learning for Self-supervised Medical Image Analysis

Data Augmentation

TeachAugment: Data Augmentation Optimization Using Teacher Knowledge

AlignMixup: Improving Representations By Interpolating Aligned Features

Knowledge Distillation

Decoupled Knowledge Distillation

Object Detection

BoxeR: Box-Attention for 2D and 3D Transformers

DN-DETR: Accelerate DETR Training by Introducing Query DeNoising

Accelerating DETR Convergence via Semantic-Aligned Matching

Localization Distillation for Dense Object Detection

Focal and Global Knowledge Distillation for Detectors

A Dual Weighting Label Assignment Scheme for Object Detection

AdaMixer: A Fast-Converging Query-Based Object Detector

Omni-DETR: Omni-Supervised Object Detection with Transformers

SIGMA: Semantic-complete Graph Matching for Domain Adaptive Object Detection

Semi-supervised Object Detection

Dense Learning based Semi-Supervised Object Detection

Visual Tracking

Correlation-Aware Deep Tracking

TCTrack: Temporal Contexts for Aerial Tracking

Multi-modal Object Tracking

Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline

Multi-Object Tracking

Learning of Global Objective for Network Flow in Multi-Object Tracking

DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion

Semantic Segmentation

Novel Class Discovery in Semantic Segmentation

Deep Hierarchical Semantic Segmentation

Rethinking Semantic Segmentation: A Prototype View

Weakly Supervised Semantic Segmentation

Class Re-Activation Maps for Weakly-Supervised Semantic Segmentation

Multi-class Token Transformer for Weakly Supervised Semantic Segmentation

Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers

CLIMS: Cross Language Image Matching for Weakly Supervised Semantic Segmentation

CCAM: Contrastive learning of Class-agnostic Activation Map for Weakly Supervised Object Localization and Semantic Segmentation

FIFO: Learning Fog-invariant Features for Foggy Scene Segmentation

Regional Semantic Contrast and Aggregation for Weakly Supervised Semantic Segmentation

Semi-supervised Semantic Segmentation

ST++: Make Self-training Work Better for Semi-supervised Semantic Segmentation

Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels

Perturbed and Strict Mean Teachers for Semi-supervised Semantic Segmentation

Domain Adaptive Semantic Segmentation

Towards Fewer Annotations: Active Learning via Region Impurity and Prediction Uncertainty for Domain Adaptive Semantic Segmentation

DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation

Unsupervised Semantic Segmentation

GroupViT: Semantic Segmentation Emerges from Text Supervision

Few-shot Semantic Segmentation

Generalized Few-shot Semantic Segmentation

Instance Segmentation

BoxeR: Box-Attention for 2D and 3D Transformers

E2EC: An End-to-End Contour-based Method for High-Quality High-Speed Instance Segmentation

Mask Transfiner for High-Quality Instance Segmentation

Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity

Self-supervised Instance Segmentation

FreeSOLO: Learning to Segment Objects without Annotations

Video Instance Segmentation

Efficient Video Instance Segmentation via Tracklet Query and Proposal

Temporally Efficient Vision Transformer for Video Instance Segmentation

Panoptic Segmentation

Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers

Large-scale Video Panoptic Segmentation in the Wild: A Benchmark

Few-Shot Classification

Integrative Few-Shot Learning for Classification and Segmentation

Learning to Affiliate: Mutual Centralized Learning for Few-shot Classification

Few-Shot Segmentation

Learning What Not to Segment: A New Perspective on Few-Shot Segmentation

Integrative Few-Shot Learning for Classification and Segmentation

Dynamic Prototype Convolution Network for Few-Shot Semantic Segmentation

Image Matting

Boosting Robustness of Image Matting with Context Assembling and Strong Data Augmentation

Video Understanding

Self-supervised Video Transformer

TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting

FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality Assessment

Dual-AI: Dual-path Actor Interaction Learning for Group Activity Recognition

Action Recognition

Spatio-temporal Relation Modeling for Few-shot Action Recognition

Action Detection

End-to-End Semi-Supervised Learning for Video Action Detection

Image Editing

Style Transformer for Image Inversion and Editing

Blended Diffusion for Text-driven Editing of Natural Images

SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing

Low-level Vision

ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior

Restormer: Efficient Transformer for High-Resolution Image Restoration

Robust Equivariant Imaging: a fully unsupervised framework for learning to image from noisy and partial measurements

Super-Resolution

Image Super-Resolution

Learning the Degradation Distribution for Blind Image Super-Resolution

Video Super-Resolution

BasicVSR++: Improving Video Super-Resolution with Enhanced Propagation and Alignment

Look Back and Forth: Video Super-Resolution with Explicit Temporal Difference Modeling

A New Dataset and Transformer for Stereoscopic Video Super-Resolution

Deblur

Image Deblur

Learning to Deblur using Light Field Generated and Real Defocus Images

3D Point Cloud

Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling

A Unified Query-based Paradigm for Point Cloud Understanding

CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding

PointCLIP: Point Cloud Understanding by CLIP

Fast Point Transformer

RCP: Recurrent Closest Point for Scene Flow Estimation on 3D Point Clouds

The Devil is in the Pose: Ambiguity-free 3D Rotation-invariant Learning via Pose-aware Convolution

3D Object Detection

Not All Points Are Equal: Learning Highly Efficient Point-based Detectors for 3D LiDAR Point Clouds

BoxeR: Box-Attention for 2D and 3D Transformers

Embracing Single Stride 3D Object Detector with Sparse Transformer

Canonical Voting: Towards Robust Oriented Bounding Box Detection in 3D Scenes

MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer

HyperDet3D: Learning a Scene-conditioned 3D Object Detector

OccAM's Laser: Occlusion-based Attribution Maps for 3D Object Detectors on LiDAR Data

DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection

Ithaca365: Dataset and Driving Perception under Repeated and Challenging Weather Conditions

3D Semantic Segmentation

Scribble-Supervised LiDAR Semantic Segmentation

Stratified Transformer for 3D Point Cloud Segmentation

3D Instance Segmentation

Ithaca365: Dataset and Driving Perception under Repeated and Challenging Weather Conditions

3D Object Tracking

Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds

PTTR: Relational 3D Point Cloud Object Tracking with Transformer

3D Human Pose Estimation

MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation

MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video

Distribution-Aware Single-Stage Models for Multi-Person 3D Pose Estimation

BEV: Putting People in their Place: Monocular Regression of 3D People in Depth

3D Semantic Scene Completion

MonoScene: Monocular 3D Semantic Scene Completion

3D Reconstruction

BANMo: Building Animatable 3D Neural Models from Many Casual Videos

Person Re-identification

NFormer: Robust Person Re-identification with Neighbor Transformer

Camouflaged Object Detection

Zoom In and Out: A Mixed-scale Triplet Network for Camouflaged Object Detection

Depth Estimation

Monocular Depth Estimation

NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation

OmniFusion: 360 Monocular Depth Estimation via Geometry-Aware Fusion

Toward Practical Self-Supervised Monocular Indoor Depth Estimation

P3Depth: Monocular Depth Estimation with a Piecewise Planarity Prior

Multi-Frame Self-Supervised Depth with Transformers

Stereo Matching

ACVNet: Attention Concatenation Volume for Accurate and Efficient Stereo Matching

Feature Matching

ClusterGNN: Cluster-based Coarse-to-Fine Graph Neural Network for Efficient Feature Matching

Lane Detection

Rethinking Efficient Lane Detection via Curve Modeling

A Keypoint-based Global Association Network for Lane Detection

Optical Flow Estimation

Imposing Consistency for Optical Flow Estimation

Deep Equilibrium Optical Flow Estimation

GMFlow: Learning Optical Flow via Global Matching

Image Inpainting

Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding

Image Retrieval

Correlation Verification for Image Retrieval

Face Recognition

AdaFace: Quality Adaptive Margin for Face Recognition

Crowd Counting

Leveraging Self-Supervision for Cross-Domain Crowd Counting

Medical Image

BoostMIS: Boosting Medical Image Semi-supervised Learning with Adaptive Pseudo Labeling and Informative Active Annotation

Anti-curriculum Pseudo-labelling for Semi-supervised Medical Image Classification

DiRA: Discriminative, Restorative, and Adversarial Learning for Self-supervised Medical Image Analysis

Video Generation

StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2

Scene Graph Generation

SGTR: End-to-end Scene Graph Generation with Transformer

Referring Video Object Segmentation

Language as Queries for Referring Video Object Segmentation

ReSTR: Convolution-free Referring Image Segmentation Using Transformers

Gait Recognition

Gait Recognition in the Wild with Dense 3D Representations and A Benchmark

Style Transfer

StyleMesh: Style Transfer for Indoor 3D Scene Reconstructions

Anomaly Detection

UBnormal: New Benchmark for Supervised Open-Set Video Anomaly Detection

Self-Supervised Predictive Convolutional Attentive Block for Anomaly Detection

Adversarial Examples

Shadows can be Dangerous: Stealthy and Effective Physical-world Adversarial Attack by Natural Phenomenon

LAS-AT: Adversarial Training with Learnable Attack Strategy

Segment and Complete: Defending Object Detectors against Adversarial Patch Attacks with Robust Patch Detection

Weakly Supervised Object Localization

Weakly Supervised Object Localization as Domain Adaption

Radar Object Detection

Exploiting Temporal Relations on Radar Perception for Autonomous Driving

Hyperspectral Image Reconstruction

Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction

Image Stitching

Deep Rectangling for Image Stitching: A Learning Baseline

Watermarking

Deep 3D-to-2D Watermarking: Embedding Messages in 3D Meshes and Extracting Them from 2D Renderings

Action Counting

TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting

Grounded Situation Recognition

Collaborative Transformers for Grounded Situation Recognition

Zero-shot Learning

Unseen Classes at a Later Time? No Problem

DeepFakes

Detecting Deepfakes with Self-Blended Images

Datasets

It's About Time: Analog Clock Reading in the Wild

Toward Practical Self-Supervised Monocular Indoor Depth Estimation

Kubric: A scalable dataset generator

Scribble-Supervised LiDAR Semantic Segmentation

Deep Rectangling for Image Stitching: A Learning Baseline

ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer

Shape from Polarization for Complex Scenes in the Wild

Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline

TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting

FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality Assessment

Aesthetic Text Logo Synthesis via Content-aware Layout Inferring

DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection

A New Dataset and Transformer for Stereoscopic Video Super-Resolution

Putting People in their Place: Monocular Regression of 3D People in Depth

UBnormal: New Benchmark for Supervised Open-Set Video Anomaly Detection

DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion

Visual Abductive Reasoning

Large-scale Video Panoptic Segmentation in the Wild: A Benchmark

Ithaca365: Dataset and Driving Perception under Repeated and Challenging Weather Conditions

New Task

Language-based Video Editing via Multi-Modal Multi-Level Transformer

It's About Time: Analog Clock Reading in the Wild

Splicing ViT Features for Semantic Appearance Transfer

Visual Abductive Reasoning

Others

Kubric: A scalable dataset generator

X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning

Balanced MSE for Imbalanced Visual Regression

SNUG: Self-Supervised Neural Dynamic Garments

Shape from Polarization for Complex Scenes in the Wild

LASER: LAtent SpacE Rendering for 2D Visual Localization

Single-Photon Structured Light

3DeformRS: Certifying Spatial Deformations on Point Clouds

Aesthetic Text Logo Synthesis via Content-aware Layout Inferring

Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes

Robust and Accurate Superquadric Recovery: a Probabilistic Approach

Towards Bidirectional Arbitrary Image Rescaling: Joint Optimization and Cycle Idempotence

Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer

DeepDPM: Deep Clustering With an Unknown Number of Clusters

ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic

Proto2Proto: Can you recognize the car, the way I do?

Putting People in their Place: Monocular Regression of 3D People in Depth

Light Field Neural Rendering

Neural Texture Extraction and Distribution for Controllable Person Image Synthesis

Locality-Aware Inter-and Intra-Video Reconstruction for Self-Supervised Correspondence Learning