CVPR 2022 Papers and Open-Source Projects (Papers with Code)
CVPR 2022 accepted paper ID list: https://drive.google.com/file/d/15JFhfPboKdUcIH9LdbCMUFmGq_JhaxhC/view
Note 1: Issues are welcome; please share CVPR 2022 papers and open-source projects!
Note 2: For papers from previous top CV conferences, other high-quality CV papers, and roundups, see: https://github.com/amusi/daily-paper-computer-vision
CVPR 2022 Papers with Open-Source Code: Table of Contents
- Backbone
- CLIP
- GAN
- GNN
- MLP
- NAS
- OCR
- NeRF
- 3D Face
- Long-Tail
- Visual Transformer
- Vision-Language
- Self-supervised Learning
- Data Augmentation
- Knowledge Distillation
- Object Detection
- Visual Tracking
- Semantic Segmentation
- Instance Segmentation
- Panoptic Segmentation
- Few-Shot Classification
- Few-Shot Segmentation
- Image Matting
- Video Understanding
- Image Editing
- Low-level Vision
- Super-Resolution
- Deblur
- 3D Point Cloud
- 3D Object Detection
- 3D Semantic Segmentation
- 3D Object Tracking
- 3D Human Pose Estimation
- 3D Semantic Scene Completion
- 3D Reconstruction
- Person Re-identification
- Camouflaged Object Detection
- Depth Estimation
- Stereo Matching
- Feature Matching
- Lane Detection
- Optical Flow Estimation
- Image Inpainting
- Image Retrieval
- Face Recognition
- Crowd Counting
- Medical Image
- Video Generation
- Scene Graph Generation
- Referring Video Object Segmentation
- Gait Recognition
- Style Transfer
- Anomaly Detection
- Adversarial Examples
- Weakly Supervised Object Localization
- Radar Object Detection
- Hyperspectral Image Reconstruction
- Image Stitching
- Watermarking
- Action Counting
- Grounded Situation Recognition
- Zero-shot Learning
- DeepFakes
- Datasets
- New Tasks
- Others
Backbone
A ConvNet for the 2020s
- Paper: https://arxiv.org/abs/2201.03545
- Code: https://github.com/facebookresearch/ConvNeXt
- Chinese commentary: https://mp.weixin.qq.com/s/Xg5wPYExnvTqRo6s5-2cAw
Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs
MPViT : Multi-Path Vision Transformer for Dense Prediction
- Paper: https://arxiv.org/abs/2112.11010
- Code: https://github.com/youngwanLEE/MPViT
- Chinese commentary: https://mp.weixin.qq.com/s/Q9-crEOz5IYzZaNoq8oXfg
Mobile-Former: Bridging MobileNet and Transformer
- Paper: https://arxiv.org/abs/2108.05895
- Code: None
- Chinese commentary: https://mp.weixin.qq.com/s/yo5KmB2Y7t2R4jiOKI87HQ
MetaFormer is Actually What You Need for Vision
Shunted Self-Attention via Multi-Scale Token Aggregation
- Paper(Oral): https://arxiv.org/abs/2111.15193
- Code: https://github.com/OliverRensu/Shunted-Transformer
TVConv: Efficient Translation Variant Convolution for Layout-aware Visual Processing
Learned Queries for Efficient Local Attention
- Paper(Oral): https://arxiv.org/abs/2112.11435
- Code: https://github.com/moabarar/qna
RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality
CLIP
HairCLIP: Design Your Hair by Text and Reference Image
PointCLIP: Point Cloud Understanding by CLIP
Blended Diffusion for Text-driven Editing of Natural Images
GAN
SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing
- Homepage: https://semanticstylegan.github.io/
Style Transformer for Image Inversion and Editing
Unsupervised Image-to-Image Translation with Generative Prior
- Homepage: https://www.mmlab-ntu.com/project/gpunit/
- Paper: https://arxiv.org/abs/2204.03641
- Code: https://github.com/williamyang1991/GP-UNIT
StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2
- Homepage: https://universome.github.io/stylegan-v
- Paper: https://arxiv.org/abs/2112.14683
- Code: https://github.com/universome/stylegan-v
OSSGAN: Open-set Semi-supervised Image Generation
Neural Texture Extraction and Distribution for Controllable Person Image Synthesis
- Paper: https://arxiv.org/abs/2204.06160
- Code: https://github.com/RenYurui/Neural-Texture-Extraction-Distribution
GNN
OrphicX: A Causality-Inspired Latent Variable Model for Interpreting Graph Neural Networks
- Paper: https://wanyu-lin.github.io/assets/publications/wanyu-cvpr2022.pdf
- Code: https://github.com/WanyuGroup/CVPR2022-OrphicX
MLP
RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality
NAS
β-DARTS: Beta-Decay Regularization for Differentiable Architecture Search
ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior
- Paper: https://arxiv.org/abs/2111.15362
- Code: None
OCR
SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition
NeRF
Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields
- Homepage: https://jonbarron.info/mipnerf360/
Point-NeRF: Point-based Neural Radiance Fields
- Homepage: https://xharlie.github.io/projects/project_sites/pointnerf/
- Paper: https://arxiv.org/abs/2201.08845
- Code: https://github.com/Xharlie/point-nerf
NeRF in the Dark: High Dynamic Range View Synthesis from Noisy Raw Images
- Paper: https://arxiv.org/abs/2111.13679
- Homepage: https://bmild.github.io/rawnerf/
- Demo: https://www.youtube.com/watch?v=JtBS4KBcKVc
Urban Radiance Fields
- Homepage: https://urban-radiance-fields.github.io/
Pix2NeRF: Unsupervised Conditional π-GAN for Single Image to Neural Radiance Fields Translation
HumanNeRF: Free-viewpoint Rendering of Moving People from Monocular Video
3D Face
ImFace: A Nonlinear 3D Morphable Face Model with Implicit Neural Representations
Long-Tail
Retrieval Augmented Classification for Long-Tail Visual Recognition
- Paper: https://arxiv.org/abs/2202.11233
- Code: None
Visual Transformer
Backbone
MPViT : Multi-Path Vision Transformer for Dense Prediction
MetaFormer is Actually What You Need for Vision
Mobile-Former: Bridging MobileNet and Transformer
- Paper: https://arxiv.org/abs/2108.05895
- Code: None
- Chinese commentary: https://mp.weixin.qq.com/s/yo5KmB2Y7t2R4jiOKI87HQ
Shunted Self-Attention via Multi-Scale Token Aggregation
- Paper(Oral): https://arxiv.org/abs/2111.15193
- Code: https://github.com/OliverRensu/Shunted-Transformer
Learned Queries for Efficient Local Attention
- Paper(Oral): https://arxiv.org/abs/2112.11435
- Code: https://github.com/moabarar/qna
Application
Language-based Video Editing via Multi-Modal Multi-Level Transformer
- Paper: https://arxiv.org/abs/2104.01122
- Code: None
MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video
- Paper: https://arxiv.org/abs/2203.00859
- Code: None
Embracing Single Stride 3D Object Detector with Sparse Transformer
- Paper: https://arxiv.org/abs/2112.06375
- Code: https://github.com/TuSimple/SST
- Chinese commentary: https://zhuanlan.zhihu.com/p/476056546
Multi-class Token Transformer for Weakly Supervised Semantic Segmentation
Spatio-temporal Relation Modeling for Few-shot Action Recognition
Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction
Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling
- Homepage: https://point-bert.ivg-research.xyz/
- Paper: https://arxiv.org/abs/2111.14819
- Code: https://github.com/lulutang0608/Point-BERT
GroupViT: Semantic Segmentation Emerges from Text Supervision
- Homepage: https://jerryxu.net/GroupViT/
Restormer: Efficient Transformer for High-Resolution Image Restoration
Splicing ViT Features for Semantic Appearance Transfer
- Homepage: https://splice-vit.github.io/
- Paper: https://arxiv.org/abs/2201.00424
- Code: https://github.com/omerbt/Splice
Self-supervised Video Transformer
- Homepage: https://kahnchana.github.io/svt/
Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers
Accelerating DETR Convergence via Semantic-Aligned Matching
DN-DETR: Accelerate DETR Training by Introducing Query DeNoising
- Paper: https://arxiv.org/abs/2203.01305
- Code: https://github.com/FengLi-ust/DN-DETR
- Chinese commentary: https://mp.weixin.qq.com/s/xdMfZ_L628Ru1d1iaMny0w
Style Transformer for Image Inversion and Editing
MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer
Mask Transfiner for High-Quality Instance Segmentation
Language as Queries for Referring Video Object Segmentation
- Paper: https://arxiv.org/abs/2201.00487
- Code: https://github.com/wjn922/ReferFormer
- Chinese commentary: https://mp.weixin.qq.com/s/MkQT8QWSYoYVhJ1RSF6oPQ
X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning
AdaMixer: A Fast-Converging Query-Based Object Detector
- Paper(Oral): https://arxiv.org/abs/2203.16507
- Code: https://github.com/MCG-NJU/AdaMixer
Omni-DETR: Omni-Supervised Object Detection with Transformers
SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition
TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting
- Paper(Oral): https://arxiv.org/abs/2204.01018
- Code: https://github.com/SvipRepetitionCounting/TransRAC
Collaborative Transformers for Grounded Situation Recognition
NFormer: Robust Person Re-identification with Neighbor Transformer
Boosting Robustness of Image Matting with Context Assembling and Strong Data Augmentation
- Paper: https://arxiv.org/abs/2201.06889
- Code: None
Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer
- Paper(Oral): https://arxiv.org/abs/2204.08680
- Code: https://github.com/zengwang430521/TCFormer
A New Dataset and Transformer for Stereoscopic Video Super-Resolution
- Paper: https://arxiv.org/abs/2204.10039
- Code: https://github.com/H-deep/Trans-SVSR/
- Dataset: http://shorturl.at/mpwGX
Safe Self-Refinement for Transformer-based Domain Adaptation
Fast Point Transformer
- Homepage: http://cvlab.postech.ac.kr/research/FPT/
- Paper: https://arxiv.org/abs/2112.04702
- Code: https://github.com/POSTECH-CVLab/FastPointTransformer
Transformer Decoders with MultiModal Regularization for Cross-Modal Food Retrieval
DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation
Stratified Transformer for 3D Point Cloud Segmentation
- Paper: https://arxiv.org/pdf/2203.14508.pdf
- Code: https://github.com/dvlab-research/Stratified-Transformer
Vision-Language
Conditional Prompt Learning for Vision-Language Models
Bridging Video-text Retrieval with Multiple Choice Question
Visual Abductive Reasoning
Self-supervised Learning
UniVIP: A Unified Framework for Self-Supervised Visual Pre-training
- Paper: https://arxiv.org/abs/2203.06965
- Code: None
Crafting Better Contrastive Views for Siamese Representation Learning
- Paper: https://arxiv.org/abs/2202.03278
- Code: https://github.com/xyupeng/ContrastiveCrop
- Chinese commentary: https://mp.weixin.qq.com/s/VTP9D5f7KG9vg30U9kVI2A
HCSC: Hierarchical Contrastive Selective Coding
- Homepage: https://github.com/gyfastas/HCSC
- Paper: https://arxiv.org/abs/2202.00455
- Chinese commentary: https://mp.weixin.qq.com/s/jkYR8mYp-e645qk8kfPNKQ
DiRA: Discriminative, Restorative, and Adversarial Learning for Self-supervised Medical Image Analysis
Data Augmentation
TeachAugment: Data Augmentation Optimization Using Teacher Knowledge
AlignMixup: Improving Representations By Interpolating Aligned Features
Knowledge Distillation
Decoupled Knowledge Distillation
- Paper: https://arxiv.org/abs/2203.08679
- Code: https://github.com/megvii-research/mdistiller
- Chinese commentary: https://mp.weixin.qq.com/s/-4AA0zKIXh9Ei9-vc5jOhw
Object Detection
BoxeR: Box-Attention for 2D and 3D Transformers
- Paper: https://arxiv.org/abs/2111.13087
- Code: https://github.com/kienduynguyen/BoxeR
- Chinese commentary: https://mp.weixin.qq.com/s/UnUJJBwcAsRgz6TnQf_b7w
DN-DETR: Accelerate DETR Training by Introducing Query DeNoising
- Paper: https://arxiv.org/abs/2203.01305
- Code: https://github.com/FengLi-ust/DN-DETR
- Chinese commentary: https://mp.weixin.qq.com/s/xdMfZ_L628Ru1d1iaMny0w
Accelerating DETR Convergence via Semantic-Aligned Matching
Localization Distillation for Dense Object Detection
- Paper: https://arxiv.org/abs/2102.12252
- Code: https://github.com/HikariTJU/LD
- Chinese commentary: https://mp.weixin.qq.com/s/dxss8RjJH283h6IbPCT9vg
Focal and Global Knowledge Distillation for Detectors
- Paper: https://arxiv.org/abs/2111.11837
- Code: https://github.com/yzd-v/FGD
- Chinese commentary: https://mp.weixin.qq.com/s/yDkreTudC8JL2V2ETsADwQ
A Dual Weighting Label Assignment Scheme for Object Detection
AdaMixer: A Fast-Converging Query-Based Object Detector
- Paper(Oral): https://arxiv.org/abs/2203.16507
- Code: https://github.com/MCG-NJU/AdaMixer
Omni-DETR: Omni-Supervised Object Detection with Transformers
SIGMA: Semantic-complete Graph Matching for Domain Adaptive Object Detection
- Paper(Oral): https://arxiv.org/abs/2203.06398
- Code: https://github.com/CityU-AIM-Group/SIGMA
Semi-supervised Object Detection
Dense Learning based Semi-Supervised Object Detection
Visual Tracking
Correlation-Aware Deep Tracking
- Paper: https://arxiv.org/abs/2203.01666
- Code: None
TCTrack: Temporal Contexts for Aerial Tracking
Multi-modal Object Tracking
Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline
Multi-Object Tracking
Learning of Global Objective for Network Flow in Multi-Object Tracking
- Paper: https://arxiv.org/abs/2203.16210
- Code: None
DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion
- Homepage: https://dancetrack.github.io
- Paper: https://arxiv.org/abs/2111.14690
- Dataset: https://github.com/DanceTrack/DanceTrack
Semantic Segmentation
Novel Class Discovery in Semantic Segmentation
- Homepage: https://ncdss.github.io/
- Paper: https://arxiv.org/abs/2112.01900
- Code: https://github.com/HeliosZhao/NCDSS
Deep Hierarchical Semantic Segmentation
Rethinking Semantic Segmentation: A Prototype View
- Paper(Oral): https://arxiv.org/abs/2203.15102
- Code: https://github.com/tfzhou/ProtoSeg
Weakly Supervised Semantic Segmentation
Class Re-Activation Maps for Weakly-Supervised Semantic Segmentation
Multi-class Token Transformer for Weakly Supervised Semantic Segmentation
Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers
CLIMS: Cross Language Image Matching for Weakly Supervised Semantic Segmentation
CCAM: Contrastive learning of Class-agnostic Activation Map for Weakly Supervised Object Localization and Semantic Segmentation
FIFO: Learning Fog-invariant Features for Foggy Scene Segmentation
- Homepage: http://cvlab.postech.ac.kr/research/FIFO/
- Paper(Oral): https://arxiv.org/abs/2204.01587
- Code: https://github.com/sohyun-l/FIFO
Regional Semantic Contrast and Aggregation for Weakly Supervised Semantic Segmentation
Semi-supervised Semantic Segmentation
ST++: Make Self-training Work Better for Semi-supervised Semantic Segmentation
- Paper: https://arxiv.org/abs/2106.05095
- Code: https://github.com/LiheYoung/ST-PlusPlus
- Chinese commentary: https://mp.weixin.qq.com/s/knSnlebdtEnmrkChGM_0CA
Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels
- Homepage: https://haochen-wang409.github.io/U2PL/
- Paper: https://arxiv.org/abs/2203.03884
- Code: https://github.com/Haochen-Wang409/U2PL
- Chinese commentary: https://mp.weixin.qq.com/s/-08olqE7np8A1XQzt6HAgQ
Perturbed and Strict Mean Teachers for Semi-supervised Semantic Segmentation
Domain-Adaptive Semantic Segmentation
Towards Fewer Annotations: Active Learning via Region Impurity and Prediction Uncertainty for Domain Adaptive Semantic Segmentation
DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation
Unsupervised Semantic Segmentation
GroupViT: Semantic Segmentation Emerges from Text Supervision
- Homepage: https://jerryxu.net/GroupViT/
- Paper: https://arxiv.org/abs/2202.11094
- Demo: https://youtu.be/DtJsWIUTW-Y
Few-Shot Semantic Segmentation
Generalized Few-shot Semantic Segmentation
Instance Segmentation
BoxeR: Box-Attention for 2D and 3D Transformers
- Paper: https://arxiv.org/abs/2111.13087
- Code: https://github.com/kienduynguyen/BoxeR
- Chinese commentary: https://mp.weixin.qq.com/s/UnUJJBwcAsRgz6TnQf_b7w
E2EC: An End-to-End Contour-based Method for High-Quality High-Speed Instance Segmentation
Mask Transfiner for High-Quality Instance Segmentation
Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity
Self-supervised Instance Segmentation
FreeSOLO: Learning to Segment Objects without Annotations
Video Instance Segmentation
Efficient Video Instance Segmentation via Tracklet Query and Proposal
- Homepage: https://jialianwu.com/projects/EfficientVIS.html
- Paper: https://arxiv.org/abs/2203.01853
- Demo: https://youtu.be/sSPMzgtMKCE
Temporally Efficient Vision Transformer for Video Instance Segmentation
Panoptic Segmentation
Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers
Large-scale Video Panoptic Segmentation in the Wild: A Benchmark
- Paper: https://github.com/VIPSeg-Dataset/VIPSeg-Dataset/blob/main/VIPSeg2022.pdf
- Code: https://github.com/VIPSeg-Dataset/VIPSeg-Dataset
- Dataset: https://github.com/VIPSeg-Dataset/VIPSeg-Dataset
Few-Shot Classification
Integrative Few-Shot Learning for Classification and Segmentation
Learning to Affiliate: Mutual Centralized Learning for Few-shot Classification
Few-Shot Segmentation
Learning What Not to Segment: A New Perspective on Few-Shot Segmentation
Integrative Few-Shot Learning for Classification and Segmentation
Dynamic Prototype Convolution Network for Few-Shot Semantic Segmentation
- Paper: https://arxiv.org/abs/2204.10638
- Code: None
Image Matting
Boosting Robustness of Image Matting with Context Assembling and Strong Data Augmentation
- Paper: https://arxiv.org/abs/2201.06889
- Code: None
Video Understanding
Self-supervised Video Transformer
- Homepage: https://kahnchana.github.io/svt/
- Paper: https://arxiv.org/abs/2112.01514
- Code: https://github.com/kahnchana/svt
TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting
- Paper(Oral): https://arxiv.org/abs/2204.01018
- Code: https://github.com/SvipRepetitionCounting/TransRAC
FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality Assessment
- Paper(Oral): https://arxiv.org/abs/2204.03646
Dual-AI: Dual-path Actor Interaction Learning for Group Activity Recognition
- Paper(Oral): https://arxiv.org/abs/2204.02148
- Code: None
Action Recognition
Spatio-temporal Relation Modeling for Few-shot Action Recognition
Action Detection
End-to-End Semi-Supervised Learning for Video Action Detection
- Paper: https://arxiv.org/abs/2203.04251
- Code: None
Image Editing
Style Transformer for Image Inversion and Editing
Blended Diffusion for Text-driven Editing of Natural Images
SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing
- Homepage: https://semanticstylegan.github.io/
Low-level Vision
ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior
- Paper: https://arxiv.org/abs/2111.15362
- Code: None
Restormer: Efficient Transformer for High-Resolution Image Restoration
Robust Equivariant Imaging: a fully unsupervised framework for learning to image from noisy and partial measurements
- Paper(Oral): https://arxiv.org/abs/2111.12855
- Code: https://github.com/edongdongchen/REI
Super-Resolution
Image Super-Resolution
Learning the Degradation Distribution for Blind Image Super-Resolution
Video Super-Resolution
BasicVSR++: Improving Video Super-Resolution with Enhanced Propagation and Alignment
- Paper: https://arxiv.org/abs/2104.13371
- Code: https://github.com/open-mmlab/mmediting
- Code: https://github.com/ckkelvinchan/BasicVSR_PlusPlus
- Chinese commentary: https://mp.weixin.qq.com/s/HZTwYfphixyLHxlbCAxx4g
Look Back and Forth: Video Super-Resolution with Explicit Temporal Difference Modeling
- Paper: https://arxiv.org/abs/2204.07114
- Code: None
A New Dataset and Transformer for Stereoscopic Video Super-Resolution
- Paper: https://arxiv.org/abs/2204.10039
- Code: https://github.com/H-deep/Trans-SVSR/
- Dataset: http://shorturl.at/mpwGX
Deblur
Image Deblur
Learning to Deblur using Light Field Generated and Real Defocus Images
- Homepage: http://lyruan.com/Projects/DRBNet/
- Paper(Oral): https://arxiv.org/abs/2204.00442
3D Point Cloud
Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling
- Homepage: https://point-bert.ivg-research.xyz/
A Unified Query-based Paradigm for Point Cloud Understanding
- Paper: https://arxiv.org/abs/2203.01252
- Code: None
CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding
PointCLIP: Point Cloud Understanding by CLIP
Fast Point Transformer
- Homepage: http://cvlab.postech.ac.kr/research/FPT/
- Paper: https://arxiv.org/abs/2112.04702
- Code: https://github.com/POSTECH-CVLab/FastPointTransformer
RCP: Recurrent Closest Point for Scene Flow Estimation on 3D Point Clouds
The Devil is in the Pose: Ambiguity-free 3D Rotation-invariant Learning via Pose-aware Convolution
3D Object Detection
Not All Points Are Equal: Learning Highly Efficient Point-based Detectors for 3D LiDAR Point Clouds
- Paper(Oral): https://arxiv.org/abs/2203.11139
BoxeR: Box-Attention for 2D and 3D Transformers
- Paper: https://arxiv.org/abs/2111.13087
- Code: https://github.com/kienduynguyen/BoxeR
- Chinese commentary: https://mp.weixin.qq.com/s/UnUJJBwcAsRgz6TnQf_b7w
Embracing Single Stride 3D Object Detector with Sparse Transformer
Canonical Voting: Towards Robust Oriented Bounding Box Detection in 3D Scenes
MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer
HyperDet3D: Learning a Scene-conditioned 3D Object Detector
- Paper: https://arxiv.org/abs/2204.05599
- Code: None
OccAM's Laser: Occlusion-based Attribution Maps for 3D Object Detectors on LiDAR Data
DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection
- Homepage: https://thudair.baai.ac.cn/index
- Paper: https://arxiv.org/abs/2204.05575
- Code: https://github.com/AIR-THU/DAIR-V2X
Ithaca365: Dataset and Driving Perception under Repeated and Challenging Weather Conditions
- Homepage: https://ithaca365.mae.cornell.edu/
3D Semantic Segmentation
Scribble-Supervised LiDAR Semantic Segmentation
Stratified Transformer for 3D Point Cloud Segmentation
- Paper: https://arxiv.org/pdf/2203.14508.pdf
- Code: https://github.com/dvlab-research/Stratified-Transformer
3D Instance Segmentation
Ithaca365: Dataset and Driving Perception under Repeated and Challenging Weather Conditions
- Homepage: https://ithaca365.mae.cornell.edu/
3D Object Tracking
Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds
PTTR: Relational 3D Point Cloud Object Tracking with Transformer
3D Human Pose Estimation
MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation
MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video
- Paper: https://arxiv.org/abs/2203.00859
- Code: None
Distribution-Aware Single-Stage Models for Multi-Person 3D Pose Estimation
- Paper: https://arxiv.org/abs/2203.07697
- Code: None
- Chinese commentary: https://mp.weixin.qq.com/s/L_F28IFLXvs5R4V9TTUpRw
BEV: Putting People in their Place: Monocular Regression of 3D People in Depth
- Homepage: https://arthur151.github.io/BEV/BEV.html
- Paper: https://arxiv.org/abs/2112.08274
- Code: https://github.com/Arthur151/ROMP
- Dataset: https://github.com/Arthur151/Relative_Human
- Demo: https://www.youtube.com/watch?v=Q62fj_6AxRI
3D Semantic Scene Completion
MonoScene: Monocular 3D Semantic Scene Completion
3D Reconstruction
BANMo: Building Animatable 3D Neural Models from Many Casual Videos
- Homepage: https://banmo-www.github.io/
- Paper: https://arxiv.org/abs/2112.12761
- Code: https://github.com/facebookresearch/banmo
- Chinese commentary: https://mp.weixin.qq.com/s/NMHP8-xWwrX40vpGx55Qew
Person Re-identification
NFormer: Robust Person Re-identification with Neighbor Transformer
Camouflaged Object Detection
Zoom In and Out: A Mixed-scale Triplet Network for Camouflaged Object Detection
Depth Estimation
Monocular Depth Estimation
NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation
- Paper: https://arxiv.org/abs/2203.01502
- Code: None
OmniFusion: 360 Monocular Depth Estimation via Geometry-Aware Fusion
- Paper: https://arxiv.org/abs/2203.00838
- Code: None
Toward Practical Self-Supervised Monocular Indoor Depth Estimation
- Paper: https://arxiv.org/abs/2112.02306
- Code: None
P3Depth: Monocular Depth Estimation with a Piecewise Planarity Prior
Multi-Frame Self-Supervised Depth with Transformers
- Code: None
Stereo Matching
ACVNet: Attention Concatenation Volume for Accurate and Efficient Stereo Matching
Feature Matching
ClusterGNN: Cluster-based Coarse-to-Fine Graph Neural Network for Efficient Feature Matching
- Paper: https://arxiv.org/abs/2204.11700
- Code: None
Lane Detection
Rethinking Efficient Lane Detection via Curve Modeling
- Paper: https://arxiv.org/abs/2203.02431
- Code: https://github.com/voldemortX/pytorch-auto-drive
- Demo: https://user-images.githubusercontent.com/32259501/148680744-a18793cd-f437-461f-8c3a-b909c9931709.mp4
A Keypoint-based Global Association Network for Lane Detection
Optical Flow Estimation
Imposing Consistency for Optical Flow Estimation
- Paper: https://arxiv.org/abs/2204.07262
- Code: None
Deep Equilibrium Optical Flow Estimation
GMFlow: Learning Optical Flow via Global Matching
- Paper(Oral): https://arxiv.org/abs/2111.13680
- Code: https://github.com/haofeixu/gmflow
Image Inpainting
Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding
Image Retrieval
Correlation Verification for Image Retrieval
- Paper(Oral): https://arxiv.org/abs/2204.01458
- Code: https://github.com/sungonce/CVNet
Face Recognition
AdaFace: Quality Adaptive Margin for Face Recognition
- Paper(Oral): https://arxiv.org/abs/2204.00964
- Code: https://github.com/mk-minchul/AdaFace
Crowd Counting
Leveraging Self-Supervision for Cross-Domain Crowd Counting
- Paper: https://arxiv.org/abs/2103.16291
- Code: None
Medical Image
BoostMIS: Boosting Medical Image Semi-supervised Learning with Adaptive Pseudo Labeling and Informative Active Annotation
- Paper: https://arxiv.org/abs/2203.02533
- Code: None
Anti-curriculum Pseudo-labelling for Semi-supervised Medical Image Classification
DiRA: Discriminative, Restorative, and Adversarial Learning for Self-supervised Medical Image Analysis
Video Generation
StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2
- Homepage: https://universome.github.io/stylegan-v
- Demo: https://kaust-cair.s3.amazonaws.com/stylegan-v/stylegan-v.mp4
Scene Graph Generation
SGTR: End-to-end Scene Graph Generation with Transformer
- Paper: https://arxiv.org/abs/2112.12970
- Code: None
Referring Video Object Segmentation
Language as Queries for Referring Video Object Segmentation
ReSTR: Convolution-free Referring Image Segmentation Using Transformers
- Paper: https://arxiv.org/abs/2203.16768
- Code: None
Gait Recognition
Gait Recognition in the Wild with Dense 3D Representations and A Benchmark
- Homepage: https://gait3d.github.io/
- Paper: https://arxiv.org/abs/2204.02569
- Code: https://github.com/Gait3D/Gait3D-Benchmark
Style Transfer
StyleMesh: Style Transfer for Indoor 3D Scene Reconstructions
- Homepage: https://lukashoel.github.io/stylemesh/
Anomaly Detection
UBnormal: New Benchmark for Supervised Open-Set Video Anomaly Detection
Self-Supervised Predictive Convolutional Attentive Block for Anomaly Detection
- Paper(Oral): https://arxiv.org/abs/2111.09099
- Code: https://github.com/ristea/sspcab
Adversarial Examples
Shadows can be Dangerous: Stealthy and Effective Physical-world Adversarial Attack by Natural Phenomenon
LAS-AT: Adversarial Training with Learnable Attack Strategy
- Paper(Oral): https://arxiv.org/abs/2203.06616
- Code: https://github.com/jiaxiaojunQAQ/LAS-AT
Segment and Complete: Defending Object Detectors against Adversarial Patch Attacks with Robust Patch Detection
Weakly Supervised Object Localization
Weakly Supervised Object Localization as Domain Adaption
Radar Object Detection
Exploiting Temporal Relations on Radar Perception for Autonomous Driving
- Paper: https://arxiv.org/abs/2204.01184
- Code: None
Hyperspectral Image Reconstruction
Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction
Image Stitching
Deep Rectangling for Image Stitching: A Learning Baseline
- Paper(Oral): https://arxiv.org/abs/2203.03831
Watermarking
Deep 3D-to-2D Watermarking: Embedding Messages in 3D Meshes and Extracting Them from 2D Renderings
- Paper: https://arxiv.org/abs/2104.13450
- Code: None
Action Counting
TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting
- Paper(Oral): https://arxiv.org/abs/2204.01018
- Dataset: https://svip-lab.github.io/dataset/RepCount_dataset.html
- Code: https://github.com/SvipRepetitionCounting/TransRAC
Grounded Situation Recognition
Collaborative Transformers for Grounded Situation Recognition
Zero-shot Learning
Unseen Classes at a Later Time? No Problem
- Paper: https://arxiv.org/abs/2203.16517
- Code: https://github.com/sumitramalagi/Unseen-classes-at-a-later-time
DeepFakes
Detecting Deepfakes with Self-Blended Images
- Paper(Oral): https://arxiv.org/abs/2204.08376
Datasets
It's About Time: Analog Clock Reading in the Wild
- Homepage: https://charigyang.github.io/abouttime/
- Paper: https://arxiv.org/abs/2111.09162
- Code: https://github.com/charigyang/itsabouttime
- Demo: https://youtu.be/cbiMACA6dRc
Toward Practical Self-Supervised Monocular Indoor Depth Estimation
- Paper: https://arxiv.org/abs/2112.02306
- Code: None
Kubric: A scalable dataset generator
- Paper: https://arxiv.org/abs/2203.03570
- Code: https://github.com/google-research/kubric
- Chinese commentary: https://mp.weixin.qq.com/s/mJ8HzY6C0GifxsErJIS3Mg
Scribble-Supervised LiDAR Semantic Segmentation
Deep Rectangling for Image Stitching: A Learning Baseline
- Paper(Oral): https://arxiv.org/abs/2203.03831
- Code: https://github.com/nie-lang/DeepRectangling
- Dataset: https://github.com/nie-lang/DeepRectangling
- Chinese commentary: https://mp.weixin.qq.com/s/lp5AnrtO_9urp-Fv6Z0l2Q
ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer
- Homepage: https://ai.stanford.edu/~rhgao/objectfolder2.0/
- Paper: https://arxiv.org/abs/2204.02389
- Dataset: https://github.com/rhgao/ObjectFolder
- Demo: https://youtu.be/e5aToT3LkRA
Shape from Polarization for Complex Scenes in the Wild
- Homepage: https://chenyanglei.github.io/sfpwild/index.html
- Paper: https://arxiv.org/abs/2112.11377
- Code: https://github.com/ChenyangLEI/sfp-wild
Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline
TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting
- Paper(Oral): https://arxiv.org/abs/2204.01018
- Dataset: https://svip-lab.github.io/dataset/RepCount_dataset.html
- Code: https://github.com/SvipRepetitionCounting/TransRAC
FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality Assessment
- Paper(Oral): https://arxiv.org/abs/2204.03646
- Dataset: https://github.com/xujinglin/FineDiving
- Code: https://github.com/xujinglin/FineDiving
- Chinese commentary: https://mp.weixin.qq.com/s/8t12Y34eMNwvJr8PeryWXg
Aesthetic Text Logo Synthesis via Content-aware Layout Inferring
- Paper: https://arxiv.org/abs/2204.02701
- Dataset: https://github.com/yizhiwang96/TextLogoLayout
- Code: https://github.com/yizhiwang96/TextLogoLayout
DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection
- Homepage: https://thudair.baai.ac.cn/index
- Paper: https://arxiv.org/abs/2204.05575
- Code: https://github.com/AIR-THU/DAIR-V2X
A New Dataset and Transformer for Stereoscopic Video Super-Resolution
- Paper: https://arxiv.org/abs/2204.10039
- Code: https://github.com/H-deep/Trans-SVSR/
- Dataset: http://shorturl.at/mpwGX
Putting People in their Place: Monocular Regression of 3D People in Depth
- Homepage: https://arthur151.github.io/BEV/BEV.html
UBnormal: New Benchmark for Supervised Open-Set Video Anomaly Detection
DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion
- Homepage: https://dancetrack.github.io
- Paper: https://arxiv.org/abs/2111.14690
- Dataset: https://github.com/DanceTrack/DanceTrack
Visual Abductive Reasoning
Large-scale Video Panoptic Segmentation in the Wild: A Benchmark
- Paper: https://github.com/VIPSeg-Dataset/VIPSeg-Dataset/blob/main/VIPSeg2022.pdf
- Code: https://github.com/VIPSeg-Dataset/VIPSeg-Dataset
- Dataset: https://github.com/VIPSeg-Dataset/VIPSeg-Dataset
Ithaca365: Dataset and Driving Perception under Repeated and Challenging Weather Conditions
- Homepage: https://ithaca365.mae.cornell.edu/
New Tasks
Language-based Video Editing via Multi-Modal Multi-Level Transformer
- Paper: https://arxiv.org/abs/2104.01122
- Code: None
It's About Time: Analog Clock Reading in the Wild
- Homepage: https://charigyang.github.io/abouttime/
- Paper: https://arxiv.org/abs/2111.09162
- Code: https://github.com/charigyang/itsabouttime
- Demo: https://youtu.be/cbiMACA6dRc
Splicing ViT Features for Semantic Appearance Transfer
- Homepage: https://splice-vit.github.io/
- Paper: https://arxiv.org/abs/2201.00424
- Code: https://github.com/omerbt/Splice
Visual Abductive Reasoning
Others
Kubric: A scalable dataset generator
- Paper: https://arxiv.org/abs/2203.03570
- Code: https://github.com/google-research/kubric
- Chinese write-up: https://mp.weixin.qq.com/s/mJ8HzY6C0GifxsErJIS3Mg
X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning
Balanced MSE for Imbalanced Visual Regression
- Paper(Oral): https://arxiv.org/abs/2203.16427
- Code: https://github.com/jiawei-ren/BalancedMSE
SNUG: Self-Supervised Neural Dynamic Garments
- Homepage: http://mslab.es/projects/SNUG/
- Paper(Oral): https://arxiv.org/abs/2204.02219
- Code: https://github.com/isantesteban/snug
Shape from Polarization for Complex Scenes in the Wild
- Homepage: https://chenyanglei.github.io/sfpwild/index.html
- Paper: https://arxiv.org/abs/2112.11377
- Code: https://github.com/ChenyangLEI/sfp-wild
LASER: LAtent SpacE Rendering for 2D Visual Localization
- Paper(Oral): https://arxiv.org/abs/2204.00157
- Code: None
Single-Photon Structured Light
- Paper(Oral): https://arxiv.org/abs/2204.05300
- Code: None
3DeformRS: Certifying Spatial Deformations on Point Clouds
- Paper: https://arxiv.org/abs/2204.05687
- Code: None
Aesthetic Text Logo Synthesis via Content-aware Layout Inferring
- Paper: https://arxiv.org/abs/2204.02701
- Dataset: https://github.com/yizhiwang96/TextLogoLayout
- Code: https://github.com/yizhiwang96/TextLogoLayout
Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes
Robust and Accurate Superquadric Recovery: a Probabilistic Approach
- Paper(Oral): https://arxiv.org/abs/2111.14517
- Code: https://github.com/bmlklwx/EMS-superquadric_fitting
Towards Bidirectional Arbitrary Image Rescaling: Joint Optimization and Cycle Idempotence
- Paper: https://arxiv.org/abs/2203.00911
- Code: None
Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer
- Paper(Oral): https://arxiv.org/abs/2204.08680
- Code: https://github.com/zengwang430521/TCFormer
DeepDPM: Deep Clustering With an Unknown Number of Clusters
ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic
Proto2Proto: Can you recognize the car, the way I do?
Putting People in their Place: Monocular Regression of 3D People in Depth
- Homepage: https://arthur151.github.io/BEV/BEV.html
- Paper: https://arxiv.org/abs/2112.08274
- Code: https://github.com/Arthur151/ROMP
- Dataset: https://github.com/Arthur151/Relative_Human
Light Field Neural Rendering
- Homepage: https://light-field-neural-rendering.github.io/
- Paper(Oral): https://arxiv.org/abs/2112.09687
- Code: https://github.com/google-research/google-research/tree/master/light_field_neural_rendering
Neural Texture Extraction and Distribution for Controllable Person Image Synthesis
- Paper: https://arxiv.org/abs/2204.06160
- Code: https://github.com/RenYurui/Neural-Texture-Extraction-Distribution
Locality-Aware Inter-and Intra-Video Reconstruction for Self-Supervised Correspondence Learning