JAYCHOU2020/CVPR-2023-Papers

CVPR-2023-Papers

Official website link:

For the categorized collection of survey papers from past years, click here: ↘️CV-Surveys (under construction)

For the categorized paper collections of 2023, click here:

↘️CVPR-2023-Papers ↘️WACV-2023-Papers

For the categorized paper collections of 2022, click here:

↘️CVPR-2022-Papers ↘️WACV-2022-Papers ↘️ECCV-2022-Papers

For the categorized paper collections of 2021, click here:

↘️ICCV-2021-Papers ↘️CVPR-2021-Papers

For the categorized paper collections of 2020, click here:

↘️CVPR-2020-Papers ↘️ECCV-2020-Papers

Table of Contents

🐱	🐶	🐯	🐺
1.Others	2.Image Segmentation	3.Image Processing	4.Image Captioning
5.Object Detection	6.Object Tracking	7.Point Cloud	8.Action Detection/Recognition
9.Human Pose Estimation	10.3D Vision	11.Face	12.Image-to-Image Translation
13.GAN	14.Video	15.Transformer	16.Semi/self-supervised learning
17.Medical Image	18.Person Re-Identification	19.Neural Architecture Search	20.Autonomous vehicles
21.UAV/Remote Sensing/Satellite Image	22.Image Synthesis/Generation	23.Image Retrieval	24.Super-Resolution
25.Fine-Grained/Image Classification	26.GCN/GNN	27.Object Pose Estimation	28.Style Transfer
29.Augmented Reality/Virtual Reality/Robotics	30.Visual Question Answering	31.Vision-Language	32.Data Augmentation
33.Human-Object Interaction	34.Model Compression/Knowledge Distillation/Pruning	35.OCR	36.Optical Flow
37.Contrastive Learning	38.Meta-Learning	39.Continual Learning	40.Adversarial Learning
41.Incremental Learning	42.Metric Learning	43.Multi-Task Learning	44.Federated Learning
45.Dense Prediction	46.Scene Graph Generation	47.Few/Zero-Shot Learning/DG/Adaptation	48.Visual Grounding
49.Image Geo-localization	50.Anomaly Detection	51.Optics, Geometry, and Light Field Imaging	52.Human Motion Forecasting
53.Sign Language Translation	54.Dataset	55.Novel View Synthesis	56.Sound
57.Gaze Estimation	58.Neural rendering	59.Animation	60.Visual Emotion Analysis

Updated March 31: 30 papers added

Image Forgery Detection

Hierarchical Fine-Grained Image Forgery Detection and Localization
⭐code

Active Learning

Clustering

MVC
- On the Effects of Self-supervision and Contrastive Alignment in Deep Multi-view Clustering
  ⭐code

Scene flow estimation

Hidden Gems: 4D Radar Scene Flow Learning Using Cross-Modal Supervision
⭐code

Motion Retargeting

Skinned Motion Retargeting with Residual Perception of Motion Semantics & Geometry
⭐code

edge detection

edge detection
- The Treasure Beneath Multiple Annotations: An Uncertainty-aware Edge Detector
  ⭐code

Industrial Defect Detection

Defect Localization
- PyramidFlow: High-Resolution Defect Contrastive Localization using Pyramid Normalizing Flow
Industrial Anomaly Detection
- Multimodal Industrial Anomaly Detection via Hybrid Fusion
  ⭐code

Image Compression

58.Neural rendering

57.Gaze Estimation

NeRF-Gaze: A Head-Eye Redirection Parametric Model for Gaze Estimation

56.Sound

Active Speaker Detection
- A Light Weight Model for Active Speaker Detection
  ⭐code
Audio-Visual Speech Recognition
Audio-Visual Localization
- Learning Audio-Visual Source Localization via False Negative Aware Contrastive Learning
  ⭐code
- Audio-Visual Grouping Network for Sound Localization from Mixtures
  ⭐code
Audio Source Separation
- Language-Guided Audio-Visual Source Separation via Trimodal Consistency
Sound Synthesis
- Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos
  ⭐code
Movie Audio Description
- AutoAD: Movie Description in Context
  🏠project
Scene Image Generation from Sound
- Sound to Visual Scene Generation by Audio-to-Visual Latent Alignment

55.Novel View Synthesis

54.Dataset

53.Sign Language Translation

Sign Language Recognition
- Continuous Sign Language Recognition with Correlation Network
  ⭐code
- Natural Language-Assisted Sign Language Recognition
  ⭐code
Sign Language Retrieval
- CiCo: Domain-Aware Sign Language Retrieval via Cross-Lingual Contrastive Learning
  ⭐code

52.Human Motion Forecasting

EqMotion: Equivariant Multi-agent Motion Prediction with Invariant Interaction Reasoning
⭐code

51.Optics, Geometry, and Light Field Imaging

50.Anomaly Detection

49.Image Geo-localization

Where We Are and What We're Looking At: Query Based Worldwide Image Geo-localization Using Hierarchies and Scenes

48.Visual Grounding

Text-Visual Prompting for Efficient 2D Temporal Video Grounding

47.Few/Zero-Shot Learning/Domain Generalization/Adaptation

46.Scene Graph Generation

Prototype-based Embedding Network for Scene Graph Generation
⭐code

45.Dense Prediction

44.Federated Learning

43.Multi-Task Learning

42.Metric Learning

Advancing Deep Metric Learning Through Multiple Batch Norms And Multi-Targeted Adversarial Examples

41.Incremental Learning

Class-Incremental Learning
- Dense Network Expansion for Class Incremental Learning
- Class-Incremental Exemplar Compression for Class-Incremental Learning
  ⭐code

40.Adversarial Learning

39.Continual Learning

38.Meta-Learning

37.Contrastive Learning

36.Optical Flow

35.OCR

Scene Text Detection
- Turning a CLIP Model into a Scene Text Detector
  ⭐code
Table Structure Recognition
- Improving Table Structure Recognition with Visual-Alignment Sequential Coordinate Modeling
Font Generation
- CF-Font: Content Fusion for Few-shot Font Generation
  ⭐code
Handwritten Text Generation
- Disentangling Writer and Character Styles for Handwriting Generation
  ⭐code
- Handwritten Text Generation from Visual Archetypes
Vector Font Synthesis
- DeepVecFont-v2: Exploiting Transformers to Synthesize Vector Fonts with Higher Quality
  ⭐code

34.Model Compression/Knowledge Distillation/Pruning

33.Human-Object Interaction

32.Data Augmentation

31.Vision-Language

30.Visual Question Answering

VQA

29.SLAM/Augmented Reality/Virtual Reality/Robotics

Robotics
- PartManip: Learning Cross-Category Generalizable Part Manipulation Policy from Point Cloud Observations
- Learning Human-to-Robot Handovers from Point Clouds
  ⭐code
- Robotic Hand Grasping
  - UniDexGrasp: Universal Robotic Dexterous Grasping via Learning Diverse Proposal Generation and Goal-Conditioned Policy
    🏠project
- Visual Navigation
  - Renderable Neural Radiance Map for Visual Navigation
SLAM
- Efficient Map Sparsification Based on 2D and 3D Discretized Grids
  ⭐code
Virtual Try-On
- GP-VTON: Towards General Purpose Virtual Try-on via Collaborative Local-Flow Global-Parsing Learning
  ⭐code
AR/VR
- Affordance Grounding from Demonstration Video to Target Image
  ⭐code
- Learning to Zoom and Unzoom
  ⭐code

28.Style Transfer

27.Object Pose Estimation

Object Counting
- Zero-shot Object Counting
  ⭐code
Object Re-Identification
- MSINet: Twins Contrastive Search of Multi-Scale Interaction for Object ReID
  ⭐code
- Large-scale Training Data Search for Object Re-identification
  ⭐code
Object Pose Estimation
6D
- Rigidity-Aware Detection for 6D Object Pose Estimation
Animal Pose Estimation
- ScarceNet: Animal Pose Estimation with Scarce Annotations

26.GCN/GNN

25.Fine-Grained/Image Classification

24.Super-Resolution

23.Image Retrieval

Sketch-Based Image Retrieval
Text-Video Retrieval
- Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning
  ⭐code

22.Image Synthesis/Generation

Freestyle Layout-to-Image Synthesis
⭐code
Sketch-Based Generation
- Picture that Sketch: Photorealistic Image Generation from Abstract Sketches
  🏠project
Image-to-Video Synthesis
- Conditional Image-to-Video Generation with Latent Flow Diffusion Models
  ⭐code
Poster Generation
- Unsupervised Domain Adaption with Pixel-level Discriminator for Image-aware Layout Generation
Text-to-Image Synthesis
- Variational Distribution Learning for Unsupervised Text-to-Image Generation
prompting
- Diversity-Aware Meta Visual Prompting
  ⭐code
Generation

21.UAV/Remote Sensing/Satellite Image

Image Detection
- Adaptive Sparse Convolutional Networks with Global Context Enhancement for Faster Object Detection on Drone Images
  ⭐code

20.Autonomous vehicles

19.Neural Architecture Search

PA&DA: Jointly Sampling PAth and DAta for Consistent NAS
⭐code

18.Person Re-Identification

Person Retrieval
- Cross-Modal Implicit Relation Reasoning and Aligning for Text-to-Image Person Retrieval
  ⭐code
Visible-Infrared Person Re-Identification (VIReID)
- Diverse Embedding Expansion Network and Low-Light Cross-Modality Benchmark for Visible-Infrared Person Re-identification
  ⭐code

17.Medical Image

16.Semi/self-supervised learning

15.Transformer

14.Video

Weakly Supervised Video Representation Learning with Unaligned Text for Sequential Videos
⭐code
VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
Implicit View-Time Interpolation of Stereo Videos using Multi-Plane Disparities and Non-Uniform Coordinates
🏠project
🏠project
Video Moment Retrieval
Video Highlight Detection
- Collaborative Noisy Label Cleaner: Learning Scene-aware Trailers for Multi-modal Highlight Detection in Movies
  ⭐code
Video Frame Interpolation
- Extracting Motion and Appearance via Inter-Frame Attention for Efficient Video Frame Interpolation
  ⭐code
- Joint Video Multi-Frame Interpolation and Deblurring under Unknown Exposure Time
  ⭐code
Video Synthesis
- Decomposed Diffusion Models for High-Quality Video Generation
Video Prediction
- MOSO: Decomposing MOtion, Scene and Object for Video Prediction
  ⭐code
- A Dynamic Multi-Scale Voxel Flow Network for Video Prediction
  ⭐code
Video Understanding
Video Description
- Fine-grained Audible Video Description
Video Summarization
- Align and Attend: Multimodal Summarization with Dual Contrastive Losses
  ⭐code
Video Recognition
- Frame Flexible Network
  ⭐code
Video Deflickering
- Blind Video Deflickering by Neural Filtering with a Flawed Atlas
  ⭐code
Temporal Sentence Grounding (TSG)
- You Can Ground Earlier than See: An Effective and Efficient Pipeline for Temporal Sentence Grounding in Compressed Videos
VAD
- Hierarchical Semantic Contrast for Scene-aware Video Anomaly Detection

13.GAN

12.Image-to-Image Translation

11.Face

10.3D Reconstruction/Vision

9.Human Pose Estimation

Gesture
- Audio-Driven Co-Speech Gesture Generation
  - Taming Diffusion Models for Audio-Driven Co-Speech Gesture Generation
    ⭐code
- 3D Hand Gesture Synthesis
  - Diverse 3D Hand Gesture Prediction from Body Dynamics by Bilateral Hand Disentanglement
- Hand Reconstruction
  - ACR: Attention Collaboration-based Regressor for Arbitrary Two-Hand Reconstruction
    ⭐code
- 3D Hand Recovery
  - Bringing Inputs to Shared Domains for 3D Interacting Hands Recovery in the Wild
    ⭐code
  - Recovering 3D Hand Mesh Sequence from a Single Blurry Image: A New Dataset and Temporal Unfolding
    ⭐code
Human Body
Multi-Person Pose Forecasting
- Trajectory-Aware Body Interaction Transformer for Multi-Person Pose Forecasting
  ⭐code

8.Action Detection/Recognition

7.Point Cloud

6.Object Tracking

Propagate And Calibrate: Real-time Passive Non-line-of-sight Tracking
⭐code
Joint Visual Grounding and Tracking with Natural Language Specification
⭐code
Generalized Relation Modeling for Transformer Tracking
⭐code
Multi-Object Tracking
- Referring Multi-Object Tracking
  ⭐code
- MotionTrack: Learning Robust Short-term and Long-term Motions for Multi-Object Tracking
Cell Tracking
- Unsupervised Contour Tracking of Live Cells by Mechanical and Cycle Consistency Losses
  ⭐code
Multi-Modal Tracking
- Visual Prompt Multi-Modal Tracking
  ⭐code

5.Object Detection

4.Image Captioning

Video Captioning

3.Image Processing

2.Image Segmentation

Nerflets: Local Radiance Fields for Efficient Structure-Aware 3D Scene Representation from 2D Supervision
MP-Former: Mask-Piloted Transformer for Image Segmentation
⭐code
Explicit Visual Prompting for Low-Level Structure Segmentations
⭐code
Focused and Collaborative Feedback Integration for Interactive Image Segmentation
⭐code
FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation
3D Segmentation
- EFEM: Equivariant Neural Field Expectation Maximization for 3D Object Segmentation Without Scene Supervision
  🏠project
Panoptic Segmentation
- You Only Segment Once: Towards Real-Time Panoptic Segmentation
  ⭐code
Instance Segmentation
Semantic Segmentation
- IFSeg: Image-free Semantic Segmentation via Vision-Language Model
  ⭐code
- Delivering Arbitrary-Modal Semantic Segmentation
  ⭐code
- Foundation Model Drives Weakly Incremental Learning for Semantic Segmentation
- Mask3D: Pre-training 2D Vision Transformers by Learning Masked 3D Priors
- Generative Semantic Segmentation
  ⭐code
- Reliability in Semantic Segmentation: Are We on the Right Track?
  ⭐code
- Both Style and Distortion Matter: Dual-Path Unsupervised Domain Adaptation for Panoramic Semantic Segmentation
- Less is More: Reducing Task and Model Complexity for 3D Point Cloud Semantic Segmentation
  ⭐code
- Instant Domain Augmentation for LiDAR Semantic Segmentation
  🏠[project](http://cvlab.postech.ac.kr/research/LiDomAug)
- Leveraging Hidden Positives for Unsupervised Semantic Segmentation
  ⭐code
- Semi-Supervised Semantic Segmentation
  - Conflict-Based Cross-View Consistency for Semi-Supervised Semantic Segmentation
    ⭐code
- Weakly-Supervised Semantic Segmentation
  - Token Contrast for Weakly-Supervised Semantic Segmentation
    ⭐code
- Point Cloud Semantic Segmentation
  - Novel Class Discovery for 3D Point Cloud Semantic Segmentation
    ⭐code
- 3D Semantic Segmentation
  - [MSeg3D: Multi-modal 3D Semantic Segmentation for Autonomous Driving](https://arxiv.org/abs/2303.08600)
    ⭐code
Interactive Segmentation
- Interactive Segmentation as Gaussian Process Classification
  ⭐code
Few-Shot Segmentation
- Hierarchical Dense Correlation Distillation for Few-Shot Segmentation
  ⭐code
VSS
- Efficient Semantic Segmentation by Altering Resolutions for Compressed Videos
  ⭐code
- Spatio-Temporal Pixel-Level Contrastive Learning-based Source-Free Domain Adaptation for Video Semantic Segmentation
  ⭐code
VOS
VIS
- Mask-Free Video Instance Segmentation
  ⭐code
  🏠project

1.Others

Scan the QR code to add CV君 on WeChat (mention: CVPR) and join the WeChat discussion group: