Official website:
For a categorized collection of survey papers from past years, see CV-Surveys (under construction).
- SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer
🏠project - NeRF-Supervised Deep Stereo
⭐code
⭐code - Beyond Appearance: a Semantic Controllable Self-Supervised Learning Framework for Human-Centric Visual Tasks
⭐code - Consistent View Synthesis with Pose-Guided Diffusion Models
⭐code - 3D Line Mapping Revisited
⭐code - Sound to Visual Scene Generation by Audio-to-Visual Latent Alignment
- Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert
- PoseFormerV2: Exploring Frequency Domain for Efficient and Robust 3D Human Pose Estimation
⭐code
⭐code - Dynamic Conceptional Contrastive Learning for Generalized Category Discovery
⭐code - PMatch: Paired Masked Image Modeling for Dense Geometric Matching
⭐code - Understanding the Robustness of 3D Object Detection with Bird's-Eye-View Representations in Autonomous Driving
- Decomposed Cross-modal Distillation for RGB-based Temporal Action Detection
- Streaming Video Model
⭐code - FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation
- Few-shot Geometry-Aware Keypoint Localization
⭐code - SynthVSR: Scaling Up Visual Speech Recognition With Synthetic Supervision
- LayoutDiffusion: Controllable Diffusion Model for Layout-to-image Generation
⭐code - Implicit View-Time Interpolation of Stereo Videos using Multi-Plane Disparities and Non-Uniform Coordinates
🏠project
🏠project - Rethinking the Approximation Error in 3D Surface Fitting for Point Cloud Normal Estimation
⭐code - KD-DLGAN: Data Limited Image Generation via Knowledge Distillation
- Mixed Autoencoder for Self-supervised Visual Representation Learning
- C-SFDA: A Curriculum Learning Aided Self-Training Framework for Efficient Source Free Domain Adaptation
- Masked and Adaptive Transformer for Exemplar Based Image Translation
⭐code - Hierarchical Fine-Grained Image Forgery Detection and Localization
⭐code - ImageNet-E: Benchmarking Neural Network Robustness via Attribute Editing
⭐code - Enhanced Stable View Synthesis
- DiffCollage: Parallel Generation of Large Content with Diffusion Models
🏠project - Audio-Visual Grouping Network for Sound Localization from Mixtures
⭐code - PartManip: Learning Cross-Category Generalizable Part Manipulation Policy from Point Cloud Observations
- Learning Human-to-Robot Handovers from Point Clouds
⭐code
- Re-thinking Federated Active Learning based on Inter-class Diversity
- Box-Level Active Detection
⭐code
- Defect Localization
- Industrial Anomaly Detection
- Backdoor Attacks Against Deep Image Compression via Adaptive Frequency Trigger
- Context-Based Trit-Plane Coding for Progressive Image Compression
⭐code - Learned Image Compression with Mixed Transformer-CNN Architectures
⭐code - Video Compression
- NeRFLiX: High-Quality Neural View Synthesis by Learning a Degradation-Driven Inter-viewpoint MiXer
🏠project - FreeNeRF: Improving Few-shot Neural Rendering with Free Frequency Regularization
🏠project - StyleRF: Zero-shot 3D Style Transfer of Neural Radiance Fields
🏠project - Semantic Ray: Learning a Generalizable Semantic Field with Cross-Reprojection Attention
⭐code - WildLight: In-the-wild Inverse Rendering with a Flashlight
⭐code - Grid-guided Neural Radiance Fields for Large Urban Scenes
⭐code - GM-NeRF: Learning Generalizable Model-based Neural Radiance Fields from Multi-view Images
- HandNeRF: Neural Radiance Fields for Animatable Interacting Hands
- ABLE-NeRF: Attention-Based Rendering with Learnable Embeddings for Neural Radiance Field
- NeRF-DS: Neural Radiance Fields for Dynamic Specular Objects
⭐code - JAWS: Just A Wild Shot for Cinematic Transfer in Neural Radiance Fields
🏠project - FlexNeRF: Photorealistic Free-viewpoint Rendering of Moving Humans from Sparse Views
⭐code - NeFII: Inverse Rendering for Reflectance Decomposition with Near-Field Indirect Illumination
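- Speaker Detection
- Audio-Visual Speech Recognition
- Audio-Visual Localization
- Audio Source Separation
- Sound Synthesis
- Movie Audio Description
- Scene Image Generation from Sound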
- Frequency-Modulated Point Cloud Rendering with Easy Editing
⭐code - Balanced Spherical Grid for Egocentric View Synthesis
- Progressively Optimized Local Radiance Fields for Robust View Synthesis
⭐code - F$^{2}$-NeRF: Fast Neural Radiance Field Training with Free Camera Trajectories
⭐code - Enhanced Stable View Synthesis
- Consistent View Synthesis with Pose-Guided Diffusion Models
⭐code
- xFBD: Focused Building Damage Dataset and Analysis
Building damage dataset - Spring: A High-Resolution High-Detail Dataset and Benchmark for Scene Flow, Optical Flow and Stereo
🌻dataset - Human-Art: A Versatile Human-Centric Dataset Bridging Natural and Artificial Scenes
- HairStep: Transfer Synthetic to Real Using Strand and Depth Maps for Single-View 3D Hair Modeling
🌻dataset - CUDA: Convolution-based Unlearnable Datasets
🌻dataset - MVImgNet: A Large-scale Dataset of Multi-view Images
🌻dataset - V2V4Real: A Real-world Large-scale Dataset for Vehicle-to-Vehicle Cooperative Perception
🌻dataset
Vehicle-to-Vehicle (V2V) perception - Polynomial Implicit Neural Representations For Large Diverse Datasets
🌻dataset - MaskCon: Masked Contrastive Learning for Coarse-Labelled Dataset
🌻dataset - RaBit: Parametric Modeling of 3D Biped Cartoon Characters with a Topological-consistent Dataset
🌻dataset - Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline
- Fantastic Breaks: A Dataset of Paired 3D Scans of Real-World Broken Objects and Their Complete Counterparts
⭐code - ARKitTrack: A New Diverse Dataset for Tracking Using Mobile RGB-D Data
⭐code - CelebV-Text: A Large-Scale Facial Text-Video Dataset
⭐code
Facial text-to-video generation - Towards Artistic Image Aesthetics Assessment: a Large-scale Dataset and a New Method
⭐code
Artistic image aesthetics assessment - GeoNet: Benchmarking Unsupervised Adaptation across Geographies
⭐code - PosterLayout: A New Benchmark and Approach for Content-aware Visual-Textual Presentation Layout
⭐code
- Sign Language Recognition
- Sign Language Retrieval
- DyLiN: Making Light Field Networks Dynamic
⭐code - Learning Rotation-Equivariant Features for Visual Correspondence
🏠project
- Diversity-Measurable Anomaly Detection
- SimpleNet: A Simple Network for Image Anomaly Detection and Localization
⭐code - WinCLIP: Zero-/Few-Shot Anomaly Classification and Segmentation
- OOD
- DG
- Towards Domain Generalization for Multi-view 3D Object Detection in Bird-Eye-View
- Modality-Agnostic Debiasing for Single Domain Generalization
- Neuron Structure Modeling for Generalizable Remote Physiological Measurement
⭐code - Sharpness-Aware Gradient Matching for Domain Generalization
⭐code - Improving Generalization with Domain Convex Game
- Generalist: Decoupling Natural and Robust Generalization
⭐code - ALOFT: A Lightweight MLP-like Architecture with Dynamic Low-frequency Transform for Domain Generalization
⭐code
- DA
- Guiding Pseudo-labels with Uncertainty Estimation for Test-Time Adaptation
⭐code - Patch-Mix Transformer for Unsupervised Domain Adaptation: A Game Perspective
- Upcycling Models under Domain and Category Shift
⭐code - C-SFDA: A Curriculum Learning Aided Self-Training Framework for Efficient Source Free Domain Adaptation
- A New Benchmark: On the Utility of Synthetic Data with Blender for Bare Supervised Learning and Downstream Domain Adaptation
⭐code - TeSLA: Test-Time Self-Learning With Automatic Adversarial Augmentation
⭐code - Feature Alignment and Uniformity for Test Time Adaptation
- Guiding Pseudo-labels with Uncertainty Estimation for Test-Time Adaptation
- ZSL
- DejaVu: Conditional Regenerative Learning to Enhance Dense Prediction
- Ensemble-based Blackbox Attacks on Dense Prediction
⭐code - Dense Detection
- Make Landscape Flatter in Differentially Private Federated Learning
- The Resource Problem of Using Linear Layer Leakage Attack in Federated Learning
- Class-Incremental Learning
- Feature Separation and Recalibration for Adversarial Robustness
⭐code - CFA: Class-wise Calibrated Fair Adversarial Training
⭐code - Black-Box
- Adversarial Examples
- Backdoor Attack
- Adversarial Attack
- Achieving a Better Stability-Plasticity Trade-off via Auxiliary Networks in Continual Learning
⭐code - Computationally Budgeted Continual Learning: What Does Matter?
⭐code - Preserving Linear Separability in Continual Learning by Backward Feature Projection
- Twin Contrastive Learning with Noisy Labels
⭐code - Revisiting Multimodal Representation in Contrastive Learning: From Patch and Token Embeddings to Finite Discrete Tokens
- Rethinking Optical Flow from Geometric Matching Consistent Perspective
⭐code - DistractFlow: Improving Optical Flow Estimation via Realistic Distractions and Pseudo-Labeling
- AnyFlow: Arbitrary Scale Optical Flow with Implicit Neural Representation
- Scene Text Detection
- Table Structure Recognition
- Font Generation
- Handwritten Text Generation
- Vector Font Synthesis
- Detecting Human-Object Contact in Images
🏠project - Category Query Learning for Human-Object Interaction Classification
- Gazeformer: Scalable, Effective and Fast Prediction of Goal-Directed Human Attention
- HOICLIP: Efficient Knowledge Transfer for HOI Detection with Vision-Language Models
⭐code - Visibility Aware Human-Object Interaction Tracking from Single RGB Camera
🏠project - Two-Hand Interaction
- Hand-Object Interaction
- FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks
⭐code - DeAR: Debiasing Vision-Language Models with Additive Residuals
- Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning
- Is BERT Blind? Exploring the Effect of Vision-and-Language Pretraining on Visual Language Understanding
⭐code - VILA: Learning Image Aesthetics from User Comments with Vision-Language Pretraining
- MAGVLT: Masked Generative Vision-and-Language Transformer
- Visual-Language Prompt Tuning with Knowledge-guided Context Optimization
- Top-Down Visual Attention from Analysis by Synthesis
🏠project - Accelerating Vision-Language Pretraining with Free Language Modeling
⭐code - VLN
- VQA
- SimVQA: Exploring Simulated Environments for Visual Question Answering
🏠project - MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering
- Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering
⭐code - MD-VQA: Multi-Dimensional Quality Assessment for UGC Live Videos
- Divide and Conquer: Answering Questions with Object Factorization and Compositional Reasoning
⭐code
- SimVQA: Exploring Simulated Environments for Visual Question Answering
- Robotics
- SLAM
- Virtual Try-On
- AR/VR
- Object Counting
- Object Re-Identification
- Object Pose Estimation
- 6D
- Animal Pose Estimation
- Equiangular Basis Vectors
⭐code - Boosting Verified Training for Robust Image Classifications via Abstraction
⭐code - Semantic Prompt for Few-Shot Image Recognition
- Regularization of polynomial networks for image recognition
⭐code - Active Finetuning: Exploiting Annotation Budget in the Pretraining-Finetuning Paradigm
⭐code - Dynamic Conceptional Contrastive Learning for Generalized Category Discovery
⭐code - Few-Shot Classification
- Fine-Grained
- Long-Tailed Classification
- ISR
- OPE-SR: Orthogonal Position Encoding for Designing a Parameter-free Upsampling Module in Arbitrary-scale Image Super-Resolution
- Super-Resolution Neural Operator
⭐code - Local Implicit Normalizing Flow for Arbitrary-Scale Image Super-Resolution
- Human Guided Ground-truth Generation for Realistic Image Super-resolution
⭐code - Implicit Diffusion Models for Continuous Super-Resolution
- VSR
- Text Image Super-Resolution
- Sketch-Based Image Retrieval
- Text-Video Retrieval
- Freestyle Layout-to-Image Synthesis
⭐code - Sketch-Based Generation
- Image-to-Video Synthesis
- Poster Generation
- Text-to-Image Synthesis
- prompting
- Generation
- LayoutDM: Discrete Diffusion Model for Controllable Layout Generation
🏠project - Controllable Mesh Generation Through Sparse Latent Point Diffusion Models
🏠project - DiffCollage: Parallel Generation of Large Content with Diffusion Models
🏠project - LayoutDiffusion: Controllable Diffusion Model for Layout-to-image Generation
⭐code
- LayoutDM: Discrete Diffusion Model for Controllable Layout Generation
- Image Detection
- Autonomous Driving
- Trajectory Prediction
- IPCC-TP: Utilizing Incremental Pearson Correlation Coefficient for Joint Multi-Agent Trajectory Prediction
- Leapfrog Diffusion Model for Stochastic Trajectory Prediction
⭐code - Uncovering the Missing Pattern: Unified Framework Towards Trajectory Imputation and Prediction
⭐code - FEND: A Future Enhanced Distribution-Aware Contrastive Learning Framework for Long-tail Trajectory Prediction
- Place Recognition
- Person Retrieval
- Visible-Infrared Person Re-Identification (VIReID)
- Towards Trustable Skin Cancer Diagnosis via Rewriting Model’s Decision
- Hierarchical discriminative learning improves visual representations of biomedical microscopy
🏠project - Image Quality-aware Diagnosis via Meta-knowledge Co-embedding
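- 3D Medical Imaging
- Image Registration
- Image Classification
- Report Generation
- Medical Image Segmentation
- Medical Image Analysis
- Tumor Segmentation
- Unsupervised Learning
- Self-Supervised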
- Coreset Sampling from Open-Set for Fine-Grained Self-Supervised Learning
- Correlational Image Modeling for Self-Supervised Visual Pre-Training
- Beyond Appearance: a Semantic Controllable Self-Supervised Learning Framework for Human-Centric Visual Tasks
⭐code - Mixed Autoencoder for Self-supervised Visual Representation Learning
- Semi-Supervised
- Visual Atoms: Pre-training Vision Transformers with Sinusoidal Waves
🏠project - AMIGO: Sparse Multi-Modal Graph Transformer with Shared-Context Processing for Representation Learning of Giga-pixel Images
⭐code - Generic-to-Specific Distillation of Masked Autoencoders
⭐code - BiFormer: Vision Transformer with Bi-Level Routing Attention
⭐code - Making Vision Transformers Efficient from A Token Sparsification View
- Dual-path Adaptation from Image to Video Transformers
⭐code - Spherical Transformer for LiDAR-based 3D Recognition
⭐code - MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models
⭐code - Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers
- Learning Expressive Prompting With Residuals for Vision Transformers
- SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer
🏠project
- Weakly Supervised Video Representation Learning with Unaligned Text for Sequential Videos
⭐code - VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
- Implicit View-Time Interpolation of Stereo Videos using Multi-Plane Disparities and Non-Uniform Coordinates
🏠project
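🏠project - Video Moment Retrieval
- Video Highlight Detection
- Video Frame Interpolation
- Video Synthesis
- Video Prediction
- Video Understanding
- Video Description
- Video Summarization
- Video Recognition
- Video Deflickering
- Temporal Sentence Grounding (TSG)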
- VAD
- Improving GAN Training via Feature Space Shrinkage
⭐code - CoralStyleCLIP: Co-optimized Region and Layer Selection for Image Editing
- Graph Transformer GANs for Graph-Constrained House Generation
- Cross-GAN Auditing: Unsupervised Identification of Attribute Level Similarities and Differences between Pretrained Generative Models
⭐code - Efficient Scale-Invariant Generator with Column-Row Entangled Pixel Synthesis
⭐code - VIVE3D: Viewpoint-Independent Video Editing using 3D-Aware GANs
⭐code - Image-Text Synthesis
- DSI2I: Dense Style for Unpaired Image-to-Image Translation
- Fix the Noise: Disentangling Source Feature for Controllable Domain Translation
⭐code - 3D-Aware Multi-Class Image-to-Image Translation with NeRFs
- Image Translation
- Sibling-Attack: Rethinking Transferable Adversarial Attacks against Face Recognition
- Towards Effective Adversarial Textured 3D Meshes on Physical Face Recognition
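👍CVPR 2023 | The Long Road for Face Recognition: Tsinghua, Peking University, and others propose AT3D, an attack method against face recognition systems - 3D Face
- Face Reconstruction
- Face Restoration
- Face Anonymization
- Naked-Eye Age Recognition
- Emotion Recognition
- Portrait Lighting
- Face Liveness Detection
- Talking Head
- Face Segmentation
- Blink Detection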
- PartNeRF: Generating Part-Aware Editable 3D Shapes without 3D Supervision
🏠project - Masked Scene Contrast: A Scalable Framework for Unsupervised 3D Representation Learning
⭐code - Self-Supervised Learning for Multimodal Non-Rigid 3D Shape Matching
⭐code - SINE: Semantic-driven Image-based NeRF Editing with Prior-guided Editing Field
⭐code - 3DQD: Generalized Deep 3D Shape Prior via Part-Discretized Diffusion Process
⭐code - 3D Concept Learning and Reasoning from Multi-View Images
🏠project - PanoHead: Geometry-Aware 3D Full-Head Synthesis in 360$^{\circ}$
⭐code - Persistent Nature: A Generative Model of Unbounded 3D Worlds
🏠project - TAPS3D: Text-Guided 3D Textured Shape Generation from Pseudo Supervision
- Transforming Radiance Field with Lipschitz Network for Photorealistic 3D Scene Stylization
- On the Importance of Accurate Geometry Data for Dense 3D Vision Tasks
⭐code - SUDS: Scalable Urban Dynamic Scenes
🏠project - Understanding and Improving Features Learned in Deep Functional Maps
⭐code - TMO: Textured Mesh Acquisition of Objects with a Mobile Device by using Differentiable Rendering
⭐code - Generalizable Local Feature Pre-training for Deformable Shape Analysis
⭐code - CARTO: Category and Joint Agnostic Reconstruction of ARTiculated Objects
🏠project - CCuantuMM: Cycle-Consistent Quantum-Hybrid Matching of Multiple Shapes
🏠project - HOLODIFFUSION: Training a 3D Diffusion Model using 2D Images
⭐code - Multi-View Azimuth Stereo via Tangent Space Consistency
⭐code - 3D Line Mapping Revisited
⭐code - NeRF-Supervised Deep Stereo
⭐code
⭐code - 3D Reconstruction
- BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects
⭐code - PAniC-3D: Stylized Single-view 3D Reconstruction from Portraits of Anime Characters
⭐code - Unsupervised 3D Shape Reconstruction by Part Retrieval and Assembly
- MobileBrick: Building LEGO for 3D Reconstruction on Mobile Devices
🏠project - Seeing Through the Glass: Neural 3D Reconstruction of Object Inside a Transparent Container
⭐code - SCADE: NeRFs from Space Carving with Ambiguity-Aware Depth Estimates
⭐code - MACARONS: Mapping And Coverage Anticipation with RGB Online Self-Supervision
🏠project - Scalable, Detailed and Mask-Free Universal Photometric Stereo
⭐code - NeuDA: Neural Deformable Anchor for High-Fidelity Implicit Surface Reconstruction
- Structural Multiplane Image: Bridging Neural View Synthesis and 3D Reconstruction
- NEF: Neural Edge Fields for 3D Parametric Curve Reconstruction from Multi-view Images
🏠project
- BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects
- Depth Estimation
- Fully Self-Supervised Depth Estimation from Defocus Clue
⭐code - HRDFuse: Monocular 360° Depth Estimation by Collaboratively Learning Holistic-with-Regional Depth Distributions
🏠project - Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation
⭐code
👍CVPR 2023 | Lite-Mono: a lightweight and efficient self-supervised depth estimation framework
- Fully Self-Supervised Depth Estimation from Defocus Clue
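- Indoor Scene Reconstruction
- Gesture
- Audio-Driven Co-Speech Gesture Generation
- 3D Gesture Synthesis
- Hand Reconstruction
- 3D Hand Recovery
- Human Body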
- HPE
- DistilPose: Tokenized Pose Regression with Heatmap Distillation
- PoseExaminer: Automated Testing of Out-of-Distribution Robustness in Human Pose and Shape Estimation
⭐code - Human Pose as Compositional Tokens
⭐code - Mutual Information-Based Temporal Difference Learning for Human Pose Estimation in Video
- Self-Correctable and Adaptable Inference for Generalizable Human Pose Estimation
- Human Pose Estimation in Extremely Low-Light Conditions
- 3D HPE
- 4D HPE
- Mesh Recovery
- 3D Human Mesh Estimation
- 3D Human Reconstruction
- HPE
- Multi-Person Pose Prediction
- Prompt-Guided Zero-Shot Anomaly Action Recognition using Pretrained Deep Skeleton Features
- Learning Action Changes by Measuring Verb-Adverb Textual Relationships
⭐code - STMixer: A One-Stage Sparse Action Detector
- TimeBalance: Temporally-Invariant and Temporally-Distinctive Video Representations for Semi-Supervised Action Recognition
⭐code - Skeleton-Based Action Recognition
- Keypoint-Based Action Recognition
- Temporal Action Recognition
- Open-Set Action Recognition
- Neural Intrinsic Embedding for Non-rigid Point Cloud Matching
- SCPNet: Semantic Scene Completion on Point Cloud
- Rotation-Invariant Transformer for Point Cloud Matching
- Recognizing Rigid Patterns of Unlabeled Point Clouds by Complete and Continuous Isometry Invariants with no False Negatives and no False Positives
🏠project - VL-SAT: Visual-Linguistic Semantics Assisted Training for 3D Semantic Scene Graph Prediction in Point Cloud
⭐code - Unsupervised Inference of Signed Distance Functions from Single Sparse Point Clouds without Learning Priors
⭐code - Binarizing Sparse Convolutional Networks for Efficient Point Cloud Analysis
- Spatiotemporal Self-supervised Learning for Point Clouds in the Wild
⭐code - NerVE: Neural Volumetric Edges for Parametric Curve Extraction from Point Cloud
⭐code - 3D Point Cloud
- Parameter is Not All You Need: Starting from Non-Parametric Networks for 3D Point Cloud Analysis
⭐code - NeuralPCI: Spatio-temporal Neural Field for 3D Point Cloud Multi-frame Non-linear Interpolation
⭐code
⭐code - Rethinking the Approximation Error in 3D Surface Fitting for Point Cloud Normal Estimation
⭐code
- Parameter is Not All You Need: Starting from Non-Parametric Networks for 3D Point Cloud Analysis
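- Point Cloud Instance Segmentation
- Point Cloud Classification
- Point Cloud Completion
- Point Cloud Registration
- Point Cloud Understanding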
- Propagate And Calibrate: Real-time Passive Non-line-of-sight Tracking
⭐code - Joint Visual Grounding and Tracking with Natural Language Specification
⭐code - Generalized Relation Modeling for Transformer Tracking
⭐code - Multi-Object Tracking
- Cell Tracking
- Multi-Modal Tracking
- Lite DETR: An Interleaved Multi-Scale Encoder for Efficient DETR
⭐code - ZBS: Zero-shot Background Subtraction via Instance-level Background Modeling and Foreground Selection
⭐code - Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection
⭐code - Detecting Everything in the Open World: Towards Universal Object Detection
⭐code
👍CVPR 2023 | Annotate 500 classes, detect 7,000! Tsinghua University and others propose UniDetector, a universal object detection algorithm - STDLens: Model Hijacking-resilient Federated Learning for Object Detection
⭐code - What Can Human Sketches Do for Object Detection?
⭐code - CORA: Adapting CLIP for Open-Vocabulary Detection with Region Prompting and Anchor Pre-Matching
- Unknown Sniffer for Object Detection: Don't Turn a Blind Eye to Unknown Objects
⭐code - Bridging Precision and Confidence: A Train-Time Loss for Calibrating Object Detection
⭐code - Learned Two-Plane Perspective Prior based Image Resampling for Efficient Object Detection
- T-SEA: Transfer-based Self-Ensemble Attack on Object Detection
⭐code
👍CVPR 2023 | Peking University proposes T-SEA: a self-ensemble strategy for stronger black-box attack transferability - Object Localization
- Open-World Detection
- 3D OD
- Virtual Sparse Convolution for Multimodal 3D Object Detection
⭐code - LinK: Linear Kernel for LiDAR-based 3D Perception
⭐code - 3D Video Object Detection with Learnable Object-Centric Global Optimization
⭐code - X3KD: Knowledge Distillation Across Modalities, Tasks and Stages for Multi-Camera 3D Object Detection
⭐code - Understanding the Robustness of 3D Object Detection with Bird's-Eye-View Representations in Autonomous Driving
- Weakly Supervised Monocular 3D Object Detection using Multi-View Projection and Direction Consistency
⭐code - Viewpoint Equivariance for Multi-View 3D Object Detection
⭐code - Benchmarking Robustness of 3D Object Detection to Common Corruptions in Autonomous Driving
⭐code - Collaboration Helps Camera Overtake LiDAR in 3D Detection
⭐code
⭐code - OcTr: Octree-based Transformer for 3D Object Detection
- MSF: Motion-guided Sequential Fusion for Efficient 3D Object Detection from Point Cloud Sequences
⭐code - MonoATT: Online Monocular 3D Object Detection with Adaptive Token Transformer
- MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training
⭐code - NS3D: Neuro-Symbolic Grounding of 3D Objects and Relations
- VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking
⭐code - Bi3D: Bi-domain Active Learning for Cross-domain 3D Object Detection
⭐code - LoGoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross-Modal Fusion
⭐code - PiMAE: Point Cloud and Image Interactive Masked Autoencoders for 3D Object Detection
⭐code - CAPE: Camera View Position Embedding for Multi-View 3D Object Detection
⭐code - Uni3D: A Unified Baseline for Multi-dataset 3D Object Detection
⭐code
- Virtual Sparse Convolution for Multimodal 3D Object Detection
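- End-to-End Object Detection
- Semi-Supervised Object Detection
- Few-Shot Object Detection
- Domain-Adaptive Object Detection
- Salient Object Detection
- Infrared Object Detection
- Camouflaged Object Detection
- Dense Object Detection
- Object Discovery
- Video Captioning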
- Learning a Practical SDR-to-HDRTV Up-conversion using New Dataset and Degradation Models
⭐code - Shadow Removal
- Image Restoration
- Image Quality Assessment
- Dehazing
- Deraining
- Denoising
- Reflective Flare Removal in Photos
- Nerflets: Local Radiance Fields for Efficient Structure-Aware 3D Scene Representation from 2D Supervision
- MP-Former: Mask-Piloted Transformer for Image Segmentation
⭐code - Explicit Visual Prompting for Low-Level Structure Segmentations
⭐code - Focused and Collaborative Feedback Integration for Interactive Image Segmentation
⭐code - FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation
- 3D Segmentation
- Panoptic Segmentation
- Instance Segmentation
- DynaMask: Dynamic Mask Selection for Instance Segmentation
⭐code - DoNet: Deep De-overlapping Network for Cytology Instance Segmentation
⭐code - FastInst: A Simple Query-Based Model for Real-Time Instance Segmentation
⭐code - SIM: Semantic-aware Instance Mask Generation for Box-Supervised Instance Segmentation
⭐code - Mask-free OVIS: Open-Vocabulary Instance Segmentation without Manual Mask Annotations
⭐code - Weakly Supervised Instance Segmentation
- DynaMask: Dynamic Mask Selection for Instance Segmentation
- Semantic Segmentation
- IFSeg: Image-free Semantic Segmentation via Vision-Language Model
⭐code - Delivering Arbitrary-Modal Semantic Segmentation
⭐code - Foundation Model Drives Weakly Incremental Learning for Semantic Segmentation
- Mask3D: Pre-training 2D Vision Transformers by Learning Masked 3D Priors
- Generative Semantic Segmentation
⭐code - Reliability in Semantic Segmentation: Are We on the Right Track?
⭐code - Both Style and Distortion Matter: Dual-Path Unsupervised Domain Adaptation for Panoramic Semantic Segmentation
- Less is More: Reducing Task and Model Complexity for 3D Point Cloud Semantic Segmentation
⭐code - Instant Domain Augmentation for LiDAR Semantic Segmentation
🏠[project](http://cvlab.postech.ac.kr/research/LiDomAug) - Leveraging Hidden Positives for Unsupervised Semantic Segmentation
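⭐code - Semi-Supervised Semantic Segmentation
- Weakly Supervised Semantic Segmentation
- Point Cloud Semantic Segmentation
- 3D Semantic Segmentation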
- [MSeg3D: Multi-modal 3D Semantic Segmentation for Autonomous Driving](https://arxiv.org/abs/2303.08600)
⭐code
- [MSeg3D: Multi-modal 3D Semantic Segmentation for Autonomous Driving](https://arxiv.org/abs/2303.08600)
- IFSeg: Image-free Semantic Segmentation via Vision-Language Model
- Interactive Segmentation
- Few-Shot Segmentation
- VSS
- VOS
- InstMove: Instance Motion for Object-centric Video Segmentation
⭐code - MobileVOS: Real-Time Video Object Segmentation Contrastive Learning meets Knowledge Distillation
- Unified Mask Embedding and Correspondence Learning for Self-Supervised Video Segmentation
⭐code - Two-shot Video Object Segmentation
⭐code
- InstMove: Instance Motion for Object-centric Video Segmentation
- VIS
- Zero-Shot Text-to-Parameter Translation for Game Character Auto-Creation
- Neuro-Modulated Hebbian Learning for Fully Test-Time Adaptation
- Disentangling Orthogonal Planes for Indoor Panoramic Room Layout Estimation with Cross-Scale Distortion Awareness
- DART: Diversify-Aggregate-Repeat Training Improves Generalization of Neural Networks
- EcoTTA: Memory-Efficient Continual Test-time Adaptation via Self-distilled Regularization
- Intrinsic Physical Concepts Discovery with Object-Centric Predictive Models
- A Meta-Learning Approach to Predicting Performance and Data Requirements
- Multimodal Prompting with Missing Modalities for Visual Recognition
⭐code - Masked Images Are Counterfactual Samples for Robust Fine-tuning
- UniHCP: A Unified Model for Human-Centric Perceptions
⭐code - DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network
⭐code - Gradient Norm Aware Minimization Seeks First-Order Flatness and Improves Generalization
- Progressive Open Space Expansion for Open-Set Model Attribution
⭐code - TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets
⭐code - HumanBench: Towards General Human-centric Perception with Projector Assisted Pretraining
⭐code - 3D Cinemagraphy from a Single Image
🏠project - Masked Image Modeling with Local Multi-Scale Reconstruction
⭐code - Revisiting Rotation Averaging: Uncertainties and Robust Losses
⭐code - Unifying Layout Generation with a Decoupled Diffusion Model
- Adversarial Counterfactual Visual Explanations
⭐code - Trainable Projected Gradient Method for Robust Fine-tuning
⭐code - Partial Network Cloning
⭐code - Extracting Class Activation Maps from Non-Discriminative Features as well
⭐code - TWINS: A Fine-Tuning Framework for Improved Transferability of Adversarial Robustness and Generalization
⭐code - Abstract Visual Reasoning: An Algebraic Approach for Solving Raven's Progressive Matrices
⭐code - Visibility Constrained Wide-band Illumination Spectrum Design for Seeing-in-the-Dark
⭐code - PRISE: Demystifying Deep Lucas-Kanade with Strongly Star-Convex Constraints for Multimodel Image Alignment
⭐code - Boundary Unlearning
🏠project - ProphNet: Efficient Agent-Centric Motion Forecasting with Anchor-Informed Proposals
- VecFontSDF: Learning to Reconstruct and Synthesize High-quality Vector Fonts via Signed Distance Functions
- BiCro: Noisy Correspondence Rectification for Multi-modality Data via Bi-directional Cross-modal Similarity Consistency
- Learning a Depth Covariance Function
⭐code - A Bag-of-Prototypes Representation for Dataset-Level Applications
- CrOC: Cross-View Online Clustering for Dense Visual Representation Learning
⭐code - Exploring Structured Semantic Prior for Multi Label Recognition with Incomplete Labels
⭐code - Marching-Primitives: Shape Abstraction from Signed Distance Function
⭐code - Robust Generalization against Photon-Limited Corruptions via Worst-Case Sharpness Minimization
- SIEDOB: Semantic Image Editing by Disentangling Object and Background
- Robust Test-Time Adaptation in Dynamic Scenarios
⭐code - Enhancing Multiple Reliability Measures via Nuisance-extended Information Bottleneck
⭐code - IDGI: A Framework to Eliminate Explanation Noise from Integrated Gradients
- Compacting Binary Neural Networks by Sparse Kernel Selection
- PDPP: Projected Diffusion for Procedure Planning in Instructional Videos
⭐code - Multi-Granularity Archaeological Dating of Chinese Bronze Dings Based on a Knowledge-Guided Relation Graph
⭐code - Quantum Multi-Model Fitting
⭐code - Continuous Intermediate Token Learning with Implicit Motion Manifold for Keyframe Based Motion Interpolation
- Adaptive Spot-Guided Transformer for Consistent Local Feature Matching
⭐code
⭐code - PMatch: Paired Masked Image Modeling for Dense Geometric Matching
⭐code - ImageNet-E: Benchmarking Neural Network Robustness via Attribute Editing
⭐code