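This post is a rough roundup of the papers accepted to CVPR 2021; a more detailed summary will follow. Stay tuned...
Official site: http://cvpr2021.thecvf.com
Conference dates: June 19-25, 2021
Acceptance notification date: February 28, 2021
Accepted paper IDs:
- Pose
- 3D
- Tracking
- Optical flow
- Unsupervised learning
- Action detection
- Visual navigation
- GAN
- VQA
- Uncategorized
🎆🎆🎆 Update: 4 papers added on April 29
- Transformer
- Video
- Uncategorized
🎆🎆🎆 Update: 4 papers added on April 28
- 3D
- Medical imaging
- ReID
- Segmentation
- Stereo matching
- Depth completion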
- UPFlow: Upsampling Pyramid for Unsupervised Optical Flow Learning
Brief commentary: 8
- Learning Optical Flow from a Few Matches
⭐code
- Learning optical flow from still images
⭐code🏠project
- AutoFlow: Learning a Better Training Set for Optical Flow
🏠project
- Rethinking Style Transfer: From Pixels to Parameterized Brushstrokes
⭐code
- ArtFlow: Unbiased Image Style Transfer via Reversible Neural Flows
⭐code
- Lipstick ain't enough: Beyond Color Matching for In-the-Wild Makeup Transfer
- Rethinking and Improving the Robustness of Image Style Transfer
😮oral
- Drafting and Revision: Laplacian Pyramid Network for Fast High-Quality Artistic Style Transfer
⭐code
- Style-Aware Normalized Loss for Improving Arbitrary Style Transfer
😮oral
- Image signal processing
- Spectral reconstruction
- Causal reasoning algorithms
- Abstract spatio-temporal reasoning algorithms
- Tangent Space Backpropagation for 3D Transformation Groups
⭐code
- Visual odometry
- Robotics
- Visual Room Rearrangement
😮oral🏠project📺video
- GATSBI: Generative Agent-centric Spatio-temporal Object Interaction
😮oral⭐code📺video
- DexYCB: A Benchmark for Capturing Hand Grasping of Objects
⭐code🏠project📺video
- ContactOpt: Optimizing Contact to Improve Grasps
⭐code
Robotic hand grasping
- ManipulaTHOR: A Framework for Visual Object Manipulation
😮oral⭐code🏠project📺video
- Visual navigation
- AR
- Stay Positive: Non-Negative Image Synthesis for Augmented Reality
😮oral
- Virtual try-on
- Dynamic Slimmable Network
😮oral⭐code
- Towards Evaluating and Training Verifiably Robust Neural Networks
😮oral⭐code
- Activate or Not: Learning Customized Activation
Brief commentary: 4
Commentary: CVPR 2021 | ACON, an adaptive activation function: a new paradigm unifying ReLU and Swish
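
The commentary above describes ACON as unifying ReLU and Swish. A minimal sketch of that idea follows, assuming the ACON-C form f(x) = (p1 - p2) * x * sigmoid(beta * (p1 - p2) * x) + p2 * x as recalled from the paper. The function names and default parameter values are illustrative assumptions, not the authors' code; in the paper these parameters are learned, while here they are plain arguments.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def acon_c(x, p1=1.0, p2=0.0, beta=1.0):
    """Assumed ACON-C form: a smooth maximum of p1*x and p2*x.

    With p1=1, p2=0, beta=1 this is Swish, x * sigmoid(x).
    As beta grows it approaches max(p1*x, p2*x), which is ReLU for p1=1, p2=0.
    """
    d = (p1 - p2) * x
    return d * sigmoid(beta * d) + p2 * x


if __name__ == "__main__":
    for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
        swish_like = acon_c(x)              # beta = 1
        relu_like = acon_c(x, beta=1e4)     # very large beta
        print(f"x={x:+.1f}  beta=1: {swish_like:+.4f}  beta=1e4: {relu_like:+.4f}")
```

Sweeping `beta` between the two extremes shows how a single parameter moves the activation between a Swish-like and a ReLU-like shape, which is the sense in which the two are unified.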
- Dynamic Metric Learning: Towards a Scalable Metric Space to Accommodate Multiple Semantic Scales
⭐code
- Embedding Transfer with Label Relaxation for Improved Metric Learning
- Noise-resistant Deep Metric Learning with Ranking-based Instance Selection
⭐code
- Skeleton Based Sign Language Recognition Using Whole-body Keypoints
⭐code
- Read and Attend: Temporal Localisation in Sign Language Videos
🏠project
- Fingerspelling Detection in American Sign Language
- Deep Gaussian Scale Mixture Prior for Spectral Compressive Imaging
⭐code🏠project
- Mask-ToF: Learning Microlens Masks for Flying Pixel Correction in Time-of-Flight Imaging
🏠project
- Passive Inter-Photon Imaging
😮oral
- Shape and Material Capture at Home
⭐code🏠project
- Event-based Synthetic Aperture Imaging with a Hybrid Network
Sharing session
- Camera pose
- Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
🌻dataset
- Sewer-ML: A Multi-Label Sewer Defect Classification Dataset and Benchmark
🏠project
- Benchmarking Representation Learning for Natural World Image Collections
🌻dataset
- Spatially-Adaptive Pixelwise Networks for Fast Image Translation
🏠project
Uses hypernetworks and implicit functions for very fast image-to-image translation (18x faster than the baseline)
- Image Generators with Conditionally-Independent Pixel Synthesis
😮oral⭐code
- Im2Vec: Synthesizing Vector Graphics without Vector Supervision
😮oral⭐code🏠project
- Context-Aware Layout to Image Generation with Enhanced Object Appearance
⭐code
- Adversarial Generation of Continuous Images
⭐code
- StEP: Style-based Encoder Pre-training for Multi-modal Image Synthesis
- IMAGINE: Image Synthesis by Image-Guided Model Inversion
- Leveraging Line-point Consistence to Preserve Structures for Wide Parallax Image Stitching
⭐code
Sharing session
- AdCo: Adversarial Contrast for Efficient Learning of Unsupervised Representations from Self-Trained Negative Adversaries
⭐code
Commentary: CVPR 2021 accepted paper: AdCo, adversarial contrastive learning
- LAFEAT: Piercing Through Adversarial Defenses with Latent Features
😮oral
- Distilling Audio-Visual Knowledge by Compositional Contrastive Learning
⭐code
- Fourier Contour Embedding for Arbitrary-Shaped Text Detection
- Scene text detection
- What If We Only Use Real Datasets for Scene Text Recognition? Toward Scene Text Recognition With Fewer Labels
- Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition
😮oral⭐code
- MOST: A Multi-Oriented Scene Text Detector with Localization Refinement
- Scene Text Retrieval via Joint Text Detection and Similarity Learning
⭐code
- Handwritten text recognition
- Simulating Unknown Target Models for Query-Efficient Black-box Attacks
⭐code
Black-box adversarial attack
- Delving into Data: Effectively Substitute Training for Black-box Attack
A black-box attack method based on efficiently training a substitute model
Commentary: 8
- LiBRe: A Practical Bayesian Approach to Adversarial Detection
⭐code
- Kaleido-BERT: Vision-Language Pre-training on Fashion Domain
- Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning
- Learning Asynchronous and Sparse Human-Object Interaction in Videos
- QPIC: Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information
⭐code
- Reformulating HOI Detection as Adaptive Set Prediction
⭐code
- Detecting Human-Object Interaction via Fabricated Compositional Learning
⭐code
- Affordance Transfer Learning for Human-Object Interaction Detection
⭐code
- Glance and Gaze: Inferring Action-aware Points for One-Stage Human-Object Interaction Detection
⭐code
- Robust Neural Routing Through Space Partitions for Camera Relocalization in Dynamic Indoor Environments
😮oral
- Back to the Feature: Learning Robust Camera Localization from Pixels to Pose
⭐code
- Learning Camera Localization via Dense Scene Matching
⭐code
- Scan2Cap: Context-aware Dense Captioning in RGB-D Scans
⭐code🏠project📺video
- VX2TEXT: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs
A multimodal framework for video captioning, video question answering, and video dialogue
- Open-book Video Captioning with Retrieve-Copy-Generate Network
- VirTex: Learning Visual Representations from Textual Annotations
⭐code
- Exploring Simple Siamese Representation Learning
😮oral
- IIRC: Incremental Implicitly-Refined Classification
🏠project
- Semantic-aware Knowledge Distillation for Few-Shot Class-Incremental Learning
- DER: Dynamically Expandable Representation for Class Incremental Learning
⭐code
- Few-Shot Incremental Learning with Continually Evolved Classifiers
- Rainbow Memory: Continual Learning with a Memory of Diverse Samples
- Training Networks in Null Space for Continual Learning
😮oral⭐code
- Efficient Feature Transformations for Discriminative and Generative Continual Learning
- Rainbow Memory: Continual Learning with a Memory of Diverse Samples
- Rectification-based Knowledge Retention for Continual Learning
- Coarse-Fine Networks for Temporal Activity Detection in Videos
- 3D CNNs with Adaptive Temporal Feature Resolutions
- Understanding the Robustness of Skeleton-based Action Recognition under Adversarial Attack
- BASAR: Black-box Attack on Skeletal Action Recognition
📺video
- TDN: Temporal Difference Networks for Efficient Action Recognition
⭐code
- ACTION-Net: Multipath Excitation for Action Recognition
⭐code
Commentary: CVPR 2021 | A plug-and-play ACTION module with hybrid attention for action recognition
Commentary: CVPR 2021 | A plug-and-play ACTION module with hybrid attention for strong temporal dependencies
- No frame left behind: Full Video Action Recognition
- Recognizing Actions in Videos from Unseen Viewpoints
- Beyond Short Clips: End-to-End Video-Level Learning with Collaborative Memories
- Motion Representations for Articulated Animation
⭐code🏠project📺video
- Temporal action localization
- Modeling Multi-Label Action Dependencies for Temporal Action Localization
😮oral
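Proposes an attention-based network architecture that learns action dependencies in videos for multi-label temporal action localization.
- Learning Salient Boundary Feature for Anchor-free Temporal Action Localization
Anchor-free temporal action localization based on learning salient boundary features
Commentary: 10
- The Blessings of Unlabeled Background in Untrimmed Videos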
- Temporal Context Aggregation Network for Temporal Action Proposal Refinement
- Learning Salient Boundary Feature for Anchor-free Temporal Action Localization
- CoLA: Weakly-Supervised Temporal Action Localization with Snippet Contrastive Learning
- Action Unit Memory Network for Weakly Supervised Temporal Action Localization
- Improving Unsupervised Image Clustering With Robust Learning
⭐code
Uses robust learning to improve unsupervised image clustering
- Jigsaw Clustering for Unsupervised Visual Representation Learning
😮oral⭐code
- Re-labeling ImageNet: from Single to Multi-Labels, from Global to Localized Labels
⭐code
- Differentiable Patch Selection for Image Recognition
⭐code
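- Fine-grained classification
- Fine-grained Angular Contrastive Learning with Coarse Labels
😮oral
Fine-grained classification from coarse labels using self-supervision. Coarse labels are easier and cheaper to obtain than fine-grained labels, which usually require domain experts.
- Graph-based High-Order Relation Discovery for Fine-grained Recognition
A fine-grained recognition method based on mining high-order relations between features
Commentary: 20
- Fine-Grained Few-Shot Classification with Feature Map Reconstruction Networks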
- A Realistic Evaluation of Semi-Supervised Learning for Fine-Grained Classification
😮oral
- Image classification
- MetaSAug: Meta Semantic Augmentation for Long-Tailed Visual Recognition
- PML: Progressive Margin Loss for Long-tailed Age Classification
- Contrastive Learning based Hybrid Networks for Long-Tailed Image Classification
🏠project
- Capsule Network is Not More Robust than Convolutional Network
- Model-Contrastive Federated Learning
- Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets
😮oral⭐code🏠project
- Semi-supervised image classification
- Long-tailed visual recognition
- FFB6D: A Full Flow Bidirectional Fusion Network for 6D Pose Estimation
😮oral⭐code
- GDR-Net: Geometry-Guided Direct Regression Network for Monocular 6D Object Pose Estimation
⭐code
- FS-Net: Fast Shape-based Network for Category-Level 6D Object Pose Estimation with Decoupled Rotation Mechanism
😮oral⭐code
- Wide-Depth-Range 6D Object Pose Estimation in Space
⭐code
- DSC-PoseNet: Learning 6DoF Object Pose Estimation via Dual-scale Consistency
- ID-Unet: Iterative Soft and Hard Deformation for View Synthesis
😮oral
- NeX: Real-time View Synthesis with Neural Basis Expansion
😮oral🏠project📺video
Real-time view synthesis using neural basis expansion
- Layout-Guided Novel View Synthesis from a Single Indoor Panorama
⭐code
- Stereo Radiance Fields (SRF): Learning View Synthesis for Sparse Views of Novel Scenes
- Counterfactual Zero-Shot and Open-Set Visual Recognition
⭐code
- Few-shot Open-set Recognition by Transformation Consistency
- Learning Placeholders for Open-Set Recognition
😮oral
- Neural Lumigraph Rendering
🌻dataset🏠project📺video
Stanford University
- AutoInt: Automatic Integration for Fast Neural Volume Rendering
😮oral🏠project📺video
Stanford University
- pixelNeRF: Neural Radiance Fields from One or Few Images
⭐code🏠project📺video
- IBRNet: Learning Multi-View Image-Based Rendering
🏠project
Note: some researchers have commented that pixelNeRF and IBRNet are similar, but IBRNet seems more mature.
- Neural Body: Implicit Neural Representations with Structured Latent Codes for Novel View Synthesis of Dynamic Humans
⭐code🏠project📺video
The Neural Body algorithm, developed by researchers from Zhejiang University and other institutions, takes multi-view video as input and outputs a 3D human body and novel-view renderings.
- NeRV: Neural Reflectance and Visibility Fields for Relighting and View Synthesis
🏠project📺video
Generates a complete 3D scene under arbitrary lighting conditions from a set of input images
- Self-Supervised Visibility Learning for Novel View Synthesis
⭐code
- Camera-Space Hand Mesh Recovery via Semantic Aggregation and Adaptive 2D-1D Registration
⭐code
- Monocular Real-time Full Body Capture with Inter-part Correlations
📺video
Human motion capture is a key technology for action effects in film. High-quality capture usually requires special equipment; if an ordinary RGB camera could be used instead, anyone could become a visual-effects artist. The video shows results from a CVPR 2021 paper by researchers from Tsinghua, the Max Planck Institute, and other institutions, performing motion capture with a monocular RGB camera.
- Behavior-Driven Synthesis of Human Dynamics
⭐code🏠project
- Learning Compositional Representation for 4D Captures with Neural ODE
- Rethinking the Heatmap Regression for Bottom-up Human Pose Estimation
⭐code
Brief commentary: 2
- Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression
⭐code
- SCANimate: Weakly Supervised Learning of Skinned Clothed Avatar Networks
😮oral🏠project
- On Self-Contact and Human Pose
🏠project
- Lite-HRNet: A Lightweight High-Resolution Network
⭐code
Commentary: Lite-HRNet: a lightweight HRNet with greatly reduced FLOPs
- Deep Dual Consecutive Network for Human Pose Estimation
- 3D Human Action Representation Learning via Cross-View Consistency Pursuit
⭐code
- 3D hand reconstruction
- Human motion transfer
- Human volumetric capture
- 3D human pose estimation
- CanonPose: Self-supervised Monocular 3D Human Pose Estimation in the Wild
- Context Modeling in 3D Human Pose Estimation: A Unified Perspective
- PCLs: Geometry-aware Neural Reconstruction of 3D Pose with Perspective Crop Layers
📺video
Improves 3D human pose estimation by removing location-dependent perspective effects.
- Graph Stacked Hourglass Networks for 3D Human Pose Estimation
- Human POSEitioning System (HPS): 3D Human Pose Estimation and Self-localization in Large Scenes from Body-Mounted Sensors
😮oral🏠project
- SimPoE: Simulated Character Control for 3D Human Pose Estimation
😮oral🏠project
- Reconstructing 3D Human Pose by Watching Humans in the Mirror
😮oral⭐code🏠project
- Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo
⭐code
- Animal pose estimation
- 3D human mesh registration
- Multi-person human reconstruction
- 3D human motion
- Densely connected multidilated convolutional networks for dense prediction tasks
The proposed D3Net outperforms SOTA networks on semantic segmentation and music source separation
- Dense Contrastive Learning for Self-Supervised Visual Pre-Training
😮oral⭐code
- Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning
⭐code
- Skip-Convolutions for Efficient Video Processing
- VideoMoCo: Contrastive Video Representation Learning with Temporally Adversarial Examples
⭐code
- Learning by Aligning Videos in Time
- Hierarchical Motion Understanding via Motion Programs
🏠project📺video
- Video summarization
- Video coding
- Video frame interpolation
- Video-and-language learning
- Video prediction
- Video understanding
- Context-aware Biaffine Localizing Network for Temporal Sentence Grounding
⭐code
- Co-Grounding Networks with Semantic Attention for Referring Expression Comprehension in Videos
🏠project
- Visual Semantic Role Labeling for Video Understanding
🏠project
- Temporal Query Networks for Fine-grained Video Understanding
😮oral🏠project
- Shot Contrastive Self-Supervised Learning for Scene Boundary Detection
- FrameExit: Conditional Early Exiting for Efficient Video Recognition
😮oral
- Video rescaling
- Video anomaly detection
- Video sound source localization
- Video analysis
- Video generation
- Playable Video Generation
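😮oral⭐code🏠project📺video
- One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing
😮oral⭐code🏠project📺video
Commentary: What upends video compression may not be a new compression algorithm but a GAN! Nvidia's new algorithm cuts bandwidth by up to 90%
New research from Nvidia reconstructs video calls from facial keypoints plus a GAN, saving 90% of the bandwidth compared with conventional H.264. The code has not been open-sourced, but Nvidia's GAN framework has.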
- Video viewpoint switching
- A Deep Emulator for Secondary Motion of 3D Characters
😮oral
- Neural Deformation Graphs for Globally-consistent Non-rigid Reconstruction
😮oral🏠project📺video
- Deep Implicit Templates for 3D Shape Representation
😮oral⭐code🏠project📺video
CVPR 2021 Oral: Tsinghua researchers propose Deep Implicit Templates, greatly extending the capabilities of DIF
- SMPLicit: Topology-aware Generative Model for Clothed People
🏠project
- Picasso: A CUDA-based Library for Deep Learning over 3D Meshes
⭐code
- Semi-supervised Synthesis of High-Resolution Editable Textures for 3D Humans
- RGB-D Local Implicit Function for Depth Completion of Transparent Objects
🏠project
- Deep Two-View Structure-from-Motion Revisited
- Deformed Implicit Field: Modeling 3D Shapes with Learned Dense Correspondence
- S3: Neural Shape, Skeleton, and Skinning Fields for 3D Human Modeling
- Depth estimation
- PLADE-Net: Towards Pixel-Level Accuracy for Self-Supervised Single-View Depth Estimation with Neural Positional Encoding and Distilled Matting Loss
- Beyond Image to Depth: Improving Depth Prediction using Echoes
⭐code🏠project
- Learning High Fidelity Depths of Dressed Humans by Watching Social Media Dance Videos
😮oral⭐code🏠project📺video
- 3D Packing for Self-Supervised Monocular Depth Estimation
😮oral⭐code
- LED2-Net: Monocular 360 Layout Estimation via Differentiable Depth Rendering
😮oral⭐code🏠project
- S2R-DepthNet: Learning a Generalizable Depth-specific Structural Representation
😮oral
- Depth Completion with Twin Surface Extrapolation at Occlusion Boundaries
⭐code
- Self-supervised Learning of Depth Inference for Multi-view Stereo
⭐code
- SMD-Nets: Stereo Mixture Density Networks
⭐code
- The Temporal Opportunist: Self-Supervised Multi-Frame Monocular Depth
- 3D reconstruction
- Deep Implicit Moving Least-Squares Functions for 3D Reconstruction
⭐code
- Bilevel Online Adaptation for Out-of-Domain Human Mesh Reconstruction
🏠project
- Learning Parallel Dense Correspondence from Spatio-Temporal Descriptors for Efficient and Robust 4D Reconstruction
⭐code
- Fostering Generalization in Single-view 3D Reconstruction by Learning a Hierarchy of Local and Global Shape Priors
- NeuralRecon: Real-Time Coherent 3D Reconstruction from Monocular Video
😮oral⭐code🏠project
- Fully Understanding Generic Objects: Modeling, Segmentation, and Reconstruction
- CodedStereo: Learned Phase Masks for Large Depth-of-field Stereo
😮oral
- SCALE: Modeling Clothed Humans with a Surface Codec of Articulated Local Elements
🏠project📺video
- Semantic scene completion
- 3D keypoints
- 3D shape completion
- Hierarchical and Partially Observable Goal-driven Policy Learning with Goals Relational Graph
⭐code🏠project
- Unsupervised Learning for Robust Fitting: A Reinforcement Learning Approach
- Unsupervised Visual Attention and Invariance for Reinforcement Learning
- Patch-NetVLAD: Multi-Scale Fusion of Locally-Global Descriptors for Place Recognition
⭐code
Winning solution of the ECCV 2020 Facebook Mapillary Visual Place Recognition Challenge
- AdvSim: Generating Safety-Critical Scenarios for Self-Driving Vehicles
- Self-Supervised Pillar Motion Learning for Autonomous Driving
⭐code
- Lane line prediction
- Trajectory prediction
- 3D Graph Anatomy Geometry-Integrated Network for Pancreatic Mass Segmentation, Diagnosis, and Quantitative Patient Management
Multi-modal CT imaging alone can replace the comprehensive multi-modal diagnostic workflow currently used at JHMI, which requires tumor chemical testing and DNA sequencing in addition to medical imaging; diagnostic accuracy is comparable and quantitative diagnostic precision is better.
- Deep Lesion Tracker: Monitoring Lesions in 4D Longitudinal Imaging Studies
An important function of intelligent PACS in oncology imaging for assisting physicians in reading scans
- Automatic Vertebra Localization and Identification in CT by Spine Rectification and Anatomically-constrained Optimization
A CT-based fracture/osteoporosis system
- Multi-institutional Collaborations for Improving Deep Learning-based Magnetic Resonance Image Reconstruction Using Federated Learning
⭐code
Multi-institutional collaboration that uses federated learning to improve deep-learning-based magnetic resonance image reconstruction
- DeepTag: An Unsupervised Deep Learning Method for Motion Tracking on Cardiac Tagging Magnetic Resonance Images
😮oral⭐code
DeepTag: an unsupervised deep learning method for motion tracking on cardiac tagging magnetic resonance images
- Multiple Instance Captioning: Learning Representations from Histopathology Textbooks and Articles
- XProtoNet: Diagnosis in Chest Radiography with Global and Local Explanations
- Medical image segmentation
- FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space
⭐code
- DoDNet: Learning to segment multi-organ and tumors from multiple partially labeled datasets
⭐code🌻dataset
- DiNTS: Differentiable Neural Network Topology Search for 3D Medical Image Segmentation (https://arxiv.org/abs/2103.15954)
😮oral
- DARCNN: Domain Adaptive Region-based Convolutional Neural Network for Unsupervised Instance Segmentation in Biomedical Images
- Every Annotation Counts: Multi-label Deep Supervision for Medical Image Segmentation
- Medical image synthesis
- Transformer Interpretability Beyond Attention Visualization
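⭐code
- MIST: Multiple Instance Spatial Transformer Network
Attempts a differentiable top-K selection from heatmaps (MIST); there are now some results on natural images as well. It can be used for detection and classification without any localization supervision (and that is not the only thing it can do!)
- Variational Transformer Networks for Layout Generation
- Action recognition and detection
- 3D Vision Transformers for Action Recognition
A 3D vision Transformer for action recognition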
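- Object detection
- Image processing
- Human-computer interaction
- Image segmentation
- Tracking
- Action prediction
- Self-attention mechanism
- Retrieval
- Feature matching
- Pose recognition
- Autonomous driving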
- Meta Batch-Instance Normalization for Generalizable Person Re-Identification
- Watching You: Global-guided Reciprocal Learning for Video-based Person Re-identification
- Joint Noise-Tolerant Learning and Meta Camera Shift Adaptation for Unsupervised Person Re-Identification
⭐code
- Self-supervised 3D Reconstruction and Re-Projection for Texture Insensitive Person Re-identification
Texture-insensitive person re-identification based on self-supervised 3D reconstruction and re-projection
Commentary: 12
- Intra-Inter Camera Similarity for Unsupervised Person Re-Identification
⭐code
Paper available
- Anchor-Free Person Search
⭐code
- Lifelong Person Re-Identification via Adaptive Knowledge Accumulation
⭐code
- Group-aware Label Transfer for Domain Adaptive Person Re-identification
⭐code|code
- Neural Feature Search for RGB-Infrared Person Re-Identification
- Combined Depth Space based Architecture Search For Person Re-identification
- Unsupervised Multi-Source Domain Adaptation for Person Re-Identification
😮oral
- Crowd counting
- Learning Student Networks in the Wild
- ReXNet: Diminishing Representational Bottleneck on Convolutional Neural Network
⭐code
- RepVGG: Making VGG-style ConvNets Great Again
⭐code
- Coordinate Attention for Efficient Mobile Network Design
⭐code
- Pruning
- Model scaling
- Quantization
- Knowledge distillation
- Invertible neural networks
- Model compression
- Dogfight: Detecting Drones from Drone Videos
- UAV-Human: A Large Benchmark for Human Behavior Understanding with Unmanned Aerial Vehicles
- Aerial image segmentation
- Aerial image detection
- Drone detection
- Multi-view satellite photogrammetry
- Data-Free Knowledge Distillation For Image Super-Resolution
- AdderSR: Towards Energy Efficient Image Super-Resolution
⭐code
- Cross-MPI: Cross-scale Stereo for Image Super-Resolution using Multiplane Images
🏠project📺video
CVPR 2021: Cross-MPI, an end-to-end network that uses the underlying scene structure as a cue, achieves high-fidelity super-resolution even under a large (x8) resolution gap
- ClassSR: A General Framework to Accelerate Super-Resolution Networks by Data Characteristic
⭐code
- Robust Reference-based Super-Resolution via C²-Matching
- GLEAN: Generative Latent Bank for Large-Factor Image Super-Resolution
😮oral🏠project
- BasicVSR: The Search for Essential Components in Video Super-Resolution and Beyond
⭐code🏠project
- Temporal Modulation Network for Controllable Space-Time Video Super-Resolution
Author homepage
A video super-resolution network based on controllable interpolation of spatio-temporal features
Commentary: 18
- Flow-based Kernel Prior with Application to Blind Super-Resolution
- Unsupervised Degradation Representation Learning for Blind Super-Resolution
⭐code
- SRWarp: Generalized Image Super-Resolution under Arbitrary Transformation
⭐code
- Weakly-supervised Grounded Visual Question Answering using Capsules
- Counterfactual VQA: A Cause-Effect Look at Language Bias
⭐code
- AGQA: A Benchmark for Compositional Spatio-Temporal Reasoning
- Domain-robust VQA with diverse datasets and methods but no target labels
- Video question answering
- Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing
- Hijack-GAN: Unintended-Use of Pretrained, Black-Box GANs
- Image-to-image Translation via Hierarchical Style Disentanglement
⭐code
- Efficient Conditional GAN Transfer with Knowledge Propagation across Classes
⭐code
- Anycost GANs for Interactive Image Synthesis and Editing
⭐code🏠project📺video
Anycost GAN adapts to a wide range of hardware and latency requirements and enables interactive image editing
- TediGAN: Text-Guided Diverse Image Generation and Manipulation
⭐code🏠project📺video
- Generative Hierarchical Features from Synthesizing Images
😮oral⭐code🏠project
The authors argue that a pretrained GAN generator can be regarded as a learned multi-scale loss. Training with it yields highly competitive hierarchical and disentangled visual features, which they call Generative Hierarchical Features (GH-Feat). They further show that GH-Feat benefits not only generative tasks but, more importantly, discriminative tasks, including face verification, landmark detection, layout prediction, transfer learning, style mixing, and image editing.
- Teachers Do More Than Teach: Compressing Image-to-Image Models
- PISE: Person Image Synthesis and Editing with Decoupled GAN
⭐code
- LOHO: Latent Optimization of Hairstyles via Orthogonalization
- HumanGAN: A Generative Model of Humans Images
- HistoGAN: Controlling Colors of GAN-Generated and Real Images via Color Histograms
⭐code
- DivCo: Diverse Conditional Image Synthesis via Contrastive Generative Adversarial Network
⭐code
- pi-GAN: Periodic Implicit Generative Adversarial Networks for 3D-Aware Image Synthesis
😮oral🏠project📺video
More: Stanford researchers propose periodic implicit generative adversarial networks (π-GAN or pi-GAN) for high-quality 3D-aware image synthesis
Stanford University
- ReMix: Towards Image-to-Image Translation with Limited Data
- Unsupervised Disentanglement of Linear-Encoded Facial Semantics
- Content-Aware GAN Compression
- Regularizing Generative Adversarial Networks under Limited Data
⭐code🏠project
- Where and What? Examining Interpretable Disentangled Representations
⭐code
- Few-shot Image Generation via Cross-domain Correspondence
🏠project
- DatasetGAN: Efficient Labeled Data Factory with Minimal Human Effort
😮oral
- Surrogate Gradient Field for Latent Space Manipulation
- StylePeople: A Generative Model of Fullbody Human Avatars
- Ensembling with Deep Generative Views
⭐code🏠project
- Image-to-image translation
- Few-shot learning
- Domain generalization
- FSDR: Frequency Space Domain Randomization for Domain Generalization
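Inspired by how JPEG converts a spatial image into multiple frequency components (FCs), the paper proposes Frequency Space Domain Randomization (FSDR), which randomizes images in frequency space by keeping the domain-invariant FCs (DIFs) and randomizing only the domain-variant FCs (DVFs).
- Domain Generalization via Inference-time Label-Preserving Target Projections
- Adaptive Methods for Real-World Domain Generalization
😮oral
- Progressive Domain Expansion Network for Single Domain Generalization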
⭐code
- Zero-shot learning
- Domain adaptation
- Dynamic Transfer for Multi-Source Domain Adaptation
⭐code
- Transferable Semantic Augmentation for Domain Adaptation
- MetaAlign: Coordinating Domain Alignment and Classification for Unsupervised Domain Adaptation
- DRANet: Disentangling Representation and Adaptation Networks for Unsupervised Cross-Domain Adaptation
- Dynamic Domain Adaptation for Efficient Inference
⭐code
- Prototypical Cross-domain Self-supervised Learning for Few-shot Unsupervised Domain Adaptation
🏠project
- Domain Consensus Clustering for Universal Domain Adaptation
⭐code
- Divergence Optimization for Noisy Universal Domain Adaptation
- Curriculum Graph Co-Teaching for Multi-Target Domain Adaptation
⭐code🏠project
- Instance Level Affinity-Based Transfer for Unsupervised Domain Adaptation
⭐code
- Unsupervised Multi-source Domain Adaptation Without Access to Source Data
- Domain Adaptation with Auxiliary Target Domain-Oriented Classifier
⭐code
- Cross-Domain Adaptive Clustering for Semi-Supervised Domain Adaptation
- Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers
- Convolutional Hough Matching
😮oral🏠project
- T2VLAD: Global-Local Sequence Alignment for Text-Video Retrieval
- Image retrieval
- Video retrieval
- Image restoration
- Shadow removal
- Deblurring
- Reflection removal
- Dehazing
- Learning to Restore Hazy Video: A New Real-World Dataset and A New Method
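Learning to restore hazy video: a new real-world dataset and algorithm
Commentary: 9
- Contrastive Learning for Compact Single Image Dehazing
⭐code
A compact image dehazing method based on contrastive learning
Commentary: 5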
- Denoising
- Deraining
- Exposure correction
- Image inpainting
- Image editing
- Image compression
- Attention-guided Image Compression by Deep Reconstruction of Compressive Sensed Saliency Skeleton
- Slimmable Compressive Autoencoders for Practical Neural Image Compression
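⭐code
- Checkerboard Context Model for Efficient Learned Image Compression
- Learning Scalable ℓ∞-constrained Near-lossless Image Compression via Joint Lossy Image and Residual Compression
⭐code
- Deep Homography for Efficient Stereo Image Compression
⭐code
Sharing session
- de-rendering
- Video inpainting
- Progressive Temporal Feature Alignment Network for Video Inpainting
⭐code
The authors propose the Progressive Temporal Feature Alignment Network, which progressively enriches the features of the current frame with features extracted from neighboring frames using optical flow. It corrects the spatial misalignment in the spatio-temporal feature propagation stage and greatly improves the visual quality and spatio-temporal consistency of inpainted videos. On the DAVIS and FVI datasets it achieves state-of-the-art performance compared with existing deep learning methods.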
- Image artifact removal
- Image alignment
- Towards High Fidelity Face Relighting with Realistic Shadows
⭐code
- IronMask: Modular Architecture for Protecting Deep Face Template
- Everything's Talkin': Pareidolia Face Reenactment
- Face recognition
- A 3D GAN for Improved Large-pose Facial Recognition
- When Age-Invariant Face Recognition Meets Face Age Synthesis: A Multi-Task Learning Framework
😮oral⭐code
- MagFace: A Universal Representation for Face Recognition and Quality Assessment
😮oral⭐code
Face recognition plus quality assessment, an oral presentation this year. Code is being prepared for release
- WebFace260M: A Benchmark Unveiling the Power of Million-Scale Deep Face Recognition
🏠project
- ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis
😮oral🏠project📺video
- Spherical Confidence Learning for Face Recognition
😮oral
Face recognition based on confidence learning on the hypersphere manifold
- Consistent Instance False Positive Improves Fairness in Face Recognition
A method for improving face recognition fairness based on instance false positive consistency
Commentary: 7
- CRFace: Confidence Ranker for Model-Agnostic Face Detection Refinement
- Cross-Domain Similarity Learning for Face Recognition in Unseen Domains
- HLA-Face: Joint High-Low Adaptation for Low Light Face Detection
🏠project
- FACESEC: A Fine-grained Robustness Evaluation Framework for Face Recognition Systems
- Synthetic face (Deepfake/Face Forgery) detection
- Multi-attentional Deepfake Detection
- Frequency-aware Discriminative Feature Learning Supervised by Single-Center Loss for Face Forgery Detection
- MagDR: Mask-guided Detection and Reconstruction for Defending Deepfakes
- Face Forensics in the Wild
😮oral⭐code
- Improving the Efficiency and Robustness of Deepfakes Detection through Precise Geometric Features
⭐code
- Face quality assessment
- 3D face reconstruction
- Learning to Aggregate and Personalize 3D Face from In-the-Wild Photo Collection
😮oral
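Learning aggregated and personalized 3D face reconstruction from in-the-wild portrait collections
- 3DCaricShop: A Dataset and A Baseline Method for Single-view 3D Caricature Face Reconstruction
⭐code🏠project
- Riggable 3D Face Reconstruction via In-Network Optimization
⭐code
This paper tackles riggable 3D face reconstruction from monocular RGB images with an end-to-end trainable network that embeds in-network optimization. It achieves state-of-the-art reconstruction accuracy with reasonable robustness and generalization, and can be used in standard face rig applications such as retargeting.
- Pixel Codec Avatars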
😮oral
- Facial expression recognition
- Affective Processes: stochastic modelling of temporal context for emotion and facial expression recognition
- Dive into Ambiguity: Latent Distribution Mining and Pairwise Uncertainty Estimation for Facial Expression Recognition
- Feature Decomposition and Reconstruction Learning for Effective Facial Expression Recognition
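- Face clustering
- Face editing
- Face tracking
- Wide-angle face correction
- Face liveness detection
- Audio-driven synthesis of emotional faces
- Face swapping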
- Information Bottleneck Disentanglement for Identity Swapping
Sharing session
- Face inpainting
- FaceInpainter: High Fidelity Face Adaptation to Heterogeneous Domains
Sharing session
- Face animation
- AttentiveNAS: Improving Neural Architecture Search via Attentive
- HourNAS: Extremely Fast Neural Architecture Search Through an Hourglass Lens
- ReNAS: Relativistic Evaluation of Neural Architecture Search
- OPANAS: One-Shot Path Aggregation Network Architecture Search for Object
- Towards Improving the Consistency, Efficiency, and Flexibility of Differentiable Neural Architecture Search
Center for Machine Learning Research, Institute for Artificial Intelligence, Peking University
- Contrastive Neural Architecture Search with Neural Architecture Comparators
⭐code
- Searching by Generating: Flexible and Efficient One-Shot NAS with Architecture Generator
⭐code
- Prioritized Architecture Sampling with Monto-Carlo Tree Search
⭐code
- One-Shot Neural Ensemble Architecture Search by Diversity-Guided Search Space Shrinking
⭐code
- NetAdaptV2: Efficient Neural Architecture Search with Fast Super-Network Training and Architecture Optimization
🏠project
- Neural Architecture Search with Random Labels
Brief commentary: 1
- Landmark Regularization: Ranking Guided Super-Net Training in Neural Architecture Search
⭐code
- Rotation Equivariant Siamese Networks for Tracking
- Multiple Object Tracking with Correlation Learning
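Proposes CorrTracker, a unified correlation tracker that densely models correlations between targets and propagates information through them. It achieves a state-of-the-art MOTA of 76.5% and IDF1 of 73.6% on MOT17.
- LightTrack: Finding Lightweight Neural Networks for Object Tracking via One-Shot Architecture Search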
⭐code
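
For readers unfamiliar with the MOTA and IDF1 figures quoted for CorrTracker in this group, here is a minimal sketch of how the two scores are commonly defined (the CLEAR MOT accuracy and the identity F1 score). The function and argument names are illustrative assumptions, and the counts in the demo are made-up toy values, not results from any paper listed here.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """Multiple Object Tracking Accuracy: 1 - (FN + FP + IDSW) / GT.

    num_gt is the total number of ground-truth boxes over all frames.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def idf1(idtp, idfp, idfn):
    """Identity F1: 2*IDTP / (2*IDTP + IDFP + IDFN)."""
    denom = 2 * idtp + idfp + idfn
    if denom == 0:
        raise ValueError("at least one count must be non-zero")
    return 2 * idtp / denom


if __name__ == "__main__":
    # Toy counts chosen only to exercise the formulas.
    print(f"MOTA = {mota(20, 3, 2, 100):.3f}")
    print(f"IDF1 = {idf1(8, 2, 2):.3f}")
```

MOTA penalizes detection errors and identity switches together, while IDF1 focuses on how consistently identities are preserved, which is why tracking papers usually report both.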
- Multi-object tracking
- Probabilistic Tracklet Scoring and Inpainting for Multiple Object Tracking
- Track to Detect and Segment: An Online Multi-Object Tracker
⭐code🏠project📺video
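TraDeS: a CVPR 2021 multi-object tracking algorithm that improves on current online joint detection-and-tracking methods by using tracking cues to assist detection, achieving large accuracy gains on multiple datasets. The authors are from the State University of New York. The code is open source.
- Multiple Object Tracking with Correlation Learning
- Learning a Proposal Classifier for Multiple Object Tracking
⭐code
- Learnable Graph Matching: Incorporating Graph Partitioning with Deep Feature Learning for Multiple Object Tracking
⭐code
- Online Multiple Object Tracking with Cross-Task Synergy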
⭐code
- Visual Object Tracking
- Single-Object Tracking
- Information-Theoretic Segmentation by Inpainting Error Maximization
- Simultaneously Localize, Segment and Rank the Camouflaged Objects
⭐code - Capturing Omni-Range Context for Omnidirectional Segmentation
⭐code - Boundary IoU: Improving Object-Centric Image Segmentation Evaluation
⭐code🏠project
- Locate then Segment: A Strong Pipeline for Referring Image Segmentation
- InverseForm: A Loss Function for Structured Boundary-Aware Segmentation
😮oral - Instance Segmentation
- Zero-Shot Instance Segmentation
⭐code
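AInnovation (创新奇智) proposes zero-shot instance segmentation for the first time, helping to address the data bottleneck in industrial scenarios - Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers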
⭐code - Weakly Supervised Instance Segmentation for Videos with Temporal Mask Consistency
- FAPIS: A Few-shot Anchor-free Part-based Instance Segmenter
- Weakly-supervised Instance Segmentation via Class-agnostic Learning with Salient Images
⭐code - Look Closer to Segment Better: Boundary Patch Refinement for Instance Segmentation
⭐code - RefineMask: Towards High-Quality Instance Segmentation with Fine-Grained Features
⭐code
- Panoptic Segmentation
- 4D Panoptic LiDAR Segmentation
- Cross-View Regularization for Domain Adaptive Panoptic Segmentation
😮oral
A cross-view regularization method for domain-adaptive panoptic segmentation - Part-aware Panoptic Segmentation
- Toward Joint Thing-and-Stuff Mining for Weakly Supervised Panoptic Segmentation
Weakly supervised panoptic segmentation with joint thing-and-stuff mining
Explainer: 15 - Panoptic-PolarNet: Proposal-free LiDAR Point Cloud Panoptic Segmentation
⭐code - Fully Convolutional Networks for Panoptic Segmentation
😮oral⭐code
Brief notes: 11 - Panoptic Segmentation Forecasting
- Semantic Segmentation
- PLOP: Learning without Forgetting for Continual Semantic Segmentation
⭐code - Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges
🌻dataset📺video - Multi-Source Domain Adaptation with Collaborative Learning for Semantic Segmentation
- Semi-supervised Domain Adaptation based on Dual-level Domain Mixing for Semantic Segmentation
- Differentiable Multi-Granularity Human Representation Learning for Instance-Aware Human Semantic Parsing
😮oral⭐code - Learning Statistical Texture for Semantic Segmentation
- MetaCorrection: Domain-aware Meta Loss Correction for Unsupervised Domain Adaptation in Semantic Segmentation
⭐code
Domain-aware meta loss correction for unsupervised domain adaptation in semantic segmentation - Continual Semantic Segmentation via Repulsion-Attraction of Sparse and Disentangled Latent Representations
- Semantic Segmentation for Real Point Cloud Scenes via Bilateral Augmentation and Adaptive Fusion
⭐code - Rethinking BiSeNet For Real-time Semantic Segmentation
⭐code - BBAM: Bounding Box Attribution Map for Weakly Supervised Semantic and Instance Segmentation
⭐code - Anti-Adversarially Manipulated Attributions for Weakly and Semi-Supervised Semantic Segmentation
⭐code - Cross-Dataset Collaborative Learning for Semantic Segmentation
- Coarse-to-Fine Domain Adaptive Semantic Segmentation with Photometric Alignment and Category-Center Regularization
- Non-Salient Region Object Mining for Weakly Supervised Semantic Segmentation
⭐code - Source-Free Domain Adaptation for Semantic Segmentation
- PiCIE: Unsupervised Semantic Segmentation using Invariance and Equivariance in Clustering
- Background-Aware Pooling and Noise-Aware Loss for Weakly-Supervised Semantic Segmentation
🏠project - Progressive Semantic Segmentation
- Semantic Segmentation with Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization
🏠project - DANNet: A One-Stage Domain Adaptation Network for Unsupervised Nighttime Semantic Segmentation
😮oral⭐code
- Scene Understanding / Scene Parsing
- Exploring Data Efficient 3D Scene Understanding with Contrastive Scene Contexts
😮oral🏠project📺video - Monte Carlo Scene Search for 3D Scene Understanding
- Bidirectional Projection Network for Cross Dimension Scene Understanding
😮oral⭐code - RobustNet: Improving Domain Generalization in Urban-Scene Segmentation via Instance Selective Whitening
😮oral⭐code - CoCoNets: Continuous Contrastive 3D Scene Representations
🏠project📺video
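Researchers from CMU propose a 3D scene representation learned with self-supervised contrastive learning from RGB and RGB-D scene data; the resulting features perform well on downstream tasks such as object tracking and detection. - Scene Graph Generation / Analysis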
- SceneGraphFusion: Incremental 3D Scene Graph Prediction from RGB-D Sequences
🏠project - Probabilistic Modeling of Semantic Ambiguity for Scene Graph Generation
Scene graph generation (scene parsing) - Exploiting Edge-Oriented Reasoning for 3D Point-based Scene Graph Analysis
🏠project
Edge-oriented reasoning for 3D point-based scene graph analysis (scene understanding) - Fully Convolutional Scene Graph Generation
😮oral - Bipartite Graph Network with Adaptive Message Passing for Unbiased Scene Graph Generation
⭐code
- Matting
- Real-Time High Resolution Background Matting
⭐code🏠project📺video
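Latest open-source matting technique: real-time and high-resolution, running at 4K (30 fps) and HD (60 fps) on a modern GPU
Explainer: 4K at 30 fps on a single GPU: the University of Washington's real-time video matting gets an upgrade, with fine hair detail preserved
- Action Segmentation
- Temporal Action Segmentation
- Unsupervised Action Segmentation
- Supervised Action Segmentation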
- Anchor-Constrained Viterbi for Set-Supervised Action Segmentation
- Video Action Segmentation
- Global2Local: Efficient Structure Search for Video Action Segmentation
From global to local: efficient network structure search for video action segmentation
Explainer: 19
- LiDAR Segmentation
- Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR Segmentation
😮oral⭐code
Ranked first on the SemanticKITTI leaderboard (until the CVPR deadline), achieved SOTA on nuScenes, and generalizes well to other LiDAR-based tasks, including LiDAR panoptic segmentation and LiDAR 3D detection; a method built on this work also ranked first on the SemanticKITTI panoptic segmentation leaderboard.
- Video Object Segmentation
- Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion
😮oral⭐code🏠project📺video - Learning to Recommend Frame for Interactive Video Object Segmentation in the Wild
⭐code - Efficient Regional Memory Network for Video Object Segmentation
⭐code🏠project - Learning Position and Target Consistency for Memory-based Video Object Segmentation
Achieves state-of-the-art performance on both the DAVIS and YouTube-VOS benchmarks and ranked first in the semi-supervised VOS task of the DAVIS 2020 Challenge. - Guided Interactive Video Object Segmentation Using Reliability-Based Attention Maps
😮oral⭐code - Multi-Object Video Segmentation
- Video Instance Segmentation
- SG-Net: Spatial Granularity Network for One-Stage Video Instance Segmentation
⭐code📺video
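The paper presents SG-Net, a simple and effective one-stage framework that improves mask quality and inference speed compared with conventional two-stage frameworks. - Spatial Feature Calibration and Temporal Fusion for Effective One-stage Video Instance Segmentation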
⭐code
- Few-Shot Segmentation
- Camouflaged Object Segmentation
- Video Matting
- Multiple Instance Active Learning for Object Detection
⭐code - Positive-Unlabeled Data Purification in the Wild for Object Detection
- Depth from Camera Motion and Object Detection
⭐github📺video
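Using data that combines ordinary phone camera motion with object detection bounding boxes, the authors design an RNN that achieves state-of-the-art accuracy in object depth estimation. - Towards Open World Object Detection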
😮oral⭐code - General Instance Distillation for Object Detection
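In recent years, knowledge distillation has proven to be an effective approach to model compression, letting a lightweight student model acquire knowledge extracted from a cumbersome teacher model. However, previous distillation methods for detection generalize poorly across detection frameworks, rely heavily on ground truth (GT), and ignore valuable relational information between instances. The authors therefore propose a new distillation method for detection based on discriminative instances, selected regardless of whether GT marks them as positive or negative, called General Instance Distillation (GID). It includes a General Instance Selection Module (GISM) and makes full use of feature-based, relation-based, and response-based knowledge for distillation. Experiments show that student models achieve significant AP gains across detection frameworks and can even outperform their teachers. Specifically, RetinaNet with ResNet-50 reaches 39.1% mAP on COCO with GID, 2.9 points above the 36.2% baseline and even better than the ResNet-101-based teacher model at 38.1% AP. - Distilling Object Detectors via Decoupled Features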
- MeGA-CDA: Memory Guided Attention for Category-Aware Unsupervised Domain Adaptive Object Detection
- Informative and Consistent Correspondence Mining for Cross-Domain Weakly Supervised Object Detection
😮oral
- You Only Look One-level Feature
⭐code
YOLOF is open source; it needs no FPN and runs 13% faster than YOLOv4
Explainer: the object detection algorithm YOLOF: You Only Look One-level Feature
- Sparse R-CNN: End-to-End Object Detection with Learnable Proposals
⭐code - End-to-End Object Detection with Fully Convolutional Network
⭐code
Explainer: dropping the Transformer, an FCN can also achieve E2E detection - Robust and Accurate Object Detection via Adversarial Learning
- I^3Net: Implicit Instance-Invariant Network for Adapting One-Stage Object Detectors
- Distilling Object Detectors via Decoupled Features
⭐code - OTA: Optimal Transport Assignment for Object Detection
⭐code - Scale-aware Automatic Augmentation for Object Detection
⭐code - A Closer Look at Fourier Spectrum Discrepancies for CNN-generated Images Detection
😮oral🏠project - Group Collaborative Learning for Co-Salient Object Detection
⭐code - IQDet: Instance-wise Quality Distribution Sampling for Object Detection
Brief notes: 20 - SE-SSD: Self-Ensembling Single-Stage Object Detector From Point Cloud
⭐code - Few-Shot Object Detection
- Semantic Relation Reasoning for Shot-Stable Few-Shot Object Detection
The first work to study semantic relation reasoning for few-shot detection, showing its potential to improve strong baselines. - Dense Relation Distillation with Context-aware Aggregation for Few-Shot Object Detection
⭐code
Center on Machine Learning Research, Institute for Artificial Intelligence, Peking University - FSCE: Few-Shot Object Detection via Contrastive Proposal Encoding
⭐code - Generalized Few-Shot Object Detection without Forgetting
Brief notes: 16
- Multi-Object Detection
- 3D Object Detection
- Categorical Depth Distribution Network for Monocular 3D Object Detection
😮oral - 3DIoUMatch: Leveraging IoU Prediction for Semi-Supervised 3D Object Detection
⭐code🏠project📺video
More: CVPR 2021 | Semi-supervised 3D object detection using IoU prediction - ST3D: Self-training for Unsupervised Domain Adaptation on 3D Object Detection
⭐code - Depth-conditioned Dynamic Message Propagation for Monocular 3D Object Detection
⭐code - MonoRUn: Monocular 3D Object Detection by Self-Supervised Reconstruction and Uncertainty Propagation
⭐code - M3DSSD: Monocular 3D Single Stage Object Detector
⭐code - GrooMeD-NMS: Grouped Mathematically Differentiable NMS for Monocular 3D Object Detection
⭐code📺video
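The authors propose and integrate GrooMeD-NMS, a novel grouped mathematically differentiable NMS for monocular 3D object detection. It achieves state-of-the-art monocular 3D detection results on the KITTI benchmark, on par with monocular-video-based methods. - LiDAR R-CNN: An Efficient and Universal 3D Object Detector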
⭐code - Exploring intermediate representation for monocular vehicle pose estimation
⭐code - Delving into Localization Errors for Monocular 3D Object Detection
⭐code - HVPR: Hybrid Voxel-Point Representation for Single-stage 3D Object Detection
- Objects are Different: Flexible Monocular 3D Object Detection
⭐code - Back-tracing Representative Points for Voting-based 3D Object Detection in Point Clouds
⭐code - PointAugmenting: Cross-Modal Augmentation for 3D Object Detection
Sharing session
- Rotated Object Detection
- Object Localization
- Dense Object Detection
- Salient Object Detection
- Semi-Supervised Object Detection
- Data-Uncertainty Guided Multi-Phase Learning for Semi-Supervised Object Detection
- Points as Queries: Weakly Semi-supervised Object Detection by Points
Brief notes: 6 - Weakly Supervised Object Detection
- Long-Tailed Object Detection
- Weakly Supervised
- Semi-Supervised
- Self-Supervised
- Self-supervised Geometric Perception
😮oral⭐code
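The authors say SGP is the first general framework for feature learning in geometric perception that requires no supervision from ground-truth geometric labels. SGP runs in an EM-like manner: it alternates between robustly estimating geometric models to generate pseudo-labels and learning features under the supervision of those noisy pseudo-labels. Applied to camera pose estimation and point cloud registration, SGP is shown on large-scale real datasets to match or even outperform supervised oracles. - Vectorization and Rasterization: Self-Supervised Learning for Sketch and Handwriting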
⭐code - Self-supervised Motion Learning from Static Images
- SOLD2: Self-supervised Occlusion-aware Line Description and Detection
😮oral⭐code - All Labels Are Not Created Equal: Enhancing Semi-supervision via Label Grouping and Co-training
⭐code - Global Transport for Fluid Reconstruction with Learned Self-Supervision
😮oral⭐code
- Unsupervised
- Style-based Point Generator with Adversarial Rendering for Point Cloud Completion
- MultiBodySync: Multi-Body Segmentation and Motion Estimation via 3D Scan Synchronization
😮oral⭐code - TPCN: Temporal Point Cloud Networks for Motion Forecasting
Temporal point cloud networks for motion forecasting - How Privacy-Preserving are Line Clouds? Recovering Scene Details from 3D Lines
⭐code - PAConv: Position Adaptive Convolution with Dynamic Kernel Assembling on Point Clouds
⭐code - Point2Skeleton: Learning Skeletal Representations from Point Clouds
😮oral⭐code🏠project - FESTA: Flow Estimation via Spatial-Temporal Attention for Scene Point Clouds
- RPSRNet: End-to-End Trainable Rigid Point Set Registration Network using Barnes-Hut 2D-Tree Representation
- Point Cloud Registration
- PREDATOR: Registration of 3D Point Clouds with Low Overlap
😮oral⭐code🏠project - SpinNet: Learning a General Surface Descriptor for 3D Point Cloud Registration
⭐code - Robust Point Cloud Registration Framework Based on Deep Graph Matching
⭐code - PointDSC: Robust Point Cloud Registration using Deep Spatial Consistency
⭐code - ReAgent: Point Cloud Registration using Imitation and Reinforcement Learning
⭐code - DeepI2P: Image-to-Point Cloud Registration via Deep Classification
⭐code
- Point Cloud Completion
- Point Cloud Keypoint Detection
- 3D Point Clouds
- Sequential Graph Convolutional Network for Active Learning
- Quantifying Explainers of Graph Neural Networks in Computational Pathology
- Binary Graph Neural Networks
- Inverting the Inherence of Convolution for Visual Recognition
- Representative Batch Normalization with Feature Calibration
- UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pretraining
- Reconsidering Representation Alignment for Multi-view Clustering
- Self-supervised Simultaneous Multi-Step Prediction of Road Dynamics and Cost Map
- Instance Localization for Self-supervised Detection Pretraining
⭐code - Model-Contrastive Federated Learning
Proposes model-contrastive learning to address the non-IID data problem in federated learning - Neural Geometric Level of Detail: Real-time Rendering with Implicit 3D Surfaces
😮oral⭐code🏠project - Data-Free Model Extraction
⭐code - Single-Stage Instance Shadow Detection with Bidirectional Relation Learning
😮oral⭐code - Continual Adaptation of Visual Representations via Domain Randomization and Meta-learning
😮oral - PatchmatchNet: Learned Multi-View Patchmatch Stereo
😮oral⭐code - Online Bag-of-Visual-Words Generation for Unsupervised Representation Learning
- Semantic Palette: Guiding Scene Generation with Class Proportions
- Function4D: Real-time Human Volumetric Capture from Very Sparse Consumer RGBD Sensors
😮oral - POSEFusion: Pose-guided Selective Fusion for Single-view Human Volumetric Capture
😮oral - Multi-Objective Interpolation Training for Robustness to Label Noise
⭐code - Right for the Right Concept: Revising Neuro-Symbolic Concepts by Interacting with their Explanations
⭐code - Simpler Certified Radius Maximization by Propagating Covariances
😮oral📺video - Nutrition5k: Towards Automatic Nutritional Understanding of Generic Food
- Discovering Hidden Physics Behind Transport Dynamics
😮oral - Soft-IntroVAE: Analyzing and Improving the Introspective Variational Autoencoder
😮oral⭐code🏠project - Deep Gradient Projection Networks for Pan-sharpening
⭐code - Consensus Maximisation Using Influences of Monotone Boolean Functions
😮oral
- Forecasting Irreversible Disease via Progression Learning
- Causal Hidden Markov Model for Time Series Disease Forecasting
- Towards Unified Surgical Skill Assessment
- RSTNet: Captioning with Adaptive Attention on Visual and Non-Visual Words
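RSTNet: an image captioning model with adaptive attention that distinguishes visual and non-visual words
Explainer: 14 - Removing the Background by Adding the Background: Towards a Background Robust Self-supervised Video Representation Learning
Removing the background's influence by adding the background: background-robust self-supervised video representation learning
Explainer: 11 - Representative Batch Normalization with Feature Calibration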
😮oral
Author homepage
Representative batch normalization with feature calibration; explainer: 4 - Learning Compositional Representation for 4D Captures with Neural ODE
- Involution: Inverting the Inherence of Convolution for Visual Recognition
⭐code
Explainer: CVPR'21 | involution: a new neural network operator that goes beyond convolution and self-attention - Spatially Consistent Representation Learning
- Limitations of Post-Hoc Feature Alignment for Robustness
- AutoDO: Robust AutoAugment for Biased Data with Label Noise via Scalable Probabilistic Implicit Differentiation
⭐code - Augmentation Strategies for Learning with Noisy Labels
⭐code - CFNet: Cascade and Fused Cost Volume for Robust Stereo Matching
⭐code - PGT: A Progressive Method for Training Models on Long Videos
😮oral⭐code - Generic Perceptual Loss for Modeling Structured Output Dependencies
- Masksembles for Uncertainty Estimation
⭐code🏠project - Student-Teacher Learning from Clean Inputs to Noisy Inputs
- Scene-Intuitive Agent for Remote Embodied Visual Grounding
- Meta-Mining Discriminative Samples for Kinship Verification
- Learning Probabilistic Ordinal Embeddings for Uncertainty-Aware Regression
⭐code📺video
Paper released - Diverse Branch Block: Building a Convolution as an Inception-like Unit
⭐code - OTCE: A Transferability Metric for Cross-Domain Cross-Task Representations
- Disentangled Cycle Consistency for Highly-realistic Virtual Try-On
⭐code - Stylized Neural Painting
⭐code🏠project📺video
Stylized Neural Painting proposes an image-to-painting translation method that generates vivid, realistic paintings with controllable styles - Confluent Vessel Trees with Accurate Bifurcations
⭐code - Repopulating Street Scenes
- Extreme Rotation Estimation using Dense Correlation Volumes
- Can We Characterize Tasks Without Labels or Features?
- Embracing Uncertainty: Decoupling and De-bias for Robust Temporal Grounding
- Online Learning of a Probabilistic and Adaptive Scene Representation
- Generative Modelling of BRDF Textures from Flash Images
⭐code🏠project - PhySG: Inverse Rendering with Spherical Gaussians for Physics-based Material Editing and Relighting
🏠project
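The authors' inverse rendering algorithm, PhySG, reconstructs object geometry, materials, and lighting from a set of RGB input images, running end to end. - Self-supervised Video Representation Learning by Context and Motion Decoupling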
- Dynamic Region-Aware Convolution
Brief notes: 14 - Meta Pseudo Labels
⭐code📺video - PQA: Perceptual Question Answering
- CondenseNet V2: Sparse Feature Reactivation for Deep Networks
⭐code - CFNet: Cascade and Fused Cost Volume for Robust Stereo Matching
⭐code - Neural Camera Simulators
- Simpler Certified Radius Maximization by Propagating Covariances
😮oral⭐code📺video - Lighting, Reflectance and Geometry Estimation from 360° Panoramic Stereo
⭐code - MetricOpt: Learning to Optimize Black-Box Evaluation Metrics
😮oral - Deep Stable Learning for Out-Of-Distribution
Sharing session - Learning a Self-Expressive Network for Subspace Clustering
Sharing session - Heterogeneous Grid Convolution for Adaptive, Efficient, and Controllable Computation
- Extreme Rotation Estimation using Dense Correlation Volumes
🏠project - Decoupled Dynamic Filter Networks
🏠project📺video - MongeNet: Efficient Sampler for Geometric Deep Learning
⭐code🏠project📺video
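- Visual Perception for Navigation in Human Environments
Call for papers for the 2nd workshop on visual perception for navigation in human environments ⚠️ Deadline: April 15
- UG2+ Challenge
Aims to advance the analysis of "difficult" images by applying image restoration and enhancement algorithms to improve analysis performance. Participants are tasked with developing new algorithms that improve the analysis of images captured under problematic conditions.
👑 USD 10K in prizes
- Object detection in poor-visibility environments
- (Semi-)supervised object detection in haze
- (Semi-)supervised face detection in low light
- Action recognition from dark videos
- Fully supervised action recognition in the dark
- Semi-supervised action recognition in the dark
- Continual Learning in Computer Vision (call for papers open)
Aims to bring together researchers and engineers from academia and industry to discuss the latest advances in continual learning.
- Best paper award: 500 USD + 500 USD worth of Huawei cloud credits (HUAWEI)
- Overall Challenge winner: 1,000 USD + 500 USD worth of Huawei cloud credits (HUAWEI)
- Supervised-Learning track winner: 500 USD (HUAWEI)
- Reinforcement-Learning track winner: 500 USD (ServiceNow)
- Responsible Computer Vision
⚠️ Deadline: March 25
The workshop will broadly discuss three main aspects of responsible AI in the context of computer vision: fairness; interpretability and transparency; and privacy.
- Holistic Video Understanding
Aims to establish a video benchmark that integrates joint recognition of all semantic concepts, since a single class label per task is often insufficient to describe the holistic content of a video.
- FGVC 8
The 8th Workshop on Fine-Grained Visual Categorization (FGVC8) will explore, through the lens of fine-grained visual understanding, topics such as fine-grained learning, self-supervised learning, semi-supervised learning, matching, localization, domain adaptation, transfer learning, few-shot learning, machine teaching, multimodal learning (e.g., audio and video), crowdsourcing, and taxonomic prediction. ⚠️ Paper submission deadline: April 2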
Topics in the call for papers include:
- Fine-grained categorization
- Novel datasets and data collection strategies for fine-grained categorization
- Appropriate error metrics for fine-grained categorization
- Low/few shot learning
- Self-supervised learning
- Semi-supervised learning
- Transfer-learning from known to novel subcategories
- Attribute and part based approaches
- Taxonomic predictions
- Addressing long-tailed distributions
- Human-in-the-loop
- Fine-grained categorization with humans in the loop
- Embedding human experts’ knowledge into computational models
- Machine teaching
- Interpretable fine-grained models
- Multi-modal learning
- Using audio and video data
- Using geographical priors
- Learning shape
- Fine-grained applications
- Product recognition
- Animal biometrics and camera traps
- Museum collections
- Agricultural
- Medical
- Fashion
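- Related challenges (some have already started on Kaggle)
- GeoLifeCLEF2021
Predict species presence from observations paired with aerial imagery and environmental features
- Semi-iNat2021
Semi-supervised fine-grained image classification on iNaturalist data
- iNatChallenge2021
Image classification challenge covering 10,000 plant and animal categories
- iMet2021
Fine-grained attribute classification of artworks
- iMat-Fashion2021 (not yet started)
Apparel instance segmentation and fine-grained attribute classification
- Hotel-ID 2021
Identify hotel rooms from images
- HerbariumChallenge2021
Identify specimens from a dataset containing 2.5M images of nearly 66,000 vascular plant species from the Americas, Oceania, and the Pacific
- iWildCam2021
Count the animals of each species in image sequences
- PlantPathologyChallenge2021 (not yet started)
Classify images of diseased plants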