Share your CVPR 2024 papers and code

Question

amusi opened this issue 3 months ago · 78 comments

[Format for each reply]
Paper name/title:
Paper link:
Code link:

Answer 1 · 2024-02-27T06:02:23.000Z

Paper name/title: ParameterNet: Parameters Are All You Need for Large-scale Visual Pretraining of Mobile Networks
Paper link: https://arxiv.org/abs/2306.14525
Project page: https://parameternet.github.io/

Answer 2 · 2024-02-27T06:03:21.000Z

Paper name/title: An Empirical Study of Scaling Law for OCR
Paper link: https://arxiv.org/abs/2401.00028
Code link: https://github.com/large-ocr-model/large-ocr-model.github.io

Answer 3 · 2024-02-27T06:35:04.000Z

Paper name/title: PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection
Paper link: https://arxiv.org/abs/2312.08371
Code link: https://github.com/kuanchihhuang/PTT

Answer 4 · 2024-02-27T06:42:07.000Z

Paper name/title: GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis
Paper link: https://arxiv.org/abs/2312.02155
Code link: https://github.com/ShunyuanZheng/GPS-Gaussian
Project link: https://shunyuanzheng.github.io/GPS-Gaussian

Answer 5 · 2024-02-27T06:52:17.000Z

Paper name/title: GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians
Paper link: https://arxiv.org/abs/2312.02134
Code link: https://github.com/huliangxiao/GaussianAvatar

Answer 6 · 2024-02-27T07:24:38.000Z

Paper name/title: Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation
Paper link: https://arxiv.org/abs/2312.04265
Code link: https://github.com/w1oves/Rein

Answer 7 · 2024-02-27T11:18:09.000Z

Paper name/title: Vlogger: Make Your Dream A Vlog
Paper link: https://arxiv.org/abs/2401.09414
Code link: https://github.com/Vchitect/Vlogger

Answer 8 · 2024-02-27T11:21:45.000Z

Paper name/title: Seamless Human Motion Composition with Blended Positional Encodings
Paper link: https://arxiv.org/abs/2402.15509
Code link: https://github.com/BarqueroGerman/FlowMDM

Answer 9 · 2024-02-27T11:34:49.000Z

Paper name/title: GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting
Paper link: https://arxiv.org/abs/2311.14521
Code link: https://github.com/buaacyw/GaussianEditor

Answer 10 · 2024-02-27T13:50:06.000Z

Paper name/title: UniGS: Unified Representation for Image Generation and Segmentation
Paper link: https://arxiv.org/abs/2312.01985

Suggested category: Diffusion / Image Generation / Segmentation

Answer 11 · 2024-02-27T15:33:56.000Z

Paper name/title: LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning
Paper link: https://arxiv.org/abs/2311.18651
Code link: https://github.com/Open3DA/LL3DA
Project link: https://ll3da.github.io/

Answer 12 · 2024-02-27T16:26:10.000Z

Paper name/title: CLOVA: A Closed-LOop Visual Assistant with Tool Usage and Update
Paper link: https://arxiv.org/pdf/2312.10908.pdf
Project link: https://clova-tool.github.io/

Answer 13 · 2024-02-27T18:29:07.000Z

Paper name/title: Edit One for All: Interactive Batch Image Editing
Paper link: https://arxiv.org/abs/2401.10219
Code link: https://github.com/thaoshibe/edit-one-for-all
Project page: https://thaoshibe.github.io/edit-one-for-all

Answer 14 · 2024-02-28T01:18:17.000Z

Paper name/title: UniPAD: A Universal Pre-training Paradigm for Autonomous Driving
Paper link: https://arxiv.org/abs/2310.08370
Code link: https://github.com/Nightmare-n/UniPAD

Answer 15 · 2024-02-28T02:41:53.000Z

Paper name/title: Feature Re-Embedding: Towards Foundation Model-Level Performance in Computational Pathology
Paper link: https://arxiv.org/abs/2402.17228
Code link: https://github.com/DearCaat/RRT-MIL

Answer 16 · 2024-02-28T04:28:32.000Z

Paper name/title: VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis
Paper link: https://arxiv.org/abs/2402.17300
Code link: https://github.com/Luffy03/VoCo

Answer 17 · 2024-02-28T06:26:00.000Z

Paper name/title: SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation
Paper link: https://arxiv.org/abs/2311.15537
Code link: https://github.com/xb534/SED

Answer 18 · 2024-02-28T07:25:55.000Z

Paper name/title: Link-Context Learning for Multimodal LLMs
Paper link: https://arxiv.org/pdf/2308.07891.pdf
Code link: https://github.com/isekai-portal/Link-Context-Learning/tree/main

Answer 19 · 2024-02-28T07:49:54.000Z

Paper name/title: MoMask: Generative Masked Modeling of 3D Human Motions
Paper link: https://arxiv.org/abs/2312.00063
Code link: https://github.com/EricGuo5513/momask-codes

Answer 20 · 2024-02-28T09:13:28.000Z

Paper name/title: MVBench: A Comprehensive Multi-modal Video Understanding Benchmark
Paper link: https://arxiv.org/abs/2311.17005
Code link: https://github.com/OpenGVLab/Ask-Anything/tree/main/video_chat2

Answer 21 · 2024-02-28T09:51:06.000Z

Paper name/title: ChAda-ViT: Channel Adaptive Attention for Joint Representation Learning of Heterogeneous Microscopy Images
Paper link: https://arxiv.org/abs/2311.15264
Code link: https://github.com/nicoboou/chada_vit

Answer 22 · 2024-02-28T09:56:16.000Z

Paper name/title: Deformable 3D Gaussians for High-Fidelity Monocular Dynamic Scene Reconstruction
Paper link: https://arxiv.org/abs/2309.13101
Code link: https://github.com/ingra14m/Deformable-3D-Gaussians
Project page: https://ingra14m.github.io/Deformable-Gaussians/

Answer 23 · 2024-02-28T09:57:17.000Z

Paper name/title: SC-GS: Sparse-Controlled Gaussian Splatting for Editable Dynamic Scenes
Paper link: https://arxiv.org/abs/2312.14937
Code link: https://github.com/yihua7/SC-GS
Project page: https://yihua7.github.io/SC-GS-web/

Answer 24 · 2024-02-28T11:13:47.000Z

Paper name/title: LEMON: Learning 3D Human-Object Interaction Relation from 2D Images (Embodied AI)
Paper link: https://arxiv.org/abs/2312.08963
Code link: https://github.com/yyvhang/lemon_3d

Answer 25 · 2024-02-28T11:26:00.000Z

Paper name/title: DeepCache: Accelerating Diffusion Models for Free
Paper link: https://arxiv.org/abs/2312.00858
Code link: https://github.com/horseee/DeepCache

Answer 26 · 2024-02-29T17:35:44.000Z

Paper name/title: Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Paper link: https://arxiv.org/abs/2312.03818
Code link: https://github.com/SunzeY/AlphaCLIP

Answer 27 · 2024-03-01T04:55:51.000Z

Paper name/title: VBench: Comprehensive Benchmark Suite for Video Generative Models
Paper link: https://arxiv.org/abs/2311.17982
Code link: https://github.com/Vchitect/VBench
Project Page: https://vchitect.github.io/VBench-project/

Answer 28 · 2024-03-01T05:52:08.000Z

Paper name/title: OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
Paper link: https://arxiv.org/abs/2311.17911
Code link: https://github.com/shikiw/OPERA

Answer 29 · 2024-03-01T06:23:49.000Z

Paper name/title: RepViT: Revisiting Mobile CNN From ViT Perspective
Paper link: https://arxiv.org/abs/2307.09283
Code link: https://github.com/THU-MIG/RepViT

Answer 30 · 2024-03-02T05:46:55.000Z

Paper name/title: SeD: Semantic-Aware Discriminator for Image Super-Resolution
Paper link: https://arxiv.org/abs/2402.19387
Code link: https://github.com/lbc12345/SeD

Answer 31 · 2024-03-02T12:31:26.000Z

Paper name/title: Efficient Dataset Distillation via Minimax Diffusion
Paper link: https://arxiv.org/abs/2311.15529
Code link: https://github.com/vimar-gu/MinimaxDiffusion

Answer 32 · 2024-03-02T20:01:17.000Z

Paper name/title: Improved Visual Grounding through Self-Consistent Explanations
Paper link: https://arxiv.org/abs/2312.04554
Code link: https://github.com/uvavision/SelfEQ

Answer 33 · 2024-03-03T23:25:16.000Z

Paper name/title: Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis
Paper link: https://arxiv.org/abs/2312.16812
Code link: https://github.com/oppo-us-research/SpacetimeGaussians
Project Page: https://oppo-us-research.github.io/SpacetimeGaussians-website/

Answer 34 · 2024-03-04T06:23:10.000Z

Paper name/title: MaskINT: Video Editing via Interpolative Non-autoregressive Masked Transformers
Paper link: https://arxiv.org/abs/2312.12468
Project Page: https://maskint.github.io

Answer 35 · 2024-03-04T08:25:09.000Z

Paper name/title: Making Large Multimodal Models Understand Arbitrary Visual Prompts
Paper link: https://arxiv.org/abs/2312.00784
Project Page: https://vip-llava.github.io/

Answer 36 · 2024-03-05T06:44:22.000Z

Paper name/title: KVQ: Kaleidoscope Video Quality Assessment for Short-form Videos
Paper link: https://arxiv.org/abs/2402.07220
Code link: https://github.com/lixinustc/KVQ-Challenge-CVPR-NTIRE2024
Project Page: https://lixinustc.github.io/projects/KVQ/

Answer 37 · 2024-03-06T06:30:10.000Z

Paper name/title: ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting
Paper link: https://arxiv.org/abs/2403.00303
Code link: https://github.com/PriNing/ODM

Answer 38 · 2024-03-06T08:58:30.000Z

Paper name/title: Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs
Paper link: https://arxiv.org/abs/2310.00582
Code link: https://github.com/SY-Xuan/Pink

Answer 39 · 2024-03-11T08:18:47.000Z

Paper name/title: MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis
Paper link: https://arxiv.org/pdf/2402.05408.pdf
Code link: https://github.com/limuloo/migc

Answer 40 · 2024-03-12T07:45:52.000Z

Paper name/title: Memory-based Adapters for Online 3D Scene Perception
Paper link: https://arxiv.org/abs/2403.06974
Code link: https://github.com/xuxw98/Online3D

Answer 41 · 2024-03-12T10:35:40.000Z

Paper name/title: Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
Paper link: https://arxiv.org/abs/2311.08046
Code link: https://github.com/PKU-YuanGroup/Chat-UniVi

Answer 42 · 2024-03-13T01:33:20.000Z

Paper name/title: CityDreamer: Compositional Generative Model of Unbounded 3D Cities
Paper link: https://arxiv.org/abs/2309.00610
Code link: https://github.com/hzxie/city-dreamer
Project Page: https://haozhexie.com/project/city-dreamer/

Answer 43 · 2024-03-13T03:37:15.000Z

Paper name/title: OneLLM: One Framework to Align All Modalities with Language
Paper link: https://arxiv.org/abs/2312.03700
Code link: https://github.com/csuhan/OneLLM

Answer 44 · 2024-03-13T06:09:45.000Z

Paper name/title: DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations
Paper link: https://arxiv.org/abs/2403.06951
Code link: https://github.com/Tianhao-Qi/DEADiff_code
Project page: https://tianhao-qi.github.io/DEADiff/

Answer 45 · 2024-03-13T07:45:03.000Z

Paper name/title: SVGDreamer: Text Guided SVG Generation with Diffusion Model
Paper link: https://arxiv.org/abs/2312.16476
Project page: https://ximinng.github.io/SVGDreamer-project/

Answer 46 · 2024-03-14T08:50:52.000Z

Paper name/title: PromptKD: Unsupervised Prompt Distillation for Vision-Language Models
Paper link: https://arxiv.org/abs/2403.02781
Code link: https://github.com/zhengli97/PromptKD

Answer 47 · 2024-03-14T21:44:06.000Z

Paper name/title: PIE-NeRF🍕: Physics-based Interactive Elastodynamics with NeRF
Paper link: https://arxiv.org/abs/2311.13099
Code link: https://github.com/FYTalon/pienerf/

Answer 48 · 2024-03-15T08:30:40.000Z

Paper name/title: InteractDiffusion: Interaction-Control for Text-to-Image Diffusion Model
Paper link: https://arxiv.org/abs/2312.05849
Code link: https://github.com/jiuntian/interactdiffusion

Answer 49 · 2024-03-18T13:53:45.000Z

Paper name/title: Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification
Paper link: https://arxiv.org/abs/2403.10254
Code link: https://github.com/924973292/EDITOR

Answer 50 · 2024-03-19T06:03:47.000Z

Paper name/title: LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching
Paper link: https://arxiv.org/abs/2311.11284
Code link: https://github.com/EnVision-Research/LucidDreamer

Answer 51 · 2024-03-19T14:55:55.000Z

Paper name/title: Neural Markov Random Field for Stereo Matching
Paper link: https://arxiv.org/abs/2403.11193
Code link: https://github.com/aeolusguan/NMRF

Answer 52 · 2024-03-21T02:26:10.000Z

Paper name/title: APISR: Anime Production Inspired Real-World Anime Super-Resolution
Paper link: https://arxiv.org/abs/2403.01598
Code link: https://github.com/Kiteretsu77/APISR

Answer 53 · 2024-03-21T05:46:18.000Z

Paper name/title: VTimeLLM: Empower LLM to Grasp Video Moments
Paper link: https://arxiv.org/abs/2311.18445
Code link: https://github.com/huangb23/VTimeLLM

Answer 54 · 2024-03-25T02:58:03.000Z

Paper name/title: MMA-Diffusion: MultiModal Attack on Diffusion Models
Paper link: https://arxiv.org/abs/2311.17516
Code link: https://github.com/yangyijune/MMA-Diffusion

Answer 55 · 2024-03-25T17:19:03.000Z

Paper name/title: VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models
Paper link: https://arxiv.org/abs/2312.00845
Code link: https://github.com/HyeonHo99/Video-Motion-Customization
Project Page: https://video-motion-customization.github.io/

Answer 56 · 2024-03-26T01:58:16.000Z

Paper name/title: Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement
Paper link: https://arxiv.org/abs/2403.16131
Code link: https://github.com/xiuqhou/Salience-DETR

Answer 57 · 2024-03-28T07:37:05.000Z

Paper name/title: HiKER-SGG: Hierarchical Knowledge Enhanced Robust Scene Graph Generation
Paper link: https://arxiv.org/abs/2403.12033
Code link: https://github.com/zhangce01/HiKER-SGG
Project page: https://zhangce01.github.io/HiKER-SGG/

Answer 58 · 2024-03-30T01:48:11.000Z

Paper name/title: Learning from Synthetic Human Group Activities
Paper link: https://arxiv.org/abs/2306.16772
Code link: https://github.com/cjerry1243/M3Act
Project page: https://cjerry1243.github.io/M3Act/

Answer 59 · 2024-04-05T08:32:38.000Z

Paper name/title: Delving into the Trajectory Long-tail Distribution for Multi-object Tracking
Paper link: https://arxiv.org/abs/2403.04700
Code link: https://github.com/chen-si-jia/Trajectory-Long-tail-Distribution-for-MOT

Answer 60 · 2024-04-07T14:01:36.000Z

Paper name/title: Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation
Paper link: https://arxiv.org/pdf/2311.12028.pdf
Code link: https://github.com/NationalGAILab/HoT

Answer 61 · 2024-04-08T05:37:13.000Z

Paper name/title: FairCLIP: Harnessing Fairness in Vision-Language Learning
Paper link: https://arxiv.org/abs/2403.19949
Code link: https://github.com/Harvard-Ophthalmology-AI-Lab/FairCLIP
Project Page: https://ophai.hms.harvard.edu/datasets/harvard-fairvlmed10k/

Answer 62 · 2024-04-08T08:09:23.000Z

Paper name/title: Noisy-Correspondence Learning for Text-to-Image Person Re-identification
Paper link: https://arxiv.org/pdf/2308.09911.pdf
Code link: https://github.com/QinYang79/RDE

Answer 63 · 2024-04-12T09:46:26.000Z

Paper name/title: MindBridge: A Cross-Subject Brain Decoding Framework
Paper link: https://arxiv.org/abs/2404.07850
Code link: https://github.com/littlepure2333/MindBridge
Project Page: https://littlepure2333.github.io/MindBridge/

Answer 64 · 2024-04-16T18:51:55.000Z

Paper name/title: A General and Efficient Training for Transformer via Token Expansion
Paper link: https://arxiv.org/abs/2404.00672
Code link: https://github.com/Osilly/TokenExpansion

Answer 65 · 2024-04-17T11:03:54.000Z

Paper name/title: Multi-Task Dense Prediction via Mixture of Low-Rank Experts
Paper link: https://arxiv.org/abs/2403.17749
Code link: https://github.com/YuqiYang213/MLoRE

Answer 66 · 2024-04-18T13:26:10.000Z

Paper name/title: Traffic Scene Parsing through the TSP6K Dataset
Paper link: https://arxiv.org/pdf/2303.02835.pdf
Code link: https://github.com/PengtaoJiang/TSP6K