Awesome-Prompt-Adapter-Learning-for-VLMs

A curated list of prompt/adapter learning methods for vision-language models (e.g., CLIP).


💡Tips:

  • If you find that a paper published at a top conference (CVPR, ICCV, ECCV, ICML, NeurIPS, ICLR) or journal (TPAMI, IJCV, TIP) is missing from this list, please feel free to contact me at any time, either by email (zhengli97[at]qq.com) or by opening an issue.
  • More people joining us in maintaining this list would be greatly appreciated.
  • Note that papers without open-source code are generally not included.

Keywords

  • Use text-based prompts/adapters.
  • Use image-based prompts/adapters.
  • Use both text- and image-based prompts/adapters.

Surveys

  • A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models. [Paper]
  • Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models: A Survey. [Paper]

General Prompt Learning

Experimental Comparison

Base-to-Novel Generalization (ViT-B/16 CLIP)

| Methods | Pub | Base | Novel | HM (main) | Code |
|---|---|---|---|---|---|
| CLIP | ICML 21 | 69.34 | 74.22 | 71.70 | Link |
| CoOp | IJCV 22 | 82.69 | 63.22 | 71.66 | Link |
| CoCoOp | CVPR 22 | 80.47 | 71.69 | 75.83 | Link |
| ProDA | CVPR 22 | 81.56 | 72.30 | 76.65 | Link |
| KgCoOp | CVPR 23 | 80.73 | 73.60 | 77.00 | Link |
| RPO | ICCV 23 | 81.13 | 75.00 | 77.78 | Link |
| MaPLe | CVPR 23 | 82.28 | 75.14 | 78.55 | Link |
| DePT | CVPR 24 | 83.62 | 75.04 | 79.10 | Link |
| TCP | CVPR 24 | 84.13 | 75.36 | 79.51 | Link |
| MMA | CVPR 24 | 83.20 | 76.80 | 79.87 | Link |
| PromptSRC | ICCV 23 | 84.26 | 76.10 | 79.97 | Link |
| HPT | AAAI 24 | 84.32 | 76.86 | 80.23 | Link |
| CoPrompt | ICLR 24 | 84.00 | 77.23 | 80.48 | Link |
| CasPL | ECCV 24 | 86.11 | 79.54 | 82.69 | Link |
| PromptKD | CVPR 24 | 86.96 | 80.73 | 83.73 | Link |

Table 1. Average results on 11 datasets. HM is the harmonic mean of base and novel accuracy (see the sketch below). Only works with open-source code are listed.
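
For reference, the HM column is just the harmonic mean of the Base and Novel columns; a minimal plain-Python sketch (the example numbers are taken from the PromptKD row above):

```python
def harmonic_mean(base_acc: float, novel_acc: float) -> float:
    """HM = 2 * base * novel / (base + novel). The harmonic mean
    penalizes trading novel-class accuracy for base-class gains,
    so a high HM requires doing well on both splits."""
    return 2 * base_acc * novel_acc / (base_acc + novel_acc)

# Example with the PromptKD row from Table 1:
print(round(harmonic_mean(86.96, 80.73), 2))  # -> 83.73
```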

Paper List

2022

  • CoOp Learning to Prompt for Vision-Language Models. IJCV 2022. (see the sketch after this list)
    [Paper] [Code]
  • CoCoOp Conditional Prompt Learning for Vision-Language Models. CVPR 2022.
    [Paper] [Code]
  • ProDA Prompt Distribution Learning. CVPR 2022.
    [Paper] [Code]
  • VPT Visual Prompt Tuning. ECCV 2022.
    [Paper] [Code]
  • VP Exploring Visual Prompts for Adapting Large-Scale Models. arXiv 2022.
    [Paper] [Code]
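
Most of the text-prompt methods in this section descend from CoOp's context optimization, so it is worth making the idea concrete. Below is a minimal PyTorch sketch, not the authors' released code: the embedding dimension, context length, and the stand-in class-name embeddings are all illustrative.

```python
import torch
import torch.nn as nn

class CoOpStylePrompts(nn.Module):
    """CoOp-style context optimization (sketch): M learnable context
    vectors are shared across classes and prepended to each frozen
    class-name embedding; only `ctx` receives gradients."""
    def __init__(self, n_cls, n_ctx=16, dim=512, name_len=4):
        super().__init__()
        self.ctx = nn.Parameter(torch.empty(n_ctx, dim).normal_(std=0.02))
        # Stand-in for CLIP's frozen token embeddings of the class names.
        self.register_buffer("name_embeds", torch.randn(n_cls, name_len, dim))

    def forward(self):
        ctx = self.ctx.unsqueeze(0).expand(self.name_embeds.size(0), -1, -1)
        # [n_cls, n_ctx + name_len, dim], fed to the frozen text encoder.
        return torch.cat([ctx, self.name_embeds], dim=1)

prompts = CoOpStylePrompts(n_cls=10)()  # shape: [10, 20, 512]
```

In the real implementations the concatenated sequence goes through CLIP's frozen text encoder and `ctx` is trained with cross-entropy on the few-shot task; CoCoOp additionally conditions the context on the image feature.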

2023

  • MaPLe MaPLe: Multi-modal Prompt Learning. CVPR 2023.
    [Paper] [Code]
  • KgCoOp Visual-Language Prompt Tuning with Knowledge-guided Context Optimization. CVPR 2023.
    [Paper] [Code]
  • LASP LASP: Text-to-Text Optimization for Language-Aware Soft Prompting of Vision & Language Models. CVPR 2023.
    [Paper]
  • DAM-VP Diversity-Aware Meta Visual Prompting. CVPR 2023.
    [Paper] [Code]
  • TaskRes Task Residual for Tuning Vision-Language Models. CVPR 2023.
    [Paper] [Code]
  • RPO Read-only Prompt Optimization for Vision-Language Few-shot Learning. ICCV 2023.
    [Paper] [Code]
  • KAPT Knowledge-Aware Prompt Tuning for Generalizable Vision-Language Models. ICCV 2023.
    [Paper]
  • CuPL What does a platypus look like? Generating customized prompts for zero-shot image classification. ICCV 2023.
    [Paper] [Code]
  • ProGrad Prompt-aligned Gradient for Prompt Tuning. ICCV 2023.
    [Paper] [Code]
  • PromptSRC Self-regulating Prompts: Foundational Model Adaptation without Forgetting. ICCV 2023.
    [Paper] [Code]
  • DeFo Learning to Decompose Visual Features with Latent Textual Prompts. ICLR 2023.
    [Paper]
  • PLOT PLOT: Prompt Learning with Optimal Transport for Vision-Language Models. ICLR 2023.
    [Paper] [Code]
  • POMP Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition. NeurIPS 2023.
    [Paper] [Code]

2024

  • MetaPrompt Learning Domain Invariant Prompt for Vision-Language Models. TIP 2024.
    [Paper]
  • SA2VP SA2VP: Spatially Aligned-and-Adapted Visual Prompt. AAAI 2024.
    [Paper] [Code]
  • HPT Learning Hierarchical Prompt with Structured Linguistic Knowledge for Vision-Language Models. AAAI 2024.
    [Paper] [Code]
  • LaViP LaViP: Language-Grounded Visual Prompts. AAAI 2024.
    [Paper]
  • CoPrompt Consistency-guided Prompt Learning for Vision-Language Models. ICLR 2024.
    [Paper] [Code]
  • ProText Learning to Prompt with Text Only Supervision for Vision-Language Models. arXiv 2024.
    [Paper] [Code]
  • PromptKD Unsupervised Prompt Distillation for Vision Language Models. CVPR 2024.
    [Paper] [Code]
  • DePT DePT: Decoupled Prompt Tuning. CVPR 2024.
    [Paper] [Code]
  • ArGue ArGue: Attribute-Guided Prompt Tuning for Vision-Language Models. CVPR 2024.
    [Paper]
  • TCP TCP: Textual-based Class-aware Prompt tuning for Visual-Language Model. CVPR 2024.
    [Paper] [Code]
  • MMA MMA: Multi-Modal Adapter for Vision-Language Models. CVPR 2024.
    [Paper] [Code]
  • KDPL Improving Zero-shot Generalization of Learned Prompts via Unsupervised Knowledge Distillation. ECCV 2024.
    [Paper] [Code]
  • CoCoLe Conceptual Codebook Learning for Vision-Language Models. ECCV 2024.
    [Paper]
  • CasPL Cascade Prompt Learning for Vision-Language Model Adaptation. ECCV 2024.
    [Paper] [Code]

Other Forms of Prompts

Paper List

  • CPT CPT: Colorful Prompt Tuning for Pre-trained Vision-Language Models. arXiv 2021.
    [Paper] [Code]
  • DetPro Learning to Prompt for Open-Vocabulary Object Detection with Vision-Language Model. CVPR 2022.
    [Paper] [Code]
  • PromptDet PromptDet: Towards Open-vocabulary Detection using Uncurated Images. ECCV 2022.
    [Paper] [Code]
  • Visual Prompting via Image Inpainting. NeurIPS 2022.
    [Paper]
  • OVSeg Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP. CVPR 2023.
    [Paper] [Code]
  • LoGoPrompt LoGoPrompt: Synthetic Text Images Can Be Good Visual Prompts for Vision-Language Models. ICCV 2023.
    [Paper]
  • RedCircle What does CLIP know about a red circle? Visual prompt engineering for VLMs. ICCV 2023. (see the sketch after this list)
    [Paper]
  • FGVP Fine-Grained Visual Prompting. NeurIPS 2023.
    [Paper] [Code]
  • SoM Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V. arXiv 2023.
    [Paper] [Code]
  • Alpha-CLIP Alpha-CLIP: A CLIP Model Focusing on Wherever You Want. CVPR 2024.
    [Paper] [Code]
  • ViP-LLaVA Making Large Multimodal Models Understand Arbitrary Visual Prompts. CVPR 2024.
    [Paper] [Code]
  • SSC Segment, Select, Correct: A Framework for Weakly-Supervised Referring Segmentation. ECCV 2024.
    [Paper] [Code]
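
Unlike the learned-embedding methods earlier, several of the works above (CPT, RedCircle, FGVP, SoM) prompt in pixel space by drawing marks on the image itself. A minimal Pillow sketch of the RedCircle-style edit; the image and box coordinates are placeholders, and scoring the edited image is left to whatever CLIP pipeline you already use:

```python
from PIL import Image, ImageDraw

def draw_red_circle(image: Image.Image, box, width=4) -> Image.Image:
    """Draw a red ellipse around a region of interest directly in
    pixel space; the edited image is then fed to an unmodified VLM,
    which tends to shift its attention toward the marked region."""
    out = image.copy()
    ImageDraw.Draw(out).ellipse(box, outline=(255, 0, 0), width=width)
    return out

# Placeholder usage: circle a region, then score with any CLIP pipeline.
img = Image.new("RGB", (224, 224), "white")
prompted = draw_red_circle(img, box=(60, 60, 160, 160))
```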

General Test-time Prompt Learning

Experimental Comparison

| Methods | Pub | ImageNet | -A | -V2 | -R | -S | Avg. (main) | Code |
|---|---|---|---|---|---|---|---|---|
| CoOp | IJCV 22 | 71.51 | 49.71 | 64.20 | 75.21 | 47.99 | 59.28 | Link |
| CoCoOp | CVPR 22 | 71.02 | 50.63 | 64.07 | 76.18 | 48.75 | 59.91 | Link |
| TPT | NeurIPS 22 | 68.98 | 54.77 | 63.45 | 77.06 | 47.94 | 60.81 | Link |
| TPT+CoOp | NeurIPS 22 | 73.61 | 57.95 | 66.83 | 77.27 | 49.29 | 62.84 | Link |
| PromptAlign | NeurIPS 23 | --- | 59.37 | 65.29 | 79.33 | 50.23 | 63.55 | Link |
| TPS+CoOp | arXiv 24 | 73.73 | 60.49 | 66.84 | 77.44 | 49.08 | 65.52 | Link |
| RLCF | ICLR 24 | 73.23 | 65.45 | 69.77 | 83.35 | 54.74 | 68.33 | Link |
| RLCF+CoOp | ICLR 24 | 76.05 | 69.74 | 70.62 | 84.51 | 56.49 | 70.34 | Link |

Table 2. Test-time prompt tuning methods on OOD data (-A, -V2, -R, -S denote the ImageNet-A/V2/R/Sketch variants).
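
The common recipe behind the tuning-based rows above (TPT and its descendants) is to adapt the prompt on each test sample by minimizing the entropy of predictions averaged over augmented views. A minimal PyTorch sketch of one such step, not any paper's released code; `model`, `views`, and `optimizer` are placeholders, and the optimizer is assumed to hold only the prompt parameters:

```python
import torch.nn.functional as F

def tpt_style_step(model, views, optimizer, keep_frac=0.1):
    """One test-time update: keep the most confident (lowest-entropy)
    augmented views of a single test image and minimize the entropy
    of their averaged prediction."""
    probs = F.softmax(model(views), dim=-1)                # [n_views, n_cls]
    ent = -(probs * probs.clamp_min(1e-8).log()).sum(-1)   # per-view entropy
    k = max(1, int(keep_frac * views.size(0)))
    keep = ent.topk(k, largest=False).indices              # confidence filter
    avg = probs[keep].mean(0)
    loss = -(avg * avg.clamp_min(1e-8).log()).sum()        # marginal entropy
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```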

Paper List

  • TPT Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models. NeurIPS 2022.
    [Paper] [Code]
  • SwapPrompt SwapPrompt: Test-Time Prompt Adaptation for Vision-Language Models. NeurIPS 2023.
    [Paper]
  • PromptAlign Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization. NeurIPS 2023.
    [Paper] [Code]
  • TPS Just Shift It: Test-Time Prototype Shifting for Zero-Shot Generalization with Vision-Language Models. arXiv 2024.
    [Paper] [Code]
  • RLCF Test-time Adaptation with CLIP reward for zero-shot generalization in Vision-Language Models. ICLR 2024.
    [Paper] [Code]
  • InTTA Invariant Test-Time Adaptation for Vision-Language Model Generalization. arXiv 2024.
    [Paper] [Code]

General Adapter Learning

Paper List

  • CLIP-Adapter CLIP-Adapter: Better Vision-Language Models with Feature Adapters. arXiv 2021.
    [Paper] [Code]
  • Tip-Adapter Tip-Adapter: Training-free Adaption of CLIP for Few-shot Classification. ECCV 2022. (see the sketch after this list)
    [Paper] [Code]
  • APE Not All Features Matter: Enhancing Few-shot CLIP with Adaptive Prior Refinement. ICCV 2023.
    [Paper] [Code]
  • CaFo Prompt, Generate, then Cache: Cascade of Foundation Models makes Strong Few-shot Learners. CVPR 2023.
    [Paper] [Code]
  • Meta-Adapter Meta-Adapter: An Online Few-shot Learner for Vision-Language Model. NeurIPS 2023.
    [Paper] [Code]
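
Adapter methods leave CLIP's prompts and weights alone and instead operate on its features. As one concrete example, here is a minimal PyTorch sketch of the training-free cache used by Tip-Adapter (see the entry above); this is not the released code, and `alpha`/`beta` are the paper's residual-ratio and sharpness hyperparameters with illustrative values:

```python
import torch
import torch.nn.functional as F

def tip_adapter_logits(test_feat, cache_keys, cache_vals, zs_logits,
                       alpha=1.0, beta=5.5):
    """Training-free cache: few-shot image features are keys [N*K, d],
    their one-hot labels are values [N*K, C]; cache predictions are
    blended with the zero-shot CLIP logits `zs_logits` [B, C]."""
    test_feat = F.normalize(test_feat, dim=-1)    # [B, d]
    cache_keys = F.normalize(cache_keys, dim=-1)
    affinity = test_feat @ cache_keys.t()                     # [B, N*K]
    cache_logits = torch.exp(-beta * (1.0 - affinity)) @ cache_vals
    return zs_logits + alpha * cache_logits
```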

Video Understanding

Prompt Learning

  • Efficient-Prompt Prompting visual-language models for efficient video understanding. ECCV 2022.
    [Paper] [Code]
  • X-CLIP Expanding Language-Image Pretrained Models for General Video Recognition. ECCV 2022.
    [Paper] [Code]
  • RePro Compositional Prompt Tuning with Motion Cues for Open-vocabulary Video Relation Detection. ICLR 2023.
    [Paper] [Code]

Continual Learning

Prompt Learning

  • L2P Learning to Prompt for Continual Learning. CVPR 2022. (see the sketch after this list)
    [Paper] [Code]
  • DualPrompt DualPrompt: Complementary Prompting for Rehearsal-free Continual Learning. ECCV 2022.
    [Paper] [Code]
  • EvoPrompt Evolving Parameterized Prompt Memory for Continual Learning. AAAI 2024.
    [Paper]
  • CPrompt Consistent Prompting for Rehearsal-Free Continual Learning. CVPR 2024.
    [Paper] [Code]
  • DIKI Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models. ECCV 2024.
    [Paper] [Code]
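
Several of the methods above build on L2P's prompt pool: prompts are selected per input by key-query matching, so different tasks come to reuse different prompts without storing rehearsal data. A minimal PyTorch sketch of the selection step, with illustrative sizes (not the authors' code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PromptPool(nn.Module):
    """L2P-style prompt pool (sketch): learnable prompts paired with
    learnable keys; a frozen feature of the input acts as the query,
    the top-k closest keys are picked, and their prompts are prepended
    to the token sequence fed to a frozen backbone."""
    def __init__(self, pool_size=10, prompt_len=5, dim=768, top_k=3):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(pool_size, dim) * 0.02)
        self.prompts = nn.Parameter(
            torch.randn(pool_size, prompt_len, dim) * 0.02)
        self.top_k = top_k

    def forward(self, query):  # query: [B, dim]
        sim = F.normalize(query, dim=-1) @ F.normalize(self.keys, dim=-1).t()
        idx = sim.topk(self.top_k, dim=-1).indices   # [B, k]
        return self.prompts[idx].flatten(1, 2)       # [B, k*prompt_len, dim]
```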

Adapter Learning

  • MoE-Adapters4CL Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters. CVPR 2024.
    [Paper] [Code]
  • SSIAT Semantically-Shifted Incremental Adapter-Tuning is A Continual ViTransformer. CVPR 2024.
    [Paper]

Others

  • LoCoOp LoCoOp: Few-Shot Out-of-Distribution Detection via Prompt Learning. NeurIPS 2023.
    [Paper] [Code]