A curated list of prompt/adapter learning methods for vision-language models. Methods fall into three categories:

- Those that use text-based learnable prompts/adapters.
- Those that use image-based learnable prompts/adapters.
- Those that use both text- and image-based learnable prompts/adapters.
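For orientation, a text-based learnable prompt replaces a hand-written template such as "a photo of a {class}" with trainable context vectors that are concatenated with the class-name token embeddings and fed to the frozen text encoder; CoOp (below) is the canonical example. A minimal PyTorch sketch, with illustrative names (`TextPromptLearner`, `n_ctx`) rather than any repository's actual API:

```python
import torch
import torch.nn as nn

class TextPromptLearner(nn.Module):
    """CoOp-style sketch (illustrative, not the official code): M learnable
    context vectors shared across classes, prepended to the frozen token
    embeddings of each class name before the frozen text encoder."""

    def __init__(self, class_embeds, n_ctx=16, ctx_dim=512):
        super().__init__()
        # The only trainable parameters: M x D context vectors.
        self.ctx = nn.Parameter(torch.randn(n_ctx, ctx_dim) * 0.02)
        # Frozen token embeddings of the C class names, shape (C, L, D).
        self.register_buffer("class_embeds", class_embeds)

    def forward(self):
        C = self.class_embeds.size(0)
        ctx = self.ctx.unsqueeze(0).expand(C, -1, -1)      # (C, M, D)
        # [learned context][class-name tokens] -> frozen text encoder.
        return torch.cat([ctx, self.class_embeds], dim=1)  # (C, M+L, D)

# e.g. 10 classes, 8 name tokens, width 512 (dummy embeddings):
prompts = TextPromptLearner(torch.randn(10, 8, 512))()
```

An image-side counterpart is sketched after the VPT entry further down.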
- A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models. [Paper]
- Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models: A Survey. [Paper]
Base-to-Novel Generalization (ViT-B/16 CLIP).
Methods | Pub | Base | Novel | HM | Code |
---|---|---|---|---|---|
CLIP | ICML 21 | 69.34 | 74.22 | 71.70 | Link |
CoOp | IJCV 22 | 82.69 | 63.22 | 71.66 | Link |
CoCoOp | CVPR 22 | 80.47 | 71.69 | 75.83 | Link |
ProDA | CVPR 22 | 81.56 | 72.30 | 76.65 | Link |
RPO | ICCV 23 | 81.13 | 75.00 | 77.78 | Link |
MaPLe | CVPR 23 | 82.28 | 75.14 | 78.55 | Link |
MetaPrompt | TIP 24 | 83.65 | 75.48 | 79.09 | --- |
DePT | CVPR 24 | 83.62 | 75.04 | 79.10 | Link |
LASP | CVPR 23 | 83.18 | 76.11 | 79.48 | --- |
TCP | CVPR 24 | 84.13 | 75.36 | 79.51 | Link |
MMA | CVPR 24 | 83.20 | 76.80 | 79.87 | Link |
PromptSRC | ICCV 23 | 84.26 | 76.10 | 79.97 | Link |
HPT | AAAI 24 | 84.32 | 76.86 | 80.23 | Link |
CoPrompt | ICLR 24 | 84.00 | 77.23 | 80.48 | Link |
PromptKD | CVPR 24 | 86.96 | 80.73 | 83.73 | Link |
Table 1. Base-to-novel generalization: average results over 11 datasets. HM is the harmonic mean of base and novel accuracy (the main comparison metric).
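HM is computed per dataset and then averaged, so recomputing it from the averaged Base/Novel columns reproduces most rows closely; a two-line check:

```python
def harmonic_mean(base, novel):
    """HM column of Table 1: harmonic mean of base/novel accuracy."""
    return 2 * base * novel / (base + novel)

print(round(harmonic_mean(69.34, 74.22), 2))  # 71.7 -- matches the CLIP row
```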
CoOp
Learning to Prompt for Vision-Language Models. IJCV 2022.
[Paper] [Code]

CoCoOp
Conditional Prompt Learning for Vision-Language Models. CVPR 2022.
[Paper] [Code]

ProDA
Prompt Distribution Learning. CVPR 2022.
[Paper] [Code]

VPT
Visual Prompt Tuning. ECCV 2022.
[Paper] [Code]
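Where CoOp learns prompts on the text side, VPT prepends learnable tokens to the patch sequence of a frozen ViT. A minimal sketch of the shallow variant (class and parameter names are illustrative, not the official implementation):

```python
import torch
import torch.nn as nn

class VisualPromptTokens(nn.Module):
    """VPT-shallow-style sketch: learnable prompt tokens inserted between
    the [CLS] token and the patch embeddings of a frozen ViT."""

    def __init__(self, n_prompts=10, embed_dim=768):
        super().__init__()
        self.prompts = nn.Parameter(torch.empty(1, n_prompts, embed_dim))
        nn.init.uniform_(self.prompts, -0.1, 0.1)

    def forward(self, x):                        # x: (B, 1 + n_patches, D)
        p = self.prompts.expand(x.size(0), -1, -1)
        # [CLS] + prompts + patches, then through the frozen blocks.
        return torch.cat([x[:, :1], p, x[:, 1:]], dim=1)
```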
MaPLe
MaPLe: Multi-modal Prompt Learning. CVPR 2023.
[Paper] [Code]

KgCoOp
Visual-Language Prompt Tuning with Knowledge-guided Context Optimization. CVPR 2023.
[Paper] [Code]

LASP
LASP: Text-to-Text Optimization for Language-Aware Soft Prompting of Vision & Language Models. CVPR 2023.
[Paper]

DAM-VP
Diversity-Aware Meta Visual Prompting. CVPR 2023.
[Paper] [Code]

TaskRes
Task Residual for Tuning Vision-Language Models. CVPR 2023.
[Paper] [Code]

RPO
Read-only Prompt Optimization for Vision-Language Few-shot Learning. ICCV 2023.
[Paper] [Code]

KAPT
Knowledge-Aware Prompt Tuning for Generalizable Vision-Language Models. ICCV 2023.
[Paper]

ProGrad
Prompt-aligned Gradient for Prompt Tuning. ICCV 2023.
[Paper] [Code]

PromptSRC
Self-regulating Prompts: Foundational Model Adaptation without Forgetting. ICCV 2023.
[Paper] [Code]

DeFo
Learning to Decompose Visual Features with Latent Textual Prompts. ICLR 2023.
[Paper]

POMP
Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition. NeurIPS 2023.
[Paper] [Code]
MetaPrompt
Learning Domain Invariant Prompt for Vision-Language Models. TIP 2024.
[Paper]

SA2VP
SA2VP: Spatially Aligned-and-Adapted Visual Prompt. AAAI 2024.
[Paper] [Code]

HPT
Learning Hierarchical Prompt with Structured Linguistic Knowledge for Vision-Language Models. AAAI 2024.
[Paper] [Code]

LaViP
LaViP: Language-Grounded Visual Prompts. AAAI 2024.
[Paper]

CoPrompt
Consistency-guided Prompt Learning for Vision-Language Models. ICLR 2024.
[Paper] [Code]

ProText
Learning to Prompt with Text Only Supervision for Vision-Language Models. arXiv 2024.
[Paper] [Code]

PromptKD
PromptKD: Unsupervised Prompt Distillation for Vision Language Models. CVPR 2024.
[Paper] [Code]

DePT
DePT: Decoupled Prompt Tuning. CVPR 2024.
[Paper] [Code]

ArGue
ArGue: Attribute-Guided Prompt Tuning for Vision-Language Models. CVPR 2024.
[Paper]

TCP
TCP: Textual-based Class-aware Prompt tuning for Visual-Language Model. CVPR 2024.
[Paper] [Code]

MMA
MMA: Multi-Modal Adapter for Vision-Language Models. CVPR 2024.
[Paper] [Code]
Methods | Pub | ImageNet | -A | -V2 | -R | -S | Avg. | Code |
---|---|---|---|---|---|---|---|---|
CoOp | IJCV 22 | 71.51 | 49.71 | 64.20 | 75.21 | 47.99 | 59.28 | Link |
CoCoOp | CVPR 22 | 71.02 | 50.63 | 64.07 | 76.18 | 48.75 | 59.91 | Link |
TPT | NeurIPS 22 | 68.98 | 54.77 | 63.45 | 77.06 | 47.94 | 60.81 | Link |
TPT+CoOp | NeurIPS 22 | 73.61 | 57.95 | 66.83 | 77.27 | 49.29 | 62.84 | Link |
PromptAlign | NeurIPS 23 | --- | 59.37 | 65.29 | 79.33 | 50.23 | 63.55 | Link |
TPS+CoOp | arXiv 24 | 73.73 | 60.49 | 66.84 | 77.44 | 49.08 | 63.46 | Link |
RLCF | ICLR 24 | 73.23 | 65.45 | 69.77 | 83.35 | 54.74 | 68.33 | Link |
RLCF+CoOp | ICLR 24 | 76.05 | 69.74 | 70.62 | 84.51 | 56.49 | 70.34 | Link |
Table 2. Test-time prompt tuning methods on out-of-distribution data. -A, -V2, -R, and -S denote ImageNet-A, ImageNet-V2, ImageNet-R, and ImageNet-Sketch; Avg. is the mean over the four OOD variants.
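The methods in Table 2 adapt at inference time without labels. TPT, for instance, takes one or a few gradient steps on the prompt alone, minimizing the entropy of the prediction averaged over the most confident augmented views of each test image. A hedged sketch of that objective (the `keep_ratio` filter mirrors TPT's confidence selection, but the exact rule and value are illustrative):

```python
import torch

def marginal_entropy_loss(logits, keep_ratio=0.1):
    """TPT-style sketch: keep the lowest-entropy augmented views of one
    test image, average their probabilities, and return the entropy of
    that marginal distribution. logits: (n_views, n_classes)."""
    probs = logits.softmax(dim=-1)
    per_view = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
    k = max(1, int(keep_ratio * logits.size(0)))
    idx = per_view.topk(k, largest=False).indices   # confident views only
    avg = probs[idx].mean(dim=0)
    return -(avg * avg.clamp_min(1e-12).log()).sum()
```

Only the prompt vectors receive these gradients; both CLIP encoders stay frozen.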
TPT
Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models. NeurIPS 2022.
[Paper] [Code]

SwapPrompt
SwapPrompt: Test-Time Prompt Adaptation for Vision-Language Models. NeurIPS 2023.
[Paper]

PromptAlign
Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization. NeurIPS 2023.
[Paper] [Code]

TPS
Just Shift It: Test-Time Prototype Shifting for Zero-Shot Generalization with Vision-Language Models. arXiv 2024.
[Paper] [Code]

RLCF
Test-time Adaptation with CLIP Reward for Zero-shot Generalization in Vision-Language Models. ICLR 2024.
[Paper] [Code]

InTTA
Invariant Test-Time Adaptation for Vision-Language Model Generalization. arXiv 2024.
[Paper] [Code]
CLIP-Adapter
CLIP-Adapter: Better Vision-Language Models with Feature Adapters. arXiv 2021.
[Paper] [Code]
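CLIP-Adapter keeps both encoders frozen and learns a small bottleneck MLP on top of the extracted features, blending the adapted and original features through a residual ratio. A sketch under those assumptions (`reduction` and `alpha` are illustrative defaults, not the paper's tuned values):

```python
import torch
import torch.nn as nn

class FeatureAdapter(nn.Module):
    """CLIP-Adapter-style sketch: a bottleneck MLP over frozen CLIP
    features, mixed with the original feature by residual ratio alpha."""

    def __init__(self, dim=512, reduction=4, alpha=0.2):
        super().__init__()
        self.alpha = alpha
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim // reduction), nn.ReLU(inplace=True),
            nn.Linear(dim // reduction, dim), nn.ReLU(inplace=True),
        )

    def forward(self, f):            # f: frozen CLIP feature, (B, D)
        return self.alpha * self.mlp(f) + (1 - self.alpha) * f
```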
Efficient-Prompt
Prompting Visual-Language Models for Efficient Video Understanding. ECCV 2022.
[Paper] [Code]

X-CLIP
Expanding Language-Image Pretrained Models for General Video Recognition. ECCV 2022.
[Paper] [Code]

RePro
Compositional Prompt Tuning with Motion Cues for Open-vocabulary Video Relation Detection. ICLR 2023.
[Paper] [Code]
L2P
Learning to Prompt for Continual Learning. CVPR 2022.
[Paper] [Code]

DualPrompt
DualPrompt: Complementary Prompting for Rehearsal-free Continual Learning. ECCV 2022.
[Paper] [Code]

EvoPrompt
Evolving Parameterized Prompt Memory for Continual Learning. AAAI 2024.
[Paper]

CPP
Steering Prototypes with Prompt-tuning for Rehearsal-free Continual Learning. WACV 2024.
[Paper] [Code]

CPrompt
Consistent Prompting for Rehearsal-Free Continual Learning. CVPR 2024.
[Paper] [Code]

DIKI
Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models. ECCV 2024.
[Paper] [Code]
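A recurring design in the prompt-based continual learners above (L2P, DualPrompt, and successors) is a prompt pool: (key, prompt) pairs from which a frozen query feature selects the top-k prompts per example, so task knowledge accumulates outside the backbone. A simplified sketch (pool size, prompt length, and the cosine-similarity selection are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PromptPool(nn.Module):
    """L2P-style sketch: a pool of (key, prompt) pairs; each example's
    frozen query feature picks its top-k prompts by cosine similarity."""

    def __init__(self, pool_size=10, prompt_len=5, dim=768, top_k=5):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(pool_size, dim))
        self.prompts = nn.Parameter(torch.randn(pool_size, prompt_len, dim))
        self.top_k = top_k

    def forward(self, query):                       # query: (B, D), frozen
        sim = F.normalize(query, dim=-1) @ F.normalize(self.keys, dim=-1).t()
        idx = sim.topk(self.top_k, dim=-1).indices  # (B, top_k)
        picked = self.prompts[idx]                  # (B, top_k, len, D)
        return picked.flatten(1, 2)                 # prepend to input tokens
```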
MoE-Adapters4CL
Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters. CVPR 2024.
[Paper] [Code]

SSIAT
Semantically-Shifted Incremental Adapter-Tuning is A Continual ViTransformer. CVPR 2024.
[Paper]

RAIL
Advancing Cross-domain Discriminability in Continual Learning of Vision-Language Models. arXiv 2024.
[Paper]

SEMA
Self-Expansion of Pre-trained Models with Mixture of Adapters for Continual Learning. arXiv 2024.
[Paper]