LLM-and-Multimodal-Paper-List

A paper list about large language models and multimodal models (diffusion models, VLMs), from foundations to applications.
Note: it only records papers for my personal needs. Feel free to open an issue if you think I missed some important or exciting work!

Survey

  • HELM: Holistic evaluation of language models. TMLR. paper
  • HEIM: Holistic Evaluation of Text-to-Image Models. NeurIPS'2023. paper
  • Eval Survey: A Survey on Evaluation of Large Language Models. Arxiv'2023. paper
  • Healthcare LM Survey: A Survey of Large Language Models for Healthcare: from Data, Technology, and Applications to Accountability and Ethics. Arxiv'2023. paper, github
  • Multimodal LLM Survey: A Survey on Multimodal Large Language Model. Arxiv'2023. paper, github
  • VLM for vision Task Survey: Vision Language Models for Vision Tasks: A Survey. Arxiv'2023. paper, github
  • Efficient LLM Survey: Efficient Large Language Models: A Survey. Arxiv'2023. paper, github
  • Prompt Engineering Survey: Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing. Arxiv'2021. paper
  • Multimodal Safety Survey: Safety of Multimodal Large Language Models on Images and Text. Arxiv'2024. paper
  • Multimodal LLM Recent Survey: MM-LLMs: Recent Advances in MultiModal Large Language Models. Arxiv'2024. paper
  • Prompt Engineering in LLM Survey: A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications. Arxiv'2024. paper
  • LLM Security and Privacy Survey: A Survey on Large Language Model (LLM) Security and Privacy: The Good, the Bad, and the Ugly. Arxiv'2024. paper
  • LLM Privacy Survey: Privacy in Large Language Models: Attacks, Defenses and Future Directions. Arxiv'2023. paper

Language Model

Foundation LM Models

  • Transformer: Attention Is All You Need. NIPS'2017. paper
  • GPT-1: Improving Language Understanding by Generative Pre-Training. 2018. paper
  • BERT: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL'2019. paper
  • GPT-2: Language Models are Unsupervised Multitask Learners. 2019. paper
  • RoBERTa: RoBERTa: A Robustly Optimized BERT Pretraining Approach. Arxiv'2019. paper
  • DistilBERT: DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. Arxiv'2019. paper
  • T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. JMLR'2020. paper
  • GPT-3: Language Models are Few-Shot Learners. NeurIPS'2020. paper
  • GLaM: GLaM: Efficient Scaling of Language Models with Mixture-of-Experts. ICML'2022. paper
  • PaLM: PaLM: Scaling Language Modeling with Pathways. Arxiv'2022. paper
  • BLOOM: BLOOM: A 176B-Parameter Open-Access Multilingual Language Model. Arxiv'2022. paper
  • BLOOMZ: Crosslingual Generalization through Multitask Finetuning. Arxiv'2023. paper
  • LLaMA: LLaMA: Open and Efficient Foundation Language Models. Arxiv'2023. paper
  • GPT-4: GPT-4 Technical Report. Arxiv'2023. paper
  • PaLM 2: PaLM 2 Technical Report. 2023. paper
  • LLaMA 2: Llama 2: Open foundation and fine-tuned chat models. Arxiv'2023. paper
  • Mistral: Mistral 7B. Arxiv'2023. paper
  • Phi1: Project Link
  • Phi1.5: Project Link
  • Phi2: Project Link
  • Falcon: Project Link

RLHF

  • PPO: Proximal Policy Optimization Algorithms. Arxiv'2017. paper
  • DPO: Direct Preference Optimization: Your Language Model is Secretly a Reward Model. NeurIPS'2023. paper
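As a rough illustration of the DPO objective from the paper above: DPO replaces the PPO reward-model loop with a single classification-style loss on preference pairs. The function name and toy log-probabilities below are my own illustration, not from the paper.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of the chosen/rejected
    response under the trainable policy or the frozen reference model.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)), written in a numerically stable form
    return math.log1p(math.exp(-margin))
```

When the policy and the reference model agree, the margin is zero and the loss is log 2; the loss shrinks as the policy assigns relatively more probability to the chosen response.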

Parameter Efficient Fine-tuning

  • LoRA: LoRA: Low-Rank Adaptation of Large Language Models. Arxiv'2021. paper
  • Q-LoRA: QLoRA: Efficient Finetuning of Quantized LLMs. NeurIPS'2023. paper
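A minimal numpy sketch of the LoRA idea from the paper above: the frozen weight W is augmented with a trainable low-rank update scaled by alpha/r. Variable names and dimensions are illustrative assumptions.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16, r=4):
    """y = W @ x + (alpha/r) * B @ A @ x.

    W: frozen (d_out, d_in) pretrained weight.
    A: (r, d_in) and B: (d_out, r) are the trainable low-rank factors.
    The paper initializes B to zeros, so training starts from the
    pretrained model's behaviour.
    """
    return W @ x + (alpha / r) * (B @ (A @ x))

rng = np.random.default_rng(0)
d_in, d_out, r = 8, 6, 4
W = rng.standard_normal((d_out, d_in))
A = rng.standard_normal((r, d_in))
B = np.zeros((d_out, r))  # zero init: the LoRA path contributes nothing yet
x = rng.standard_normal(d_in)
y = lora_forward(x, W, A, B, r=r)
```

With B at its zero initialization, the output matches the frozen model exactly, which is why LoRA fine-tuning starts from the pretrained behaviour.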

Healthcare LM

  • Med-PaLM: Large Language Models Encode Clinical Knowledge. Arxiv'2022. paper
  • MedAlpaca: MedAlpaca -- An Open-Source Collection of Medical Conversational AI Models and Training Data. Arxiv'2023. paper
  • Med-PaLM 2: Towards Expert-Level Medical Question Answering with Large Language Models. Arxiv'2023. paper
  • HuatuoGPT: HuatuoGPT, towards Taming Language Model to Be a Doctor. EMNLP'2023 (findings). paper
  • GPT-4-Med: Capabilities of GPT-4 on Medical Challenge Problems. Arxiv'2023. paper

Watermarking LLM

Prompt Engineering in LLM

Hard Prompt

  • PET: Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference. EACL'2021. paper
  • Making Pre-trained Language Models Better Few-shot Learners. ACL'2021. paper
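A toy sketch of the pattern-verbalizer idea behind PET: classification is reformulated as a cloze task, and a verbalizer maps the predicted label word back to a class. The pattern string and label words here are my own example, not taken from the paper.

```python
def pet_pattern(text):
    """PET-style pattern: turn a sentence into a cloze question."""
    return f"{text} It was [MASK]."

# Verbalizer: maps label words (candidates for [MASK]) to class labels.
VERBALIZER = {"great": "positive", "terrible": "negative"}

def verbalize(mask_prediction):
    """Map the language model's filled-in word back to a class label."""
    return VERBALIZER.get(mask_prediction, "unknown")
```

The masked language model scores the verbalizer words at the `[MASK]` position, and the highest-scoring word determines the predicted class.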

Soft Prompt

  • Prompt-Tuning: The Power of Scale for Parameter-Efficient Prompt Tuning. EMNLP'2021. paper
  • Prefix-Tuning: Prefix-Tuning: Optimizing Continuous Prompts for Generation. ACL'2021. paper
  • P-tuning: P-Tuning: Prompt Tuning Can Be Comparable to Fine-tuning Across Scales and Tasks. ACL'2022. paper
  • P-tuning v2: P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks. Arxiv'2022. paper
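The common core of the soft-prompt methods above can be sketched in a few lines: learnable continuous vectors are prepended to the frozen model's token embeddings, and only those vectors receive gradients. Shapes and names below are illustrative assumptions.

```python
import numpy as np

def prepend_soft_prompt(input_embeds, soft_prompt):
    """Prompt tuning: prepend learnable prompt vectors to the token
    embeddings of a frozen model; only soft_prompt is trained.

    input_embeds: (seq_len, d_model); soft_prompt: (n_prompt, d_model).
    """
    return np.concatenate([soft_prompt, input_embeds], axis=0)

d_model, n_prompt, seq_len = 16, 5, 10
rng = np.random.default_rng(0)
soft_prompt = rng.standard_normal((n_prompt, d_model)) * 0.02  # small init
tokens = np.zeros((seq_len, d_model))  # stand-in for frozen embeddings
full = prepend_soft_prompt(tokens, soft_prompt)
```

Prefix-Tuning differs mainly in where the vectors are injected (key/value activations at every layer rather than the input embeddings), but the "train only the prompt" principle is the same.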

Between Soft and Hard

  • Auto-Prompt: AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts. EMNLP'2020. paper
  • FluentPrompt: Toward Human Readable Prompt Tuning: Kubrick's The Shining is a good movie, and a good prompt too?. EMNLP'2023 (findings). paper
  • PEZ: Hard prompts made easy: Gradient-based discrete optimization for prompt tuning and discovery. Arxiv'2023. paper

Multi-modal Models

Foundation Multi-Modal Models

  • CLIP: Learning Transferable Visual Models From Natural Language Supervision. ICML'2021. paper
  • DeCLIP: Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm. ICLR'2022. paper
  • FILIP: FILIP: Fine-grained Interactive Language-Image Pre-Training. ICLR'2022. paper
  • Stable Diffusion: High-Resolution Image Synthesis with Latent Diffusion Models. CVPR'2022. paper
  • BLIP: BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation. ICML'2022. paper
  • BLIP2: BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models. ICML'2023. paper
  • LLaMA-Adapter: LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention. Arxiv'2023. paper
  • LLaVA: Visual Instruction Tuning. NeurIPS'2023. paper
  • Instruct BLIP: InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning. NeurIPS'2023. paper
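A minimal numpy sketch of the symmetric contrastive objective used by CLIP: image and text embeddings are L2-normalized, a similarity matrix is built, and matching pairs on the diagonal are pulled together with cross-entropy in both directions. The function names and the fixed temperature are my own simplifications (CLIP learns the temperature).

```python
import numpy as np

def clip_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of (image, text) pairs.

    Row i of img_emb is assumed to match row i of txt_emb, so the
    positives sit on the diagonal of the similarity matrix.
    """
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (batch, batch) similarities
    labels = np.arange(len(logits))

    def xent(l):
        # cross-entropy with the diagonal as the target class
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # average the image->text and text->image directions
    return (xent(logits) + xent(logits.T)) / 2
```

Correctly aligned pairs yield a lower loss than shuffled ones, which is what drives the shared embedding space.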

Multi-modal Safety

  • SLD: Safe Latent Diffusion: Mitigating Inappropriate Degeneration in Diffusion Models. CVPR'2023. paper
  • ESD: Erasing Concepts from Diffusion Models. ICCV'2023. paper

VLM Hallucinations

  • POPE: Evaluating Object Hallucination in Large Vision-Language Models. EMNLP'2023. paper
  • HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models. CVPR'2024. paper

VLM Privacy

Prompt Engineering in VLM


Agent

LLM-based Agent

  • Stanford Town: Generative Agents: Interactive Simulacra of Human Behavior. UIST'2023. paper

VLM-based Agent

  • OSWorld: OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments. Arxiv'2024. paper

Useful Resources