LLM-and-Multimodal-Paper-List

A paper list about large language models and multimodal models (diffusion models, VLMs), from foundations to applications.
Note: it only records papers for my personal needs. Feel free to open an issue if you think I missed some important or exciting work!

Survey

  • HELM: Holistic evaluation of language models. TMLR. paper
  • HEIM: Holistic Evaluation of Text-to-Image Models. NeurIPS'2023. paper
  • Eval Survey: A Survey on Evaluation of Large Language Models. Arxiv'2023. paper
  • Healthcare LM Survey: A Survey of Large Language Models for Healthcare: from Data, Technology, and Applications to Accountability and Ethics. Arxiv'2023. paper, github
  • Multimodal LLM Survey: A Survey on Multimodal Large Language Model. Arxiv'2023. paper, github
  • VLM for vision Task Survey: Vision Language Models for Vision Tasks: A Survey. Arxiv'2023. paper, github
  • Efficient LLM Survey: Efficient Large Language Models: A Survey. Arxiv'2023. paper, github
  • Prompt Engineering Survey: Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing. Arxiv'2021. paper
  • Multimodal Safety Survey: Safety of Multimodal Large Language Models on Images and Text. Arxiv'2024. paper
  • Multimodal LLM Recent Survey: MM-LLMs: Recent Advances in MultiModal Large Language Models. Arxiv'2024. paper
  • Prompt Engineering in LLM Survey: A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications. Arxiv'2024. paper
  • LLM Security and Privacy Survey: A Survey on Large Language Model (LLM) Security and Privacy: The Good, the Bad, and the Ugly. Arxiv'2024. paper
  • LLM Privacy Survey: Privacy in Large Language Models: Attacks, Defenses and Future Directions. Arxiv'2023. paper

Language Model

Foundation LM Models

  • Transformer: Attention Is All You Need. NIPS'2017. paper
  • GPT-1: Improving Language Understanding by Generative Pre-Training. 2018. paper
  • BERT: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL'2019. paper
  • GPT-2: Language Models are Unsupervised Multitask Learners. 2019. paper
  • RoBERTa: RoBERTa: A Robustly Optimized BERT Pretraining Approach. Arxiv'2019. paper
  • DistilBERT: DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. Arxiv'2019. paper
  • T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. JMLR'2020. paper
  • GPT-3: Language Models are Few-Shot Learners. NeurIPS'2020. paper
  • GLaM: GLaM: Efficient Scaling of Language Models with Mixture-of-Experts. ICML'2022. paper
  • PaLM: PaLM: Scaling Language Modeling with Pathways. Arxiv'2022. paper
  • BLOOM: BLOOM: A 176B-Parameter Open-Access Multilingual Language Model. Arxiv'2022. paper
  • BLOOMZ: Crosslingual Generalization through Multitask Finetuning. Arxiv'2023. paper
  • LLaMA: LLaMA: Open and Efficient Foundation Language Models. Arxiv'2023. paper
  • GPT-4: GPT-4 Technical Report. Arxiv'2023. paper
  • PaLM 2: PaLM 2 Technical Report. 2023. paper
  • LLaMA 2: Llama 2: Open foundation and fine-tuned chat models. Arxiv'2023. paper
  • Mistral: Mistral 7B. Arxiv'2023. paper
  • Phi1: Project Link
  • Phi1.5: Project Link
  • Phi2: Project Link
  • Falcon: Project Link

RLHF

  • PPO: Proximal Policy Optimization Algorithms. Arxiv'2017. paper
  • DPO: Direct Preference Optimization: Your Language Model is Secretly a Reward Model. NeurIPS'2023. paper
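As a rough illustration of the DPO objective from the paper above: DPO replaces the PPO reward-model loop with a single classification-style loss on preference pairs. The function name and toy log-probabilities below are my own illustration, not from the paper.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of the chosen/rejected
    response under the trainable policy or the frozen reference model.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)), written in a numerically stable form
    return math.log1p(math.exp(-margin))
```

When the policy and the reference model agree, the margin is zero and the loss is log 2; the loss shrinks as the policy assigns relatively more probability to the chosen response.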

Parameter Efficient Fine-tuning

  • LoRA: LoRA: Low-Rank Adaptation of Large Language Models. Arxiv'2021. paper
  • Q-LoRA: QLoRA: Efficient Finetuning of Quantized LLMs. NeurIPS'2023. paper
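A minimal numpy sketch of the LoRA idea from the paper above: the frozen weight W is augmented with a trainable low-rank update scaled by alpha/r. Variable names and dimensions are illustrative assumptions.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16, r=4):
    """y = W @ x + (alpha/r) * B @ A @ x.

    W: frozen (d_out, d_in) pretrained weight.
    A: (r, d_in) and B: (d_out, r) are the trainable low-rank factors.
    The paper initializes B to zeros, so training starts from the
    pretrained model's behaviour.
    """
    return W @ x + (alpha / r) * (B @ (A @ x))

rng = np.random.default_rng(0)
d_in, d_out, r = 8, 6, 4
W = rng.standard_normal((d_out, d_in))
A = rng.standard_normal((r, d_in))
B = np.zeros((d_out, r))  # zero init: the LoRA path contributes nothing yet
x = rng.standard_normal(d_in)
y = lora_forward(x, W, A, B, r=r)
```

With B at its zero initialization, the output matches the frozen model exactly, which is why LoRA fine-tuning starts from the pretrained behaviour.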

Healthcare LM

  • Med-PaLM: Large Language Models Encode Clinical Knowledge. Arxiv'2022. paper
  • MedAlpaca: MedAlpaca -- An Open-Source Collection of Medical Conversational AI Models and Training Data. Arxiv'2023. paper
  • Med-PaLM 2: Towards Expert-Level Medical Question Answering with Large Language Models. Arxiv'2023. paper
  • HuatuoGPT: HuatuoGPT, towards Taming Language Model to Be a Doctor. EMNLP'2023 (findings). paper
  • GPT-4-Med: Capabilities of GPT-4 on Medical Challenge Problems. Arxiv'2023. paper

Watermarking LLM

Prompt Engineering in LLM

Hard Prompt

  • PET: Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference. EACL'2021. paper
  • Making Pre-trained Language Models Better Few-shot Learners. ACL'2021. paper
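A toy sketch of the pattern-verbalizer idea behind PET: classification is reformulated as a cloze task, and a verbalizer maps the predicted label word back to a class. The pattern string and label words here are my own example, not taken from the paper.

```python
def pet_pattern(text):
    """PET-style pattern: turn a sentence into a cloze question."""
    return f"{text} It was [MASK]."

# Verbalizer: maps label words (candidates for [MASK]) to class labels.
VERBALIZER = {"great": "positive", "terrible": "negative"}

def verbalize(mask_prediction):
    """Map the language model's filled-in word back to a class label."""
    return VERBALIZER.get(mask_prediction, "unknown")
```

The masked language model scores the verbalizer words at the `[MASK]` position, and the highest-scoring word determines the predicted class.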

Soft Prompt

  • Prompt-Tuning: The Power of Scale for Parameter-Efficient Prompt Tuning. EMNLP'2021. paper
  • Prefix-Tuning: Prefix-Tuning: Optimizing Continuous Prompts for Generation. ACL'2021. paper
  • P-tuning: P-Tuning: Prompt Tuning Can Be Comparable to Fine-tuning Across Scales and Tasks. ACL'2022. paper
  • P-tuning v2: P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks. Arxiv'2022. paper
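The common core of the soft-prompt methods above can be sketched in a few lines: learnable continuous vectors are prepended to the frozen model's token embeddings, and only those vectors receive gradients. Shapes and names below are illustrative assumptions.

```python
import numpy as np

def prepend_soft_prompt(input_embeds, soft_prompt):
    """Prompt tuning: prepend learnable prompt vectors to the token
    embeddings of a frozen model; only soft_prompt is trained.

    input_embeds: (seq_len, d_model); soft_prompt: (n_prompt, d_model).
    """
    return np.concatenate([soft_prompt, input_embeds], axis=0)

d_model, n_prompt, seq_len = 16, 5, 10
rng = np.random.default_rng(0)
soft_prompt = rng.standard_normal((n_prompt, d_model)) * 0.02  # small init
tokens = np.zeros((seq_len, d_model))  # stand-in for frozen embeddings
full = prepend_soft_prompt(tokens, soft_prompt)
```

Prefix-Tuning differs mainly in where the vectors are injected (key/value activations at every layer rather than the input embeddings), but the "train only the prompt" principle is the same.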

Between Soft and Hard

  • Auto-Prompt: AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts. EMNLP'2020. paper
  • FluentPrompt: Toward Human Readable Prompt Tuning: Kubrick's The Shining is a good movie, and a good prompt too?. EMNLP'2023 (findings). paper
  • PEZ: Hard prompts made easy: Gradient-based discrete optimization for prompt tuning and discovery. Arxiv'2023. paper

Multi-modal Models

Foundation Multi-Modal Models

  • CLIP: Learning Transferable Visual Models From Natural Language Supervision. ICML'2021. paper
  • DeCLIP: Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm. ICLR'2022. paper
  • FILIP: FILIP: Fine-grained Interactive Language-Image Pre-Training. ICLR'2022. paper
  • Stable Diffusion: High-Resolution Image Synthesis with Latent Diffusion Models. CVPR'2022. paper
  • BLIP: BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation. ICML'2022. paper
  • BLIP2: BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models. ICML'2023. paper
  • LLaMA-Adapter: LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention. Arxiv'2023. paper
  • LLaVA: Visual Instruction Tuning. NeurIPS'2023. paper
  • Instruct BLIP: InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning. NeurIPS'2023. paper
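A minimal numpy sketch of the symmetric contrastive objective used by CLIP: image and text embeddings are L2-normalized, a similarity matrix is built, and matching pairs on the diagonal are pulled together with cross-entropy in both directions. The function names and the fixed temperature are my own simplifications (CLIP learns the temperature).

```python
import numpy as np

def clip_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of (image, text) pairs.

    Row i of img_emb is assumed to match row i of txt_emb, so the
    positives sit on the diagonal of the similarity matrix.
    """
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (batch, batch) similarities
    labels = np.arange(len(logits))

    def xent(l):
        # cross-entropy with the diagonal as the target class
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # average the image->text and text->image directions
    return (xent(logits) + xent(logits.T)) / 2
```

Correctly aligned pairs yield a lower loss than shuffled ones, which is what drives the shared embedding space.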

Multi-modal Safety

  • SLD: Safe Latent Diffusion: Mitigating Inappropriate Degeneration in Diffusion Models. CVPR'2023. paper
  • ESD: Erasing Concepts from Diffusion Models. ICCV'2023. paper

VLM Hallucinations

  • POPE: Evaluating Object Hallucination in Large Vision-Language Models. EMNLP'2023. paper
  • HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models. CVPR'2024. paper

VLM Privacy

Prompt Engineering in VLM


Agent

LLM-based Agent

  • Stanford Town: Generative Agents: Interactive Simulacra of Human Behavior. UIST'2023. paper

VLM-based Agent

  • OSWorld: OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments. Arxiv'2024. paper

Useful Resources