Retrieval-Augmented Generation for AI-Generated Content: A Survey

This repo collects and categorizes papers on RAG, following our survey paper: Retrieval-Augmented Generation for AI-Generated Content: A Survey. Given the rapid growth of this field, we will continue to update both the paper and this repo.

Overview

Catalogue

Methods Taxonomy

RAG Foundations

Query-based RAG

REALM: Retrieval-Augmented Language Model Pre-Training

Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

REPLUG: Retrieval-Augmented Black-Box Language Models

In-Context Retrieval-Augmented Language Models

When Language Model Meets Private Library

DocPrompting: Generating Code by Retrieving the Docs

Retrieval-based prompt selection for code-related few-shot learning

Inferfix: End-to-end program repair with llms

Make-an-audio: Text-to-audio generation with prompt-enhanced diffusion models

Reacc: A retrieval-augmented code completion framework

Uni-parser: Unified semantic parser for question answering on knowledge base and database

RNG-KBQA: generation augmented iterative ranking for knowledge base question answering

End-to-end case-based reasoning for commonsense knowledge base completion

Combining transfer learning with in-context learning using blackbox llms for zero-shot knowledge base question answering

Genegpt: Augmenting large language models with domain tools for improved access to biomedical information

Retrieval-augmented large language models for adolescent idiopathic scoliosis patients in shared decision-making

Retrievegan: Image synthesis via differentiable patch retrieval

Instance-conditioned gan

Retrieval-Augmented Score Distillation for Text-to-3D Generation

Latent Representation-based RAG

Leveraging passage retrieval with generative models for open domain question answering

Bashexplainer: Retrieval-augmented bash code comment generation based on finetuned codebert

EditSum: A Retrieve-and-Edit Framework for Source Code Summarization

Retrieve and Refine: Exemplar-based Neural Comment Generation

RACE: retrieval-augmented commit message generation

Unik-qa: Unified representations of structured and unstructured knowledge for open-domain question answering

A Retrieve-and-Edit Framework for Predicting Structured Outputs

DecAF: Joint Decoding of Answers and Logical Forms for Question Answering over Knowledge Bases

Bridging the kb-text gap: Leveraging structured knowledge-aware pre-training for KBQA

Knowledge-driven cot: Exploring faithful reasoning in llms for knowledge-intensive question answering

Retrieval-enhanced generative model for large-scale knowledge graph completion

Case-based reasoning for natural language queries over knowledge bases

A Protein-Ligand Interaction-focused 3D Molecular Generative Framework for Generalizable Structure-based Drug Design

Improving language models by retrieving from trillions of tokens

Remodiffuse: Retrieval-augmented motion diffusion model

Memorizing transformers

Audio captioning using pre-trained large-scale language model guided by audio-based similar caption retrieval

Retrieval augmented convolutional encoder-decoder networks for video captioning

Retrieval-augmented egocentric video captioning

Re-imagen: Retrieval-augmented text-to-image generator

Knn-diffusion: Image generation via large-scale retrieval

Retrieval-augmented diffusion models

Text-guided synthesis of artistic images with retrieval-augmented diffusion models

Memory-driven text-to-image generation

Mention memory: incorporating textual knowledge into transformers through entity mention attention

Unlimiformer: Long-range transformers with unlimited length input

Entities as experts: Sparse memory access with entity supervision

Amd: Anatomical motion diffusion with interpretable motion decomposition and fusion

Retrieval-augmented text-to-audio generation

Concept-aware video captioning: Describing videos with effective prior information

Logit-based RAG

Generalization through memorization: Nearest neighbor language models

Syntax-Aware Retrieval Augmented Code Generation

Memory-augmented image captioning

Retrieval-based neural source code summarization

Efficient nearest neighbor language models

Nonparametric masked language modeling

Editsum: A retrieve-and-edit framework for source code summarization

Speculative RAG

REST: Retrieval-Based Speculative Decoding

GPTCache

Copy is All You Need

Retrieval is Accurate Generation

RAG Enhancements

Applications Taxonomy

RAG for Text

RAG for Code

RAG for Audio

RAG for Image

RAG for Video

RAG for 3D

Text-to-3D

ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model

AMD: Anatomical Motion Diffusion with Interpretable Motion Decomposition and Fusion

Retrieval-Augmented Score Distillation for Text-to-3D Generation

RAG for Knowledge

RAG for Science

Benchmark

Benchmarking Large Language Models in Retrieval-Augmented Generation

CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models

ARES: An Automated Evaluation Framework for Retrieval-Augmented Generation Systems

RAGAS: Automated Evaluation of Retrieval Augmented Generation

KILT: a Benchmark for Knowledge Intensive Language Tasks

Citing

If you find this work useful, please cite our paper:

@misc{zhao2024retrievalaugmented,
      title={Retrieval-Augmented Generation for AI-Generated Content: A Survey}, 
      author={Penghao Zhao and Hailin Zhang and Qinhan Yu and Zhengren Wang and Yunteng Geng and Fangcheng Fu and Ling Yang and Wentao Zhang and Bin Cui},
      year={2024},
      eprint={2402.19473},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

vincent507cpu/RAG-Survey