👩🏼💻 RAG Zero to Hero Guide
This repository is a comprehensive guide to learning RAG, from the basics to advanced topics.

| Topic | Description | Link |
|---|---|---|
| What is RAG? | Explains RAG with a simple example. | Link |
| Why RAG? | Explains the drawbacks of LLMs and how RAG addresses them. | Link |
| How does RAG work? | Explains the different steps in RAG: Indexing, Retrieval, Augmentation, and Generation. | Link |
| RAG Benefits and Challenges | Discusses the benefits and challenges of RAG. | Link |
| RAG Must-Know Terms | Definitions of must-know RAG terms. | Link |
| RAG Roadmap | Detailed roadmap to learn RAG from basics to advanced. | Link |
| RAG Developer's Stack | Covers the various libraries used to build RAG systems. | Link |
| RAG from Scratch | RAG implementation from scratch without any frameworks. | Link |
| RAG with LangChain | RAG implementation using the LangChain framework. | Link |
| Website RAG | RAG over a website, implemented using the LangChain framework. | Link |
| YouTube Video RAG | RAG over a YouTube video transcript, implemented using the LangChain framework. | Link |
| Agentic RAG | Agentic RAG system implemented using the CrewAI framework. | Link |

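
The four steps named in the table above (Indexing, Retrieval, Augmentation, Generation) can be sketched without any framework. This is a minimal illustration, not the linked tutorial's code. `embed` is a hypothetical bag-of-words stand-in for a real embedding model, and `generate` is a hypothetical placeholder that echoes the prompt where an LLM call would go.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters/digits as tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text: str) -> Counter:
    # Hypothetical stand-in for an embedding model: a term-count vector.
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# 1. Indexing: store each document alongside its vector.
def build_index(docs: list[str]) -> list[tuple[str, Counter]]:
    return [(doc, embed(doc)) for doc in docs]


# 2. Retrieval: rank documents by similarity to the query and keep the top k.
def retrieve(index: list[tuple[str, Counter]], query: str, k: int = 2) -> list[str]:
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


# 3. Augmentation: insert the retrieved context into the prompt.
def augment(query: str, context: list[str]) -> str:
    ctx = "\n".join(f"- {c}" for c in context)
    return f"Answer using only the context below.\nContext:\n{ctx}\nQuestion: {query}\nAnswer:"


# 4. Generation: hypothetical placeholder; replace with a real LLM client call.
def generate(prompt: str) -> str:
    return prompt  # echoes the prompt so the sketch runs without an API key


docs = [
    "Paris is the capital of France.",
    "The Eiffel Tower is in Paris.",
    "Python is a programming language.",
]
index = build_index(docs)
question = "What is the capital of France?"
print(generate(augment(question, retrieve(index, question, k=1))))
```

Keeping each step as its own function makes it easy to swap in a real embedding model, a vector database, or an LLM one piece at a time.
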
🔴Frameworks🔴

| Library | Description | Link |
|---|---|---|
| LangChain | LangChain is a framework for developing applications powered by large language models (LLMs). | Link |
| LlamaIndex | LlamaIndex is a data framework for your LLM applications. | Link |
| Haystack | Haystack is an end-to-end LLM framework that allows you to build applications powered by LLMs, Transformer models, vector search and more. | Link |
| fastRAG | Research framework for efficient and optimized retrieval augmented generative pipelines, incorporating state-of-the-art LLMs and Information Retrieval. | Link |
| Llmware | Unified framework for building enterprise RAG pipelines with small, specialized models. | Link |

🟠Research🟠

| Library | Description | Link |
|---|---|---|
| FlashRAG | A Python Toolkit for Efficient RAG Research. This toolkit includes 36 pre-processed benchmark RAG datasets and 16 state-of-the-art RAG algorithms. | Link |

🟡Data Extraction - Web Scraping🟡

| Library | Description | Link |
|---|---|---|
| Crawl4AI (Web Scraping) | Open-source, LLM-friendly web crawler and scraper. | Link |
| ScrapeGraphAI (Web & Document) | A web scraping Python library that uses LLMs and direct graph logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, Markdown, etc.). | Link |
| Crawlee (Web Scraping) | A web scraping and browser automation library. | Link |

🟢Data Extraction - Documents🟢

| Library | Description | Link |
|---|---|---|
| Docling (Document) | Docling parses documents and exports them to the desired format with ease and speed. | Link |
| LlamaParse (Document) | GenAI-native document parser that can parse complex document data for any downstream LLM use case (RAG, agents). | Link |
| PyMuPDF4LLM (Document) | PyMuPDF4LLM makes it easier to extract PDF content in the format you need for LLM and RAG environments. | Link |
| MegaParse (Document) | Parser for every type of document. | Link |
| ExtractThinker (Document) | Document intelligence library for LLMs. | Link |

🔵Vector Database🔵

| Library | Description | Link |
|---|---|---|
| SQLite-Vec | A vector search SQLite extension that runs anywhere! | Link |
| FAISS | A library for efficient similarity search and clustering of dense vectors. | Link |
| PGVector | Open-source vector similarity search for Postgres. | Link |
| Chroma | The AI-native open-source embedding database. The fastest way to build Python or JavaScript LLM apps with memory! | Link |
| Qdrant | High-performance, massive-scale Vector Database and Vector Search Engine for the next generation of AI. | Link |
| Pinecone | The vector database for machine learning applications. | Link |
| Weaviate | Weaviate is a cloud-native, open source vector database that is robust, fast, and scalable. | Link |
| Milvus | Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search. | Link |

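
All the vector databases above answer the same core query: given a query vector, return the k most similar stored vectors. The sketch below does this with an exact, brute-force cosine search. `FlatVectorIndex` is a hypothetical class, not the API of any listed library, and the 3-dimensional vectors are made-up toy values standing in for real embeddings. Production systems typically replace the linear scan with an approximate nearest-neighbour (ANN) index so search stays fast at millions of vectors.

```python
import heapq
import math


class FlatVectorIndex:
    """Hypothetical minimal in-memory index: exact (brute-force) cosine search."""

    def __init__(self, dim: int):
        self.dim = dim
        self.ids: list[str] = []
        self.vectors: list[list[float]] = []

    @staticmethod
    def _normalize(v: list[float]) -> list[float]:
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v] if norm else list(v)

    def add(self, doc_id: str, vector: list[float]) -> None:
        if len(vector) != self.dim:
            raise ValueError(f"expected dim {self.dim}, got {len(vector)}")
        # Store unit-length vectors so a dot product equals cosine similarity.
        self.ids.append(doc_id)
        self.vectors.append(self._normalize(vector))

    def search(self, query: list[float], k: int = 3) -> list[tuple[str, float]]:
        q = self._normalize(query)
        scores = (
            (doc_id, sum(a * b for a, b in zip(q, v)))
            for doc_id, v in zip(self.ids, self.vectors)
        )
        # Linear scan over every stored vector; keep the k highest scores.
        return heapq.nlargest(k, scores, key=lambda pair: pair[1])


index = FlatVectorIndex(dim=3)
index.add("cats", [0.9, 0.1, 0.0])
index.add("dogs", [0.8, 0.3, 0.0])
index.add("finance", [0.0, 0.1, 0.9])
print(index.search([1.0, 0.2, 0.0], k=2))
```

The exact scan is the accuracy baseline that approximate indexes are measured against, which is why it is useful for testing small prototypes.
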
🟣Chunking🟣

| Library | Description | Link |
|---|---|---|
| Chonkie | The no-nonsense RAG chunking library: lightweight, lightning-fast, and easy to use. It supports seven different chunking strategies. | Link |

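
Chunking splits documents into pieces small enough to embed and retrieve individually. As a sketch of one common strategy, fixed-size chunking with overlap, the function below uses whitespace-separated words as a stand-in for tokens. `chunk_words` is a hypothetical helper, not Chonkie's API, and it does not cover the library's other strategies.

```python
def chunk_words(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into windows of `chunk_size` words; consecutive windows share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once a window reaches the end so the tail is not emitted twice.
        if start + chunk_size >= len(words):
            break
    return chunks


text = " ".join(f"w{i}" for i in range(10))
print(chunk_words(text, chunk_size=4, overlap=1))
```

The overlap keeps a sentence that straddles a boundary partly visible in both neighbouring chunks, at the cost of some duplicated storage.
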
🟤Rerankers🟤

| Library | Description | Link |
|---|---|---|
| Rerankers | A lightweight, low-dependency, unified API for all common reranking and cross-encoder models. New reranking models can be added with very little knowledge of the codebase. | Link |

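
Rerankers are usually applied as a second stage: a fast retriever returns candidates, then a slower model scores each (query, document) pair and reorders them. In the sketch below, `rerank` is a hypothetical helper and `overlap_score` is a toy term-overlap scorer standing in for a cross-encoder. Neither reflects the Rerankers library's API.

```python
from typing import Callable


def rerank(
    query: str,
    candidates: list[str],
    score_fn: Callable[[str, str], float],
    top_n: int = 3,
) -> list[tuple[str, float]]:
    """Score every (query, candidate) pair with `score_fn` and return the top_n by score."""
    scored = [(doc, score_fn(query, doc)) for doc in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # stable: ties keep input order
    return scored[:top_n]


def overlap_score(query: str, doc: str) -> float:
    # Hypothetical stand-in for a cross-encoder: fraction of query terms found in the doc.
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0


candidates = [
    "reranking improves retrieval precision",
    "vector databases store embeddings",
    "cross encoders score query and document together",
]
print(rerank("how do cross encoders score a query", candidates, overlap_score, top_n=2))
```

Because `score_fn` is passed in, the same wrapper works with any pairwise scorer; a real cross-encoder would simply replace `overlap_score`.
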
🟠Agentic RAG🟠

| Library | Description | Link |
|---|---|---|
| CrewAI | Framework for orchestrating role-playing, autonomous AI agents. | Link |
| Agno | Build AI Agents with memory, knowledge, tools and reasoning. Chat with them using a beautiful Agent UI. | Link |
| LangGraph | Build resilient language agents as graphs. | Link |
| AutoGen | An open-source framework for building AI agent systems. | Link |
| R2R | Agentic Retrieval-Augmented Generation (RAG) with a RESTful API. R2R offers multimodal content ingestion, hybrid search functionality, knowledge graphs, and comprehensive user and document management. | Link |
| Vectara | Build Agentic RAG applications. | Link |

🟢Graph RAG🟢

| Library | Description | Link |
|---|---|---|
| GraphRAG | A modular graph-based Retrieval-Augmented Generation (RAG) system. | Link |
| Nano GraphRAG | A simple, easy-to-hack GraphRAG implementation. | Link |
| FastGraph RAG | Streamlined and promptable Fast GraphRAG framework designed for interpretable, high-precision, agent-driven retrieval workflows. | Link |

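
One simple way to retrieve from a knowledge graph is to find the entities mentioned in the query and collect the facts within a few hops of them. The sketch below assumes the graph is given as hypothetical (subject, relation, object) triples and matches entities by naive substring search. It illustrates only the general idea; each library above builds and queries its graph differently (GraphRAG, for example, also summarizes communities of related entities).

```python
from collections import defaultdict, deque

Triple = tuple[str, str, str]  # (subject, relation, object)


def build_graph(triples: list[Triple]) -> dict[str, list[Triple]]:
    # Index each triple under both endpoints so traversal works in either direction.
    graph: dict[str, list[Triple]] = defaultdict(list)
    for s, r, o in triples:
        graph[s].append((s, r, o))
        graph[o].append((s, r, o))
    return graph


def graph_retrieve(graph: dict[str, list[Triple]], query: str, hops: int = 1) -> list[Triple]:
    """Return the triples within `hops` edges of any entity named in the query (breadth-first)."""
    q = query.lower()
    seeds = [e for e in graph if e.lower() in q]  # naive entity linking by substring
    seen_nodes = set(seeds)
    seen_facts: set[Triple] = set()
    facts: list[Triple] = []
    frontier = deque((e, 0) for e in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue  # do not expand beyond the hop limit
        for triple in graph[node]:
            if triple not in seen_facts:
                seen_facts.add(triple)
                facts.append(triple)
            s, _, o = triple
            neighbor = o if s == node else s
            if neighbor not in seen_nodes:
                seen_nodes.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return facts


triples = [
    ("Marie Curie", "won", "Nobel Prize"),
    ("Marie Curie", "born in", "Warsaw"),
    ("Warsaw", "capital of", "Poland"),
]
graph = build_graph(triples)
print(graph_retrieve(graph, "Where was Marie Curie born?", hops=1))
```

The retrieved triples would then be written into the prompt as context, just like text chunks in standard RAG; raising `hops` pulls in related facts (here, that Warsaw is the capital of Poland) that plain vector search might miss.
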
🔴Evaluation🔴

| Library | Description | Link |
|---|---|---|
| RAGChecker | A Fine-grained Framework For Diagnosing RAG. | Link |
| BeyondLLM | Beyond LLM offers an all-in-one toolkit for experimentation, evaluation, and deployment of Retrieval-Augmented Generation (RAG) systems. | Link |
| RAGAS | Ragas is your ultimate toolkit for evaluating and optimizing Large Language Model (LLM) applications. | Link |
| Giskard | Open-Source Evaluation & Testing for ML & LLM systems. | Link |
| DeepEval | The LLM (RAG) Evaluation Framework. | Link |

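
Before judging generated answers, it is common to check the retriever on its own. The sketch below computes two standard retrieval metrics, hit rate@k and mean reciprocal rank (MRR), from ranked document IDs and hand-labelled relevant IDs. The data is made up, and the functions are generic illustrations, not the API of any evaluation library above.

```python
def hit_rate_at_k(results: list[list[str]], relevant: list[set[str]], k: int) -> float:
    """Fraction of queries with at least one relevant doc among the top-k results."""
    hits = sum(
        1 for ranked, rel in zip(results, relevant) if any(d in rel for d in ranked[:k])
    )
    return hits / len(results) if results else 0.0


def mean_reciprocal_rank(results: list[list[str]], relevant: list[set[str]]) -> float:
    """Average over queries of 1/rank of the first relevant doc (0 if none is retrieved)."""
    total = 0.0
    for ranked, rel in zip(results, relevant):
        for rank, doc in enumerate(ranked, start=1):
            if doc in rel:
                total += 1.0 / rank
                break
    return total / len(results) if results else 0.0


# Ranked retriever output per query, and the IDs labelled relevant for that query.
results = [["d3", "d1", "d7"], ["d2", "d9", "d4"], ["d5", "d6", "d8"]]
relevant = [{"d1"}, {"d2"}, {"d0"}]
print(hit_rate_at_k(results, relevant, k=2))  # 2 of 3 queries hit
print(mean_reciprocal_rank(results, relevant))  # (1/2 + 1 + 0) / 3 = 0.5
```

Hit rate only asks whether a relevant document was retrieved at all, while MRR also rewards ranking it near the top.
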

| Paper | Category | Link |
|---|---|---|
| Retrieval-Augmented Generation for Large Language Models: A Survey | General | Link |
| Retrieval-Augmented Generation for Natural Language Processing: A Survey | General | Link |
| A Comprehensive Survey of Retrieval-Augmented Generation (RAG): Evolution, Current Landscape and Future Directions | General | Link |
| Retrieval-Augmented Generation for AI-Generated Content: A Survey | General | Link |
| A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models | General | Link |
| A Survey on Retrieval-Augmented Text Generation for Large Language Models | General | Link |
| Retrieval Augmented Generation (RAG) and Beyond: A Comprehensive Survey on How to Make your LLMs use External Data More Wisely | General | Link |
| Graph Retrieval-Augmented Generation: A Survey | Graph RAG | Link |
| Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG | Agentic RAG | Link |
| Evaluation of Retrieval-Augmented Generation: A Survey | Evaluation | Link |
| Searching for Best Practices in Retrieval-Augmented Generation | RAG Best Practices | Link |