👩🏼‍💻 RAG Zero to Hero Guide

This repository is a comprehensive guide to learning RAG, from the basics to advanced topics.

Quick links


🧱 RAG Basics Course	🚀 RAG Toolkit	🩸 RAG Survey Papers

Topic	Description	Link
What is RAG?	Explains RAG with a simple example.	Link
Why RAG?	Explains the drawbacks of LLMs and how RAG addresses them.	Link
How does RAG work?	Explains the different steps in RAG: Indexing, Retrieval, Augmentation and Generation.	Link
RAG Benefits and Challenges	Discusses the benefits and challenges of RAG.	Link
RAG Must Know Terms	Definitions of must-know RAG terms.	Link
RAG Roadmap	Detailed roadmap to learn RAG from basics to advanced.	Link
RAG Developer's Stack	Covers the various libraries used to build RAG systems.	Link
RAG from Scratch	RAG implementation from scratch without any frameworks.	Link
RAG with LangChain	RAG implementation using LangChain framework.	Link
Website RAG	RAG over a website implemented using LangChain framework.	Link
YouTube Video RAG	RAG over a YouTube video transcript implemented using LangChain framework.	Link
Agentic RAG	Agentic RAG system implemented using CrewAI framework.	Link
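
The four steps named in the table above (Indexing, Retrieval, Augmentation and Generation) can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions, not the code from the linked notebooks: the "embedding" is a toy bag-of-words count vector, and `echo_llm` is a hypothetical placeholder standing in for a real model call. All function names here are made up for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text):
    """Toy 'embedding' (assumption): a bag-of-words count vector, not a learned model."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(documents):
    """Indexing: store each document alongside its vector."""
    return [(doc, embed(doc)) for doc in documents]


def retrieve(index, query, k=2):
    """Retrieval: return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


def augment(query, contexts):
    """Augmentation: place the retrieved text into the prompt."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return f"Answer using only this context:\n{context_block}\n\nQuestion: {query}"


def generate(prompt, llm):
    """Generation: pass the augmented prompt to any callable LLM client."""
    return llm(prompt)


def echo_llm(prompt):
    """Hypothetical placeholder for a real model call."""
    return f"[model response to a {len(prompt)}-character prompt]"


documents = [
    "RAG retrieves relevant documents before generating an answer.",
    "FAISS is a library for similarity search over dense vectors.",
    "Chunking splits long documents into smaller passages.",
]
index = build_index(documents)
question = "What does RAG retrieve?"
contexts = retrieve(index, question, k=1)
prompt = augment(question, contexts)
answer = generate(prompt, echo_llm)
print(prompt)
print(answer)
```

In a real system, `embed` would typically call an embedding model, the index would live in one of the vector databases listed in the toolkit, and `echo_llm` would be replaced by an actual LLM client.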

🔴Frameworks🔴

Library	Description	Link
LangChain	LangChain is a framework for developing applications powered by large language models (LLMs).	Link
LlamaIndex	LlamaIndex is a data framework for your LLM applications.	Link
Haystack	Haystack is an end-to-end LLM framework that allows you to build applications powered by LLMs, Transformer models, vector search and more.	Link
fastRAG	Research framework for efficient and optimized retrieval augmented generative pipelines, incorporating state-of-the-art LLMs and Information Retrieval.	Link
Llmware	Unified framework for building enterprise RAG pipelines with small, specialized models	Link

🟠Research🟠

Library	Description	Link
FlashRAG	A Python Toolkit for Efficient RAG Research. This toolkit includes 36 pre-processed benchmark RAG datasets and 16 state-of-the-art RAG algorithms.	Link

🟡Data Extraction - Web Scraping🟡

Library	Description	Link
Crawl4AI (Web Scraping)	Open-source LLM Friendly Web Crawler & Scraper	Link
ScrapeGraphAI (Web & Document)	A web scraping Python library that uses LLM and direct graph logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, Markdown, etc.).	Link
Crawlee (Web Scraping)	A web scraping and browser automation library	Link

🟢Data Extraction - Documents🟢

Library	Description	Link
Docling (Document)	Docling parses documents and exports them to the desired format with ease and speed.	Link
Llama Parse (Document)	GenAI-native document parser that can parse complex document data for any downstream LLM use case (RAG, agents).	Link
PyMuPDF4LLM (Document)	PyMuPDF4LLM library makes it easier to extract PDF content in the format you need for LLM & RAG environments.	Link
MegaParse (Document)	Parser for every type of documents	Link
ExtractThinker (Document)	Document Intelligence library for LLMs	Link

🔵Vector Database🔵

Library	Description	Link
SQLite-Vec	A vector search SQLite extension that runs anywhere!	Link
FAISS	A library for efficient similarity search and clustering of dense vectors.	Link
PGVector	Open-source vector similarity search for Postgres	Link
Chroma	The AI-native open-source embedding database. The fastest way to build Python or JavaScript LLM apps with memory!	Link
Qdrant	High-performance, massive-scale Vector Database and Vector Search Engine for the next generation of AI.	Link
Pinecone	The vector database for machine learning applications.	Link
Weaviate	Weaviate is a cloud-native, open source vector database that is robust, fast, and scalable.	Link
Milvus	Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search	Link

🟣Chunking🟣

Library	Description	Link
Chonkie	Lightweight, lightning-fast, and easy-to-use RAG chunking library that supports seven different chunking strategies.	Link
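
To make the idea of a chunking strategy concrete, here is a minimal sketch of one common approach: fixed-size word windows with overlap. It is a generic illustration in plain Python and does not use or mirror Chonkie's API. The function name `chunk_words` and its parameters are assumptions made for this example.

```python
def chunk_words(text, chunk_size=5, overlap=2):
    """Split text into chunks of `chunk_size` words; neighbors share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


chunks = chunk_words("one two three four five six seven eight", chunk_size=4, overlap=1)
for chunk in chunks:
    print(chunk)
```

The overlap keeps a sentence that straddles a chunk boundary partly visible in both neighboring chunks, at the cost of storing some words twice.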

🟤Rerankers🟤

Library	Description	Link
Rerankers	A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models. Any new reranking models can be added with very little knowledge of the codebase.	Link

🟠Agentic RAG🟠

Library	Description	Link
CrewAI	Framework for orchestrating role-playing, autonomous AI agents.	Link
Agno	Build AI Agents with memory, knowledge, tools and reasoning. Chat with them using a beautiful Agent UI.	Link
LangGraph	Build resilient language agents as graphs.	Link
AutoGen	An open-source framework for building AI agent systems.	Link
R2R	Agentic Retrieval-Augmented Generation (RAG) with a RESTful API. R2R offers multimodal content ingestion, hybrid search functionality, knowledge graphs, and comprehensive user and document management.	Link
Vectara	Build Agentic RAG applications.	Link

🟢Graph RAG🟢

Library	Description	Link
GraphRAG	A modular graph-based Retrieval-Augmented Generation (RAG) system.	Link
Nano GraphRAG	A simple, easy-to-hack GraphRAG implementation.	Link
FastGraph RAG	Streamlined and promptable Fast GraphRAG framework designed for interpretable, high-precision, agent-driven retrieval workflows.	Link

🔴Evaluation🔴

Library	Description	Link
RAGChecker	A Fine-grained Framework For Diagnosing RAG.	Link
BeyondLLM	BeyondLLM offers an all-in-one toolkit for experimentation, evaluation, and deployment of Retrieval-Augmented Generation (RAG) systems.	Link
RAGAS	Ragas is your ultimate toolkit for evaluating and optimizing Large Language Model (LLM) applications.	Link
Giskard	Open-Source Evaluation & Testing for ML & LLM systems.	Link
DeepEval	The LLM (RAG) Evaluation Framework.	Link

Paper	Category	Link
Retrieval-Augmented Generation for Large Language Models: A Survey	General	Link
Retrieval-Augmented Generation for Natural Language Processing: A Survey	General	Link
A Comprehensive Survey of Retrieval-Augmented Generation (RAG): Evolution, Current Landscape and Future Directions	General	Link
Retrieval-Augmented Generation for AI-Generated Content: A Survey	General	Link
A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models	General	Link
A Survey on Retrieval-Augmented Text Generation for Large Language Models	General	Link
Retrieval Augmented Generation (RAG) and Beyond: A Comprehensive Survey on How to Make your LLMs use External Data More Wisely	General	Link
Graph Retrieval-Augmented Generation: A Survey	Graph RAG	Link
Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG	Agentic RAG	Link
Evaluation of Retrieval-Augmented Generation: A Survey	Evaluation	Link
Searching for Best Practices in Retrieval-Augmented Generation	RAG Best Practices	Link