LlamaIndex-Agent

A RAG system is just the beginning of harnessing the power of LLMs. The next step is building an intelligent agent. In Agentic RAG, the agent makes use of the available tools, strategies, and the LLM to generate responses in a specialized way. Unlike a simple RAG pipeline, an agent can dynamically choose between tools, routing strategies, and so on.


AGENTIC-RAG

A LlamaIndex-based Agentic-RAG system for PDF question answering. The agent chooses between a summarization query engine and a vector query engine to generate the response. The LLM used is Phi3 3.8B.


Frameworks

  • Agentic-RAG: LlamaIndex
  • App: Gradio
  • LLM: Phi3 3.8B
  • Embedding: nomic-embed-text
  • Local LLM: Ollama
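
As a rough sketch, the stack above can be wired together by pointing LlamaIndex's global Settings at the two Ollama-served models. This assumes llama_index 0.10.x with the llama-index-llms-ollama and llama-index-embeddings-ollama packages installed and an Ollama server running on its default port; the notebooks may configure things differently.

```python
# Minimal sketch: route all LLM calls and embeddings through local Ollama.
from llama_index.core import Settings
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.ollama import OllamaEmbedding

Settings.llm = Ollama(model="phi3", request_timeout=120.0)  # Phi3 3.8B
Settings.embed_model = OllamaEmbedding(model_name="nomic-embed-text")
```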

File Structure

  • llamaindex_basic.ipynb: A simple introduction to Llama Index Agentic RAG concepts and terminologies.
  • agentic_rag_intro.ipynb: This notebook contains the code and a step-by-step explanation of how to build an Agentic-RAG with LlamaIndex.
  • agentic_rag_customization.ipynb: Customizing the Agentic-RAG system to perform PDF Q/A with Phi3.
  • utils.py: Contains all the functions in one place.
  • app.py: Creates the Gradio application.

Introduction

RAG is a wonderful way to make an LLM even smarter by giving it memory. However, RAG is a single end-to-end pipeline. Users will pose various kinds of queries, each of which may require a different kind of processing by a specialized pipeline. This is where Agentic-RAG comes into action: a smart agent takes a decision, based on the user query and the available pipelines, to fire up one or more of the pipelines that can answer it. A minimal sketch of that decision step follows.
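
To make the decision step concrete, here is a minimal sketch of an LLM-backed selector picking between two pipeline descriptions. It assumes llama_index 0.10.x and the Settings configuration from the Frameworks sketch above; the notebooks may implement the decision differently.

```python
# Minimal sketch: an LLM-backed selector choosing a pipeline from plain-text
# descriptions. Assumes Settings.llm is already set (e.g. to Ollama/Phi3).
from llama_index.core.selectors import LLMSingleSelector

selector = LLMSingleSelector.from_defaults()
result = selector.select(
    [
        "Useful for summarizing the whole document.",
        "Useful for answering specific questions from document chunks.",
    ],
    query="Give me a high-level summary of the paper.",
)
# Each selection carries the chosen pipeline index and the LLM's reasoning.
print(result.selections)
```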

Docker

For a Docker implementation of the application, check out the GitHub repo. 🚛

Description

In this work we build an Agentic-RAG with LlamaIndex. Retrieval-Augmented Generation (RAG) is one of the most widespread use cases of LLMs. In plain RAG there is a single pipeline for the whole workflow, so all user queries are processed in exactly the same way. However, there are different types of user queries which may require different pipelines for processing. In this work we build two pipelines, each answering user queries with a specific need (see the sketch after this list). The pipelines are

  • Summarization pipeline
  • Question-Answering pipeline
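
A minimal sketch of how the two pipelines can be built and put behind an LLM-driven router, assuming llama_index 0.10.x, the Settings from the Frameworks sketch above, and a placeholder path data/sample.pdf; the notebooks contain the actual implementation.

```python
# Minimal sketch: two query pipelines over one PDF, routed by an LLM selector.
from llama_index.core import SimpleDirectoryReader, SummaryIndex, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector
from llama_index.core.tools import QueryEngineTool

# Load and chunk the PDF ("data/sample.pdf" is a placeholder path).
documents = SimpleDirectoryReader(input_files=["data/sample.pdf"]).load_data()
nodes = SentenceSplitter(chunk_size=1024).get_nodes_from_documents(documents)

# Pipeline 1: summarization over the whole document.
summary_engine = SummaryIndex(nodes).as_query_engine(
    response_mode="tree_summarize"
)
# Pipeline 2: targeted Q/A over the top retrieved chunks.
vector_engine = VectorStoreIndex(nodes).as_query_engine()

summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_engine,
    description="Useful for summarization questions about the document.",
)
vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_engine,
    description="Useful for retrieving specific facts from the document.",
)

# The selector reads the tool descriptions and picks a pipeline per query.
router = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[summary_tool, vector_tool],
    verbose=True,
)
print(router.query("What is this document about?"))
```

Note that the selector's choice is driven entirely by the tool descriptions, so writing them clearly is what makes the routing reliable.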

Description of files in the sequence they were developed

Code descriptions are provided within the files.

  • llamaindex_basic.ipynb: a brief intro to the LlamaIndex framework.
  • agentic_rag_intro.ipynb: a brief introduction to Agentic-RAG development.
  • agentic_rag_customization.ipynb: the notebook with the complete code for developing the Agentic-RAG that answers user queries from a PDF file.
  • app.py: finally, the application built with Gradio. It is built on top of agentic_rag_customization.ipynb, so all the necessary functions are present in utils.py (see the sketch after this list).
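
For reference, the Gradio wrapper in app.py boils down to something like the following sketch. Here build_query_engine is a hypothetical placeholder name for whatever entry point utils.py actually exposes.

```python
# Minimal sketch of the Gradio app; not the literal contents of app.py.
import gradio as gr

# Hypothetical import: the real helper in utils.py may have a different name.
from utils import build_query_engine

engine = build_query_engine("data/sample.pdf")  # placeholder PDF path

def answer(question: str) -> str:
    """Route the question through the Agentic-RAG engine."""
    return str(engine.query(question))

demo = gr.Interface(fn=answer, inputs="text", outputs="text",
                    title="Agentic-RAG PDF Q/A")

if __name__ == "__main__":
    demo.launch()
```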

How to RUN

  • All the work was developed in a Linux environment, so you need a Linux system with at least 8 GB of RAM.
  • Create a virtual environment.
  • Install the libraries with make install.
  • Download Ollama and start the Ollama server with make ollama_download on a new CLI, as this will block the CLI.
  • Pull the models required for the tasks with make models.
  • To start the Gradio app, run python app.py. The full sequence is shown below.
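
Putting the steps together (assuming the Makefile targets named above), a typical session looks like this:

```bash
python -m venv venv && source venv/bin/activate   # virtual environment
make install            # install the Python dependencies
make ollama_download    # run in a separate terminal; it blocks the CLI
make models             # pull phi3 and nomic-embed-text via Ollama
python app.py           # start the Gradio app
```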

Acknowledgements

  • Thanks to DeepLearning.AI and LlamaIndex for the wonderful course
  • Thanks to Microsoft for open-sourcing Phi3

If you find the repo helpful, please drop a ⭐