Document Query Search, also known as Retrieval-Augmented Generation (RAG), is an approach that enhances the capabilities of pre-trained large language models (LLMs) by integrating them with external data sources. This technique combines the generative power of LLMs with the precision of specialized data-search mechanisms. The result is a system capable of providing nuanced, contextually relevant responses that go beyond the model's initial training data.
- Up-to-Date Information: Traditional LLMs are limited by their training data, which has a cutoff date. RAG allows the model to access the latest information, ensuring responses are current and relevant.
- Reduced Hallucinations: LLMs can sometimes generate confident but inaccurate or "hallucinated" responses. RAG mitigates this by retrieving accurate data from authoritative sources.
- Contextual Relevance: By fetching specific information related to the query, RAG ensures that the responses are tailored to the user's context, providing a more personalized experience.
RAG can be particularly useful in scenarios where up-to-date or specialized knowledge is required. For example, it can power a customer support chatbot that provides precise answers to queries about product specifications, troubleshooting, or warranty information by accessing a company's product database and user manuals.
Input: Documents
- The process begins with a set of documents. These documents can be any type of text, such as articles, reports, or web pages.
- The goal is to extract relevant information from these documents to answer questions.
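As a rough illustration, a minimal loader might read every plain-text file from a folder into memory. The folder name source_documents matches the one mentioned later in this README; restricting it to .txt files is an assumption, and the notebook may use dedicated loaders for PDFs and other formats.

```python
from pathlib import Path

def load_documents(folder: str = "source_documents") -> dict[str, str]:
    """Read every .txt file in the folder into a {filename: text} mapping."""
    documents = {}
    for path in Path(folder).glob("*.txt"):
        documents[path.name] = path.read_text(encoding="utf-8")
    return documents

# Example: documents = load_documents()
```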
Generate Document Chunks
- The documents are split into smaller chunks. These chunks could be paragraphs, sentences, or other meaningful segments.
- By breaking down the documents, we create more manageable units for further processing.
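A minimal sketch of chunking using fixed-size character windows with overlap (the chunk size and overlap values are assumptions; the notebook may split on paragraphs or use a library text splitter instead):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows."""
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():
            chunks.append(chunk)
    return chunks
```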
LLM Embedding
- Each document chunk is transformed into a numerical representation (an embedding) using a language model.
- This embedding captures the semantic meaning of the text and encodes it as a vector.
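For illustration, the embeddings could be computed with the sentence-transformers library as below; the model name is an assumption, not necessarily the one used in the notebook.

```python
from sentence_transformers import SentenceTransformer

# Assumed embedding model; the notebook may use a different one.
embedder = SentenceTransformer("all-MiniLM-L6-v2")

def embed_texts(texts: list[str]):
    """Return one embedding vector per input text."""
    return embedder.encode(texts, convert_to_numpy=True)
```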
Vector Database
- The embeddings of document chunks are stored in a vector database.
- Each embedding is associated with a unique chunk ID, allowing efficient retrieval.
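A toy in-memory stand-in for a vector database, mapping each chunk ID to its embedding and text (a real deployment would typically use a dedicated vector store such as Chroma or FAISS; this sketch only shows the idea):

```python
import numpy as np

# Chunk ID -> (embedding, chunk text).
vector_store: dict[str, tuple[np.ndarray, str]] = {}

def add_chunk(chunk_id: str, embedding: np.ndarray, text: str) -> None:
    """Store a chunk embedding under its unique chunk ID."""
    vector_store[chunk_id] = (np.asarray(embedding, dtype=np.float32), text)
```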
Input: Question
- A user submits a question. This question also undergoes LLM embedding to create a vector representation.
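The question must be embedded with the same model used for the chunks, for example (reusing the assumed sentence-transformers model from the embedding sketch above, and the example query from the end of this README):

```python
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # must match the chunk embedder

question = "Are all boys allowed in the team?"      # example query from this README
question_vec = embedder.encode([question], convert_to_numpy=True)[0]
```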
Retrieve Relevant Document Chunks
- The question's embedding is used to search the vector database.
- The database returns relevant document chunk IDs based on similarity to the question's embedding.
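A minimal similarity search could rank chunks by cosine similarity between the question embedding and each chunk embedding and return the top-k chunk IDs, as in this sketch:

```python
import numpy as np

def top_k_chunks(question_vec, chunk_vecs, chunk_ids, k: int = 3) -> list[str]:
    """Return the IDs of the k chunks most similar to the question (cosine similarity)."""
    chunk_vecs = np.asarray(chunk_vecs, dtype=np.float32)
    q = np.asarray(question_vec, dtype=np.float32)
    sims = chunk_vecs @ q / (np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q) + 1e-10)
    best = np.argsort(-sims)[:k]
    return [chunk_ids[i] for i in best]
```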
Combine Chunks and Question
- The retrieved document chunks are combined with the original question.
- This combined text serves as a prompt for another LLM Text model.
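A simple prompt template for combining the retrieved chunks with the question might look like this (the exact wording of the template is an assumption):

```python
def build_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Combine retrieved context and the user question into one prompt."""
    context = "\n\n".join(retrieved_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```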
LLM Text Model (Answer Generation)
- The LLM Text model generates an answer based on the combined prompt.
- It leverages the context from both the question and the relevant document chunks.
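As one possible setup, the answer could be generated with a Hugging Face text2text pipeline; the model name below is an assumption, and the actual project loads whatever model MODEL_PATH points to (see the local-machine setup below).

```python
from transformers import pipeline

# Assumed model; the project loads whichever model MODEL_PATH points to.
generator = pipeline("text2text-generation", model="google/flan-t5-base")

def generate_answer(prompt: str) -> str:
    """Produce an answer from the combined prompt."""
    result = generator(prompt, max_new_tokens=256)
    return result[0]["generated_text"]
```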
Final Answer
- The output of the LLM Text model provides the final answer to the user's question.
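Putting the pieces together, and assuming the helper functions from the sketches above (load_documents, chunk_text, embed_texts, top_k_chunks, build_prompt, generate_answer) are defined in the same session, an end-to-end query might look like this:

```python
# Index the documents once.
documents = load_documents("source_documents")
chunk_ids, chunk_texts = [], []
for name, text in documents.items():
    for i, chunk in enumerate(chunk_text(text)):
        chunk_ids.append(f"{name}-{i}")
        chunk_texts.append(chunk)
chunk_vecs = embed_texts(chunk_texts)

# Answer a question.
question = "Are all boys allowed in the team?"
question_vec = embed_texts([question])[0]
best_ids = top_k_chunks(question_vec, chunk_vecs, chunk_ids, k=3)
retrieved = [chunk_texts[chunk_ids.index(cid)] for cid in best_ids]
print(generate_answer(build_prompt(question, retrieved)))
```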
Run in Google Colab (Recommended)
- Upload the Document_Query_Search.ipynb file to your Google Colab and change the runtime type to GPU for fast execution. Run all cells in your Colab.
Run on your local machine
Run on CPU
- In the ex.env file, change DIVICE_TYPE to CPU.
Run on GPU
- In the ex.env file, change DIVICE_TYPE to GPU.
You can also change the LLM model by changing MODEL_PATH in the ex.env file (see the sketch below).
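As a sketch, the application might read these settings with python-dotenv; the key names are copied from the instructions above, and the actual loading code in the project may differ.

```python
import os
from dotenv import load_dotenv

# Load settings from the ex.env file shipped with the project.
load_dotenv("ex.env")

device_type = os.getenv("DIVICE_TYPE", "CPU")   # "CPU" or "GPU"
model_path = os.getenv("MODEL_PATH")            # path or name of the LLM to load
print(device_type, model_path)
```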
✨ Place your document inside the source_documents folder.
✨ It is also possible to add multiple documents at a time.
For example, when asked the query "Are all boys allowed in the team?", the application provides a humanized answer using the Guidelines.pdf document.