LLM RAG Chatbot over Custom Data with UI

Llama 3 reference: https://blog.stackademic.com/rag-using-llama3-langchain-and-chromadb-77bba0154df4

Large Language Models

Language models have exploded on the internet ever since ChatGPT came out, and rightfully so. They can write essays, code entire programs, and even make memes.

Limitations of Large Language Models
i) They struggle with tasks requiring external knowledge (for example, our own knowledge base, i.e., our private data) and factual information. For example, if I ask a language model how much I spent this month, it won't be able to tell me, as it was never trained on that data.

ii) Language models become far more valuable if they can generate insights from any data that we provide, rather than just their original training data.

Since retraining these large language models from scratch costs millions of dollars and takes months, we need better ways to give our existing LLMs access to our custom data. RAG addresses this issue.

Why RAG

Retrieval-augmented generation (RAG) integrates external information retrieval into the process of generating responses by Large Language Models (LLMs). Instead of relying only on the model's pre-trained knowledge, it searches a database for additional information, significantly improving the accuracy and relevance of the generated responses.

[Image: basic RAG workflow, where relevant knowledge is retrieved before the LLM generates a response]

The image above shows how a basic RAG system works. Before forwarding the question to the LLM, a retrieval layer searches our knowledge base for the “relevant knowledge” needed to answer the user query: in this case, the spending data from the last month. Our LLM can now generate a relevant, non-hallucinated response about our budget.

As your data grows, you’ll need efficient ways to identify the most relevant information to fit into your LLM’s limited context window. This is where you’ll want a proper way to store and retrieve the specific data needed for each query, without expecting the LLM to remember it.
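
As a rough sketch of this retrieve-then-generate pattern, here is a toy pipeline in Python. The knowledge base, keyword-overlap scoring, and prompt format are illustrative assumptions, not this app's actual code; the real application uses Elasticsearch for retrieval (see Components below).

    # Toy RAG pipeline: retrieve relevant facts, then build an augmented prompt.
    knowledge_base = [
        "Groceries in May: $412",
        "Rent in May: $1,500",
        "The Eiffel Tower is in Paris",
    ]

    def retrieve(query: str, k: int = 2) -> list[str]:
        """Naive keyword-overlap retrieval; a real app uses a search engine."""
        words = set(query.lower().split())
        ranked = sorted(knowledge_base,
                        key=lambda doc: len(words & set(doc.lower().split())),
                        reverse=True)
        return ranked[:k]

    def build_prompt(query: str) -> str:
        """Prepend the retrieved context so the LLM can ground its answer."""
        context = "\n".join(retrieve(query))
        return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

    print(build_prompt("How much did I spend on groceries in May?"))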

Where is RAG being used?

We can find RAG models being applied in many areas today, especially those that need factual accuracy and knowledge depth, since grounding responses in retrieved context makes them more knowledgeable and contextually relevant.

Real-World Applications:

Question answering: This is perhaps the most prominent use case for RAG models. They power advanced question-answering systems that can retrieve relevant information from large knowledge bases and then generate fluent answers.

Language generation: RAG enables more factual and contextualized text generation, such as summarization that draws on multiple sources.

Data-to-text generation: By retrieving relevant structured data, RAG models can generate product or business intelligence reports from databases, or describe insights from data visualizations and charts.

Multimedia understanding: RAG isn’t limited to text; it can retrieve multimodal information such as images, video, and audio to enhance understanding, for example answering questions about images or videos by retrieving relevant textual context.

Refer to this article for more details:
https://qdrant.tech/articles/what-is-rag-in-ai/

Components of This Application

Elasticsearch
- Elasticsearch is a powerful open-source search and analytics engine.
- In this application, Elasticsearch is used to store and index data.
- The RAG system queries Elasticsearch to find the relevant information needed to answer user queries (a minimal retrieval sketch follows).
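
A rough sketch of such a retrieval call with the official elasticsearch Python client; the index name (documents), field name (text), and host are assumptions for illustration, not necessarily what this app uses:

    from elasticsearch import Elasticsearch

    # Connect to the Elasticsearch instance (host/port assumed).
    es = Elasticsearch("http://localhost:9200")

    def retrieve(query: str, k: int = 3) -> list[str]:
        """Full-text match query; returns the top-k document bodies."""
        resp = es.search(
            index="documents",                  # assumed index name
            query={"match": {"text": query}},   # assumed field name
            size=k,
        )
        return [hit["_source"]["text"] for hit in resp["hits"]["hits"]]

    print(retrieve("How much did I spend this month?"))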

Streamlit

  • Streamlit is an open-source app framework for Machine Learning and Data Science projects that allows you to build interactive web applications quickly.
  • In this application, Streamlit is used to build the user interface (UI), providing a user-friendly way to interact with the RAG system (see the sketch below).
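
A minimal chat UI along these lines, using Streamlit's built-in chat elements (Streamlit >= 1.24). This is a generic sketch, not the app's actual app.py; answer_with_rag is a hypothetical stand-in for the retrieval and generation steps:

    import streamlit as st

    st.title("RAG Chatbot over Custom Data")

    def answer_with_rag(question: str) -> str:
        # Hypothetical placeholder: retrieve context from Elasticsearch,
        # then generate a grounded response with the LLM.
        return f"(answer to: {question})"

    # Render a chat box; st.chat_input returns None until the user submits.
    if question := st.chat_input("Ask a question about your data"):
        with st.chat_message("user"):
            st.write(question)
        with st.chat_message("assistant"):
            st.write(answer_with_rag(question))

Save this as a script and launch it with 'streamlit run app.py'.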

Docker

  • The application is packaged and deployed as a Docker app, so you don't need to install any libraries on your computer.

OpenAI

  • OpenAI provides the API for accessing powerful language models like GPT-3.5 Turbo.
  • In this application, the OpenAI API is used to generate responses based on the data retrieved from Elasticsearch for each query passed from the chatbot (see the sketch below).
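
A hedged sketch of that generation step with the official openai Python client (v1.x); the system prompt and message format are assumptions for illustration. Set OPENAI_API_KEY in your environment before running:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def generate_answer(question: str, context: str) -> str:
        """Send the retrieved context plus the question to GPT-3.5 Turbo."""
        resp = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system",
                 "content": "Answer using only the provided context."},
                {"role": "user",
                 "content": f"Context:\n{context}\n\nQuestion: {question}"},
            ],
        )
        return resp.choices[0].message.content

    print(generate_answer("How much did I spend this month?",
                          "Groceries in May: $412. Rent in May: $1,500."))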

How to Run This Application

**Note:** Download and install the Llama 3 model (8B) from the official website:
https://ollama.com/blog/llama3
Install the Ollama server by following the instructions on the above website.

Running the Application:
Start the Ollama server by running the command 'ollama serve' in your terminal.
By default, the server runs on port 11434; this is the LLM API endpoint, not the UI.
Run the application by executing the Python script (e.g., app.py).

Access the application through a web browser by navigating to http://localhost:8501 (see step 4 below).
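
To verify the Ollama server is up, you can call its REST API directly. A minimal check in Python; the model name llama3 assumes you have already pulled the model (e.g., with 'ollama pull llama3'):

    import requests

    # Ask the local Ollama server for a one-off, non-streaming completion.
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3", "prompt": "Say hello", "stream": False},
        timeout=120,
    )
    print(resp.json()["response"])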

  1. Clone this git repository from the command prompt:
    git clone https://github.com/padmapria/LLM-RAG-Chatbot-over-custom-data-streamlit-App.git
    cd LLM-RAG-Chatbot-over-custom-data-streamlit-App

  2. Install Docker Desktop on your computer and start Docker Desktop

  3. Start the application by running this command from the command prompt:
    docker compose up -d

  4. Check the deployed application in the browser:
    http://localhost:8501