Introduction

This repo contains instructions to deploy a full RAG application on OpenShift and OpenShift AI. It contains Jupyter notebooks to ingest data into a vector database (Milvus) and a Streamlit application to actually interact with your own knowledge base and popular LLMs (e.g. Llama 3, Mistral-7B, or Granite-7B). It leverages RAG and gives you many configuration options to tune how RAG behaves and how the model parameters are set. It currently supports text input. Check out this Git repo to learn more about it and how to ingest your own knowledge base - supporting PDF, DOCX, PPTX, or your Confluence wiki. The supported document loaders are described [here](https://python.langchain.com/v0.1/docs/modules/data_connection/document_loaders/).

The following describes how to set everything up on Kubernetes / OpenShift. There is also a guide on how to deploy it locally with Podman, although that needs some customization of the mount paths and of your CDI (Container Device Interface) configuration for NVIDIA GPUs.

Infrastructure Setup

a) Milvus Vector DB

1. ODF or AWS S3: Create a bucket

oc new-project <yourname>-chatbot
oc apply -f milvus/bucket-claim.yaml
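
The bucket claim creates a ConfigMap and a Secret with the connection details. A sketch for reading them out, assuming the claim is named milvus-bucket (check milvus/bucket-claim.yaml for the actual name):

oc get configmap milvus-bucket -o jsonpath='{.data.BUCKET_NAME}'
oc extract secret/milvus-bucket --to=-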

2. Add the Milvus Helm repo and update milvus/openshift-values.yaml with your object bucket credentials, or activate MinIO (which autogenerates credentials for you but also spins up MinIO). Refer to the LLM-on-OpenShift repo for further instructions.
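
If the chart is not yet available locally, adding the Milvus Helm repo first should look roughly like this (a sketch; the repo URL assumes the upstream Zilliz chart location, and the commands below assume you run them from the milvus/ directory):

helm repo add milvus https://zilliztech.github.io/milvus-helm/
helm repo update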

helm template -f openshift-values.yaml vectordb --set cluster.enabled=false --set etcd.replicaCount=1 --set pulsar.enabled=false milvus/milvus > milvus_manifest_standalone.yaml

yq '(select(.kind == "StatefulSet" and .metadata.name == "vectordb-etcd") | .spec.template.spec.securityContext) = {}' -i milvus_manifest_standalone.yaml
yq '(select(.kind == "StatefulSet" and .metadata.name == "vectordb-etcd") | .spec.template.spec.containers[0].securityContext) = {"capabilities": {"drop": ["ALL"]}, "runAsNonRoot": true, "allowPrivilegeEscalation": false}' -i milvus_manifest_standalone.yaml
yq '(select(.kind == "Deployment" and .metadata.name == "vectordb-minio") | .spec.template.spec.securityContext) = {"capabilities": {"drop": ["ALL"]}, "runAsNonRoot": true, "allowPrivilegeEscalation": false}' -i milvus_manifest_standalone.yaml

3. Apply to OpenShift

oc apply -f milvus/milvus_manifest_standalone.yaml
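
It can take a few minutes until all pods are ready; progress can be watched with:

oc get pods -w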

b) Deploy Ollama

oc apply -f ollama/
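
Ollama starts without any model loaded, so a model has to be pulled before the notebooks can use it. A sketch, assuming the Deployment is named ollama and listens on the default port 11434:

oc port-forward deploy/ollama 11434:11434 &
curl http://localhost:11434/api/pull -d '{"name": "mistral"}'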

c) Load Notebooks in OpenShift AI

1. Make the namespace available in RHOAI

oc patch namespace <yourname>-chatbot -p '{"metadata":{"labels":{"opendatahub.io/dashboard":"true"}}}' --type=merge
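
To verify that the label landed:

oc get namespace <yourname>-chatbot --show-labels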

2. Deploy a Workbench (e.g. Medium size, Standard Data Science image)

3. Upload documents into the folders: docx, pptx, pdfs

4. Run the ingest notebook - this takes a while

5. Once done, run the Ollama notebook to test your RAG installation
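
Independent of the notebook, a quick smoke test of the Ollama endpoint from a workbench terminal could look like this (a sketch; the Service name ollama and default port 11434 are assumptions):

curl http://ollama:11434/api/generate -d '{"model": "mistral", "prompt": "Say hello", "stream": false}'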

d) Deploy a Frontend

oc apply -f streamlit/k8s
oc create route edge --service=rag-frontend
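
The URL of the UI can then be read from the route (the route name matches the service name created above):

oc get route rag-frontend -o jsonpath='{.spec.host}'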

e) Alternative: Deploy vLLM via a standard Deployment. Warning: a GPU is required, and it loads Mistral-7B by default, which needs approx. 20 GB of GPU memory if not quantized; a good quantized alternative is the model "TheBloke/Mistral-7B-Instruct-v0.2-AWQ".

1. Put a Secret.yaml into vllm/vllm-native/ that contains your Hugging Face token
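
One way to generate that file is a client-side dry run (a sketch; the secret name hf-token and key token are assumptions - check the Deployment in vllm/vllm-native/ for the names it actually references):

oc create secret generic hf-token --from-literal=token=<your-hf-token> --dry-run=client -o yaml > vllm/vllm-native/Secret.yaml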

oc apply -f vllm/vllm-native/
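
vLLM exposes an OpenAI-compatible API, so a minimal smoke test once the pod is running could be (assuming the Service is named vllm and listens on port 8000):

oc port-forward svc/vllm 8000:8000 &
curl http://localhost:8000/v1/models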