
GOKU: GenAIOps on Kubernetes

[Work in Progress] A reference architecture for performing Generative AI Operations (aka GenAIOps) using Kubernetes, with open source tools

Table of Contents

- Installation
- Features
  - Model Ingestion
  - DREAM: Distributed RAG Experimentation Framework
  - Model Serving
  - Vector Ingestion
  - End-to-end RAG Evaluation
  - Model Monitoring

Installation

For installation, follow the steps provided in the setup doc

Features

Model Ingestion

GOKU uses a customizable Argo Workflows template to download models from Hugging Face and ingest them into MLflow.
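The ingestion step that the workflow runs can be sketched roughly as below. This is an illustrative assumption about what the template does, not its actual contents: the helper names, the base directory, and the "model-ingestion" experiment name are all placeholders.

```python
# Hypothetical sketch of the ingestion step run inside the Argo workflow.
# Assumes huggingface_hub and mlflow are installed; names are illustrative.
import os


def resolve_local_dir(model_name: str, base_dir: str = "/tmp/models") -> str:
    """Map a Hugging Face model id (e.g. 'org/model') to a local directory."""
    safe_name = model_name.replace("/", "--")
    return os.path.join(base_dir, safe_name)


def ingest_model(model_name: str) -> None:
    """Download a model from Hugging Face and log its files to MLflow."""
    from huggingface_hub import snapshot_download  # assumed dependency
    import mlflow

    local_dir = resolve_local_dir(model_name)
    snapshot_download(repo_id=model_name, local_dir=local_dir)

    mlflow.set_experiment("model-ingestion")  # illustrative experiment name
    with mlflow.start_run(run_name=model_name):
        mlflow.log_artifacts(local_dir, artifact_path="model")


if __name__ == "__main__" and os.environ.get("MODEL_NAME"):
    ingest_model(os.environ["MODEL_NAME"])
```

Logging the downloaded files with `mlflow.log_artifacts` is what would make them visible both in the MLflow UI and in MLflow's MinIO-backed artifact store.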

How to run

To run the model ingestion with the default image, follow these steps:
  1. Navigate to the Argo Workflows UI (see steps in the setup doc if unsure)
  2. Enter the "goku" namespace and click on "SUBMIT NEW WORKFLOW"
  3. Select "model-ingestion" as the template to be used
  4. Enter the name of the model you want to ingest and click on "SUBMIT"
  5. You should see the model ingestion workflow running
  6. Once the workflow completes successfully, you should see the model files saved as artifacts in MLflow
  7. You should also be able to verify that the model artifacts have been ingested successfully using the MinIO console
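The UI steps above can also be scripted, for example by invoking the `argo` CLI. The sketch below assumes the CLI is installed and configured against the cluster; the `model-name` parameter name is an assumption about the template, not something confirmed by this README.

```python
# Sketch: submit the model-ingestion WorkflowTemplate from a script instead
# of the Argo Workflows UI. Assumes the `argo` CLI is installed and configured;
# the "model-name" parameter name is an assumption about the template.
import subprocess


def build_submit_command(model: str,
                         template: str = "model-ingestion",
                         namespace: str = "goku") -> list:
    """Build the `argo submit` invocation for the ingestion template."""
    return [
        "argo", "submit",
        "--from", f"workflowtemplate/{template}",
        "-n", namespace,
        "-p", f"model-name={model}",
        "--wait",  # block until the workflow finishes
    ]


def submit_ingestion(model: str) -> None:
    """Run the submit command and raise if the workflow submission fails."""
    subprocess.run(build_submit_command(model), check=True)
```

`argo submit --from workflowtemplate/<name>` submits a new workflow from an existing WorkflowTemplate, which mirrors what the "SUBMIT NEW WORKFLOW" button does in the UI.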

DREAM: Distributed RAG Experimentation Framework

The Distributed RAG Experimentation Framework (DREAM) provides a Kubernetes-native architecture and sample code that demonstrate how Retrieval-Augmented Generation (RAG) experiments, evaluation, and tracking can be conducted in a distributed manner using Ray, LlamaIndex, Ragas, MLflow, and MinIO. Check out the DREAM README for details
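The fan-out pattern behind such distributed experimentation can be illustrated with a minimal Ray sketch: enumerate a grid of RAG configurations, then evaluate each one in a parallel remote task. The config fields and the `evaluate` body are placeholders, not DREAM's actual code; the real pipeline would build indexes with LlamaIndex, score with Ragas, and log results to MLflow.

```python
# Minimal illustration of fanning out RAG experiment configs with Ray.
# The config grid and evaluate() body are placeholders for DREAM's real
# pipeline (LlamaIndex retrieval, Ragas scoring, MLflow tracking).
from itertools import product


def build_config_grid(chunk_sizes, top_ks):
    """Enumerate one experiment config per (chunk_size, top_k) pair."""
    return [{"chunk_size": c, "top_k": k} for c, k in product(chunk_sizes, top_ks)]


def run_experiments(configs):
    """Evaluate every config in parallel on a Ray cluster."""
    import ray  # assumed available in the cluster image

    @ray.remote
    def evaluate(config):
        # Placeholder for: build index, run queries, score with Ragas.
        return {**config, "score": 0.0}

    ray.init(ignore_reinit_error=True)
    try:
        return ray.get([evaluate.remote(c) for c in configs])
    finally:
        ray.shutdown()
```

Because each `evaluate.remote(...)` call returns immediately with a future, Ray schedules the evaluations across the cluster and `ray.get` gathers the results for tracking.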

Model Serving

(WIP)

Vector Ingestion

(WIP)

End-to-end RAG Evaluation

(WIP)

Model Monitoring

(WIP)