[Work in Progress] A reference architecture for performing Generative AI Operations (aka GenAIOps) using Kubernetes, with open source tools
For installation, follow the steps provided in the setup doc.
GOKU uses a customizable Argo Workflows template to download models from Hugging Face and ingest them into MLflow.
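As a rough illustration of what a download-and-ingest step can look like, here is a minimal Python sketch. It is not GOKU's actual template code: the function name `ingest_model`, the artifact path naming scheme, and the injectable `download` / `log_artifacts` hooks are all assumptions made for this example. The defaults rely on `huggingface_hub.snapshot_download` and `mlflow.log_artifacts`, and expect the MLflow tracking server to be configured through the `MLFLOW_TRACKING_URI` environment variable.

```python
import tempfile


def ingest_model(model_id, download=None, log_artifacts=None):
    """Download a model snapshot and log its files as MLflow artifacts.

    Hypothetical helper, not part of GOKU. `download(model_id, target_dir)`
    and `log_artifacts(local_dir, artifact_path)` can be replaced with fakes
    so the flow can be exercised without network access.
    """
    if download is None:
        # Imported lazily so the module loads without huggingface_hub installed.
        from huggingface_hub import snapshot_download

        def download(repo_id, target_dir):
            return snapshot_download(repo_id=repo_id, local_dir=target_dir)

    if log_artifacts is None:
        # Imported lazily so the module loads without mlflow installed.
        import mlflow

        def log_artifacts(local_dir, artifact_path):
            with mlflow.start_run(run_name=f"ingest-{artifact_path}"):
                mlflow.log_artifacts(local_dir, artifact_path=artifact_path)

    # Assumed naming scheme: "org/model" becomes "org--model".
    artifact_path = model_id.replace("/", "--")
    with tempfile.TemporaryDirectory() as tmp:
        local_dir = download(model_id, tmp)
        log_artifacts(local_dir, artifact_path)
    return artifact_path
```

Keeping the download and logging steps behind small callables makes it easy to swap in a different model source or artifact store when customizing the template image.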
How to run
To run the model ingestion with the default image, follow these steps:
- Navigate to the Argo Workflows UI (see steps in the setup doc if unsure)
- Enter the "goku" namespace and click on "SUBMIT NEW WORKFLOW"
- Select "model-ingestion" as the template to be used
- Enter the name of the model you want to ingest and click on "SUBMIT"
- You should see the model ingestion workflow running
- Once the workflow completes successfully, you should be able to see the model files saved as artifacts in MLflow
- You should also be able to verify that the model artifacts have been ingested successfully using the MinIO console
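The UI steps above can also be approximated without the browser. The sketch below builds an Argo `Workflow` manifest that references the "model-ingestion" template in the "goku" namespace. It assumes the template is a namespaced `WorkflowTemplate` and that it takes a single parameter; the parameter name `model_name` and the example model `distilbert-base-uncased` are placeholders, so check the template definition for the real parameter name.

```python
import json


def build_ingestion_workflow(
    model_name,
    namespace="goku",
    template="model-ingestion",
    parameter_name="model_name",  # assumption: check the template for the real name
):
    """Return an Argo Workflow manifest that runs an existing WorkflowTemplate."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": {
            # generateName lets Kubernetes append a unique suffix per submission.
            "generateName": f"{template}-",
            "namespace": namespace,
        },
        "spec": {
            "workflowTemplateRef": {"name": template},
            "arguments": {
                "parameters": [{"name": parameter_name, "value": model_name}]
            },
        },
    }


if __name__ == "__main__":
    # Example model name only; substitute the model you want to ingest.
    manifest = build_ingestion_workflow("distilbert-base-uncased")
    print(json.dumps(manifest, indent=2))
```

Saving the printed JSON to a file and running `kubectl create -f <file>` should submit the workflow. `kubectl create` is needed rather than `kubectl apply` because the manifest uses `generateName`.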
Distributed RAG Experimentation Framework (DREAM) presents a Kubernetes-native architecture and sample code that demonstrate how Retrieval Augmented Generation experiments, evaluation, and tracking can be conducted in a distributed manner using Ray, LlamaIndex, Ragas, MLflow, and MinIO. Check out the DREAM README for details.