
A catalog for all things GitOps for AI on OpenShift


OpenShift AI / ML GitOps Catalog


This project is a catalog of configurations used to provision infrastructure, on OpenShift, that supports machine learning (ML) and artificial intelligence (AI) workloads.

This repository aims to support practical use of OpenShift for AI / ML workloads by providing a catalog of configurations, demos, and workshops.

Please look at the GitOps Catalog if you only need to automate an operator install.

In this repo, look at various kustomized configs and argo apps for ideas.

For issues with oc apply -k, see the Known Issues section below.

Prerequisites - Get a cluster

  • OpenShift 4.14+
    • role: cluster-admin - for all demo or cluster configs
    • role: self-provisioner - for namespaced components
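
The prerequisites above could be checked before applying anything. The following is a minimal sketch, not part of this repository's scripts: the function names version_at_least and check_prerequisites are hypothetical, and the check assumes that oc auth can-i is a good enough proxy for the two roles listed.

```shell
# Hypothetical pre-flight helpers; not part of this repository's scripts.

# True when ACTUAL is the same as or newer than MINIMUM (relies on sort -V).
version_at_least() {
  local minimum="$1"
  local actual="$2"
  [ "$(printf '%s\n%s\n' "${minimum}" "${actual}" | sort -V | head -n 1)" = "${minimum}" ]
}

# Print the cluster version and the level of access the current user appears to have.
check_prerequisites() {
  local version
  version="$(oc get clusterversion version -o jsonpath='{.status.desired.version}')" || return 1

  if version_at_least 4.14 "${version}"; then
    echo "version ok: ${version}"
  else
    echo "version too old: ${version}"
    return 1
  fi

  # Assumption: being allowed every verb on every resource approximates cluster-admin,
  # and being allowed to create projectrequests approximates self-provisioner.
  if oc auth can-i '*' '*' --all-namespaces >/dev/null 2>&1; then
    echo "access: cluster-admin equivalent"
  elif oc auth can-i create projectrequests >/dev/null 2>&1; then
    echo "access: self-provisioner equivalent"
  else
    echo "access: insufficient"
    return 1
  fi
}
```

Run check_prerequisites after oc login; a non-zero exit status means at least one prerequisite looks unmet.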

Red Hat Demo Platform Options (Tested)

NOTE: The node sizes below are the recommended minimum to select for provisioning

Getting Started

The following icon should appear in the top right of the OpenShift web console after you have installed the Web Terminal operator. Clicking this icon launches the web terminal.

Web Terminal

NOTE: Reload the page in your browser if you do not see the icon after installing the operator.

# bootstrap the enhanced web terminal
YOLO_URL=https://raw.githubusercontent.com/redhat-na-ssa/demo-ai-gitops-catalog/main/scripts/library/term.sh

. <(curl -s "${YOLO_URL}")

term_init

NOTE: Open a new terminal to fully activate the new configuration


ALTERNATIVE - Use a local environment / shell

  1. Verify you are logged into your cluster using oc.
  2. Clone this repository

NOTE: See the tools section below for more info

# verify oc login
oc whoami

# git clone this repo
git clone https://github.com/redhat-na-ssa/demo-ai-gitops-catalog
cd demo-ai-gitops-catalog

# load functions into a bash shell
. scripts/functions.sh
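
For a local shell, the login check can be wrapped so that later commands fail early with a clear message. This is a sketch only; check_cluster_login is a hypothetical name and is not one of the functions loaded from scripts/functions.sh.

```shell
# Hypothetical helper; not part of scripts/functions.sh.
# Confirms that oc is installed and that a login session is active.
check_cluster_login() {
  if ! command -v oc >/dev/null 2>&1; then
    echo "oc not found in PATH" >&2
    return 1
  fi
  if ! oc whoami >/dev/null 2>&1; then
    echo "not logged in; run 'oc login' first" >&2
    return 1
  fi
  echo "logged in as $(oc whoami) on $(oc whoami --show-server)"
}
```

A typical use would be check_cluster_login && apply_firmly clusters/default, so nothing is applied when the session has expired.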

Apply Configurations / Demos

Setup basic cluster config

# load functions
. scripts/functions.sh

# setup a persistent enhanced web terminal on a default cluster
apply_firmly bootstrap/install-web-terminal

# setup a default cluster w/o argocd managing it
apply_firmly clusters/default

Setup a demo

# setup a dev spaces demo w/ gpu
apply_firmly demos/devspaces-nvidia-gpu-autoscale

# setup a rhoai demo w/ gpu
apply_firmly demos/rhoai-nvidia-gpu-autoscale

Alternative - running bootstrap.sh

Running scripts/bootstrap.sh will allow you to select common options. This is a work in progress.

This script handles configurations that are not fully declarative, require imperative steps, or require user interaction.

Cherry Picking Configurations

Various kustomized app configs and cluster configs can be applied individually.

Operator installs can be done quickly via oc - similar to the GitOps Catalog.

oc apply -k and apply_firmly can be used interchangeably in the examples below:

# setup htpasswd based login
oc apply -k components/cluster-configs/login/overlays/htpasswd

# disable self provisioner in cluster
oc apply -k components/cluster-configs/rbac/overlays/no-self-provisioner

# install minio w/ minio namespace
oc apply -k components/app-configs/minio/overlays/with-namespace

# install the nfs provisioner
oc apply -k components/app-configs/nfs-provisioner/overlays/default
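
Before cherry picking a configuration, it can help to see what it would create. The sketch below renders the kustomization and then asks the API server for a dry run; preview_config is a hypothetical helper name, not something this repository provides.

```shell
# Hypothetical helper; not part of this repository's scripts.
# Renders a kustomization, then validates it against the cluster without persisting anything.
preview_config() {
  local target="$1"
  oc kustomize "${target}" && oc apply -k "${target}" --dry-run=server
}
```

For example, preview_config components/app-configs/minio/overlays/with-namespace prints the rendered manifests followed by the server's dry-run result. A dry run can still fail for resources whose CRDs are not installed yet.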

Examples with operators that require CRDs

# setup serverless w/ instance
apply_firmly components/operators/serverless-operator/aggregate/default

# setup acs with a minimal configuration
apply_firmly components/operators/rhacs-operator/aggregate/minimal
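
When an operator's instance depends on a CRD, an alternative to blind retries is to wait for the CRD explicitly. The following is a sketch under assumptions: wait_for_crd is a hypothetical helper, and the CRD name passed to it must be looked up for the operator in question.

```shell
# Hypothetical helper; not part of this repository's scripts.
# Polls until the CRD exists, then waits for it to report the Established condition.
wait_for_crd() {
  local crd="$1"
  local timeout="${2:-300s}"
  local tries="${3:-60}"
  local pause="${4:-5}"

  # oc wait can fail right away when the CRD does not exist yet, so poll for it first.
  while ! oc get crd "${crd}" >/dev/null 2>&1; do
    tries=$((tries - 1))
    [ "${tries}" -gt 0 ] || return 1
    sleep "${pause}"
  done

  oc wait --for=condition=Established "crd/${crd}" --timeout="${timeout}"
}
```

Usage would look like wait_for_crd some-crd.example.com 120s, where the CRD name is a placeholder.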

Common functions

Common operational tasks are provided in the scripts library. You can run individual functions in a bash shell:

NOTE: These functions are available in an enhanced web terminal - see install above

# load functions
. scripts/functions.sh

get_functions
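
The implementation of get_functions is not shown here. As an illustration of the general idea only, function names can be listed from a script file with grep and sed; list_script_functions is a hypothetical name, and the sketch assumes functions are declared in the name() style at the start of a line.

```shell
# Hypothetical helper; illustrates the idea and is not the repository's get_functions.
# Prints the names of functions declared as "name()" at the start of a line.
list_script_functions() {
  local file="$1"
  grep -E '^[A-Za-z_][A-Za-z0-9_]*[[:space:]]*\(\)' "${file}" | sed 's/[[:space:]]*().*//'
}
```

For example, list_script_functions scripts/functions.sh would print one function name per line for any declarations that match this style.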

Workshops

The workshop functions are currently under development.

# load functions
. scripts/wip/workshop_functions.sh

# setup workshop with 25 users
workshop_setup 25

Known Issues

oc apply -k commands may fail on the first try.

This is inherent to how Kubernetes handles custom resources (CR): a CR can only be created after its type has been defined via a custom resource definition (CRD). If the CRD is not registered yet when the CR is submitted, the apply fails.

The solution is to re-run the command until it succeeds.

The function apply_firmly is interchangeable with oc apply -k and is similar to the following shell command:

until oc apply -k <path-to-kustomization-directory>; do : ; done
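
The loop above retries forever. Where a bounded retry is preferable, for example in CI, a variant along these lines could be used. This is a sketch, not the repository's apply_firmly; the name retry_apply and its default values are assumptions.

```shell
# Hypothetical helper; not the repository's apply_firmly.
# Retries oc apply -k up to a maximum number of attempts, pausing between tries.
retry_apply() {
  local target="$1"
  local max_attempts="${2:-10}"
  local delay_seconds="${3:-5}"
  local attempt=1

  until oc apply -k "${target}"; do
    if [ "${attempt}" -ge "${max_attempts}" ]; then
      echo "giving up on ${target} after ${attempt} attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "${delay_seconds}"
  done
}
```

For example, retry_apply demos/rhoai-nvidia-gpu-autoscale 20 10 would try up to 20 times with 10 seconds between attempts and return a non-zero status if every attempt fails.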

Referencing this Catalog

This repo is currently subject to frequent, breaking changes!

Always reference it with a commit hash or tag, for example:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - https://github.com/redhat-na-ssa/demo-ai-gitops-catalog/components/app-configs/nvidia-gpu-verification/overlays/toleration-replicas-6?ref=v0.04
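
To catch unpinned references in your own kustomizations, a simple text check may be enough. The sketch below is an assumption-laden illustration: find_unpinned_refs is a hypothetical helper, and it only looks for GitHub URLs that lack a ref= query parameter.

```shell
# Hypothetical helper; not part of this repository's scripts.
# Prints lines that reference a GitHub URL without a ?ref= (or &ref=) pin.
# Exit status is 0 when unpinned lines were found and non-zero when none were.
find_unpinned_refs() {
  local file="$1"
  grep -E 'https://github\.com/' "${file}" | grep -v '[?&]ref='
}
```

For example, find_unpinned_refs kustomization.yaml prints nothing for the kustomization above, because its only remote resource carries ?ref=v0.04.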

Development

Tools

The following cli tools are required:

  • bash, git
  • oc - Download mac, linux, windows
  • kubectl (optional) - Included in oc bundle
  • kustomize (optional) - Download mac, linux

NOTE: bash, git, and oc are available in the OpenShift Web Terminal

The following are used to encrypt secrets and are optional:

Contributing

Please run the following before submitting a PR / commit

scripts/lint.sh

Additional Info

Internal Docs

External Links