Caikit-tgis-serving is a combined image that allows users to perform LLM inference.
It consists of several components:
- TGIS: The serving backend; loads models and provides the inference engine
- Caikit: Wrapper layer that manages the lifecycle of the TGIS process, provides the inference endpoints, and has modules for handling different model types
- Caikit-nlp: Caikit module that handles NLP-style models
- KServe: Orchestrates model serving for all model types; ServingRuntimes implement the loading of specific types of model servers. KServe handles the lifecycle of the deployment object, storage access, networking setup, and so on
- Service Mesh (Istio): Service mesh networking layer; manages traffic flows, enforces access policies, and so on
- Serverless (Knative): Allows for serverless deployments of models
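To illustrate how these components fit together, here is a hedged sketch of a ServingRuntime that runs the caikit-tgis-serving image and an InferenceService that KServe deploys on top of it. The resource names, image tag, model-format name, and storage URI are all illustrative assumptions, not the project's actual manifests:

```yaml
# Hypothetical ServingRuntime wrapping the caikit-tgis-serving image
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: caikit-tgis-runtime        # illustrative name
spec:
  supportedModelFormats:
    - name: caikit                 # assumed model-format name
  containers:
    - name: kserve-container
      image: quay.io/example/caikit-tgis-serving:latest  # illustrative image
---
# InferenceService that references the runtime above; KServe manages its
# deployment lifecycle, storage access, and networking
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-llm                # illustrative name
spec:
  predictor:
    model:
      modelFormat:
        name: caikit
      runtime: caikit-tgis-runtime
      storageUri: s3://example-bucket/models/example-model  # illustrative location
```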
- An OpenShift cluster
  - This document is written based on a ROSA cluster and has been tested on an OSD cluster as well
  - Many of the tasks in this tutorial require cluster-admin permissions (for example, installing operators, configuring the service mesh, and enabling HTTP/2)
  - A node with 4 CPUs and 16 GiB of memory available for inference (adjustable in the ServingRuntime deployment)
- CLI tools
  - oc (the OpenShift CLI)
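The 4 CPU / 16 GiB sizing above corresponds to the resource requests and limits on the ServingRuntime container; a minimal sketch of how that might be adjusted (field values are illustrative assumptions):

```yaml
# Illustrative resource sizing inside a ServingRuntime container spec
spec:
  containers:
    - name: kserve-container
      resources:
        requests:
          cpu: "4"
          memory: 16Gi
        limits:
          cpu: "4"
          memory: 16Gi
```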
The following required operators are installed as part of the KServe/Caikit/TGIS stack installation instructions:
- Kiali
- Red Hat OpenShift distributed tracing platform
- Red Hat OpenShift Service Mesh
- ServiceMeshControlPlane
- OpenShift Serverless
- OpenDataHub
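As a point of reference, the ServiceMeshControlPlane created during installation is a custom resource consumed by the Red Hat OpenShift Service Mesh operator; a minimal, hedged sketch (name, namespace, and disabled add-ons are assumptions, not the stack's actual configuration):

```yaml
# Hypothetical minimal ServiceMeshControlPlane
apiVersion: maistra.io/v2
kind: ServiceMeshControlPlane
metadata:
  name: minimal                # illustrative name
  namespace: istio-system      # assumed control-plane namespace
spec:
  addons:
    grafana:
      enabled: false           # illustrative: add-ons trimmed for a minimal mesh
    kiali:
      enabled: false
```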
There are three ways to install the KServe/Caikit/TGIS stack; each method includes the installation of the required operators listed above.