Caikit-tgis-serving is a combined image that allows users to perform LLM inference.
It consists of several components:
- TGIS: The serving backend; loads models and provides the inference engine
- Caikit: Wrapper layer that manages the lifecycle of the TGIS process, provides the inference endpoints, and has modules for handling different model types
- Caikit-nlp: Caikit module that handles NLP-style models
- KServe: Orchestrates model serving for all model types; ServingRuntimes implement the loading of specific types of model servers. KServe handles the lifecycle of the deployment object, storage access, networking setup, and so on
- Service Mesh (Istio): Service mesh networking layer; manages traffic flows, enforces access policies, and so on
- Serverless (Knative): Allows for serverless deployments of models
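To illustrate how these components fit together, here is a hedged sketch of a ServingRuntime that runs the caikit-tgis-serving image and an InferenceService that KServe deploys on top of it. The resource names, image tag, model-format name, and storage URI are all illustrative assumptions, not the project's actual manifests:

```yaml
# Hypothetical ServingRuntime wrapping the caikit-tgis-serving image
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: caikit-tgis-runtime        # illustrative name
spec:
  supportedModelFormats:
    - name: caikit                 # assumed model-format name
  containers:
    - name: kserve-container
      image: quay.io/example/caikit-tgis-serving:latest  # illustrative image
---
# InferenceService that references the runtime above; KServe manages its
# deployment lifecycle, storage access, and networking
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-llm                # illustrative name
spec:
  predictor:
    model:
      modelFormat:
        name: caikit
      runtime: caikit-tgis-runtime
      storageUri: s3://example-bucket/models/example-model  # illustrative location
```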
- An OpenShift cluster
  - This document is written based on a ROSA cluster and has been tested on an OSD cluster as well
  - Many of the tasks in this tutorial require cluster-admin permissions (for example, installing operators, configuring the service mesh, and enabling HTTP/2)
  - A node with 4 CPUs and 16 GiB of memory available for inference (adjustable in the ServingRuntime deployment)
- CLI tools
  - oc (the OpenShift CLI)
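The 4 CPU / 16 GiB sizing above corresponds to the resource requests and limits on the ServingRuntime container; a minimal sketch of how that might be adjusted (field values are illustrative assumptions):

```yaml
# Illustrative resource sizing inside a ServingRuntime container spec
spec:
  containers:
    - name: kserve-container
      resources:
        requests:
          cpu: "4"
          memory: 16Gi
        limits:
          cpu: "4"
          memory: 16Gi
```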
The following required operators are installed as part of the KServe/Caikit/TGIS stack installation instructions:
- Kiali
- Red Hat OpenShift distributed tracing platform
- Red Hat OpenShift Service Mesh
- ServiceMeshControlPlane
- OpenShift Serverless
- OpenDataHub
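As a point of reference, the ServiceMeshControlPlane created during installation is a custom resource consumed by the Red Hat OpenShift Service Mesh operator; a minimal, hedged sketch (name, namespace, and disabled add-ons are assumptions, not the stack's actual configuration):

```yaml
# Hypothetical minimal ServiceMeshControlPlane
apiVersion: maistra.io/v2
kind: ServiceMeshControlPlane
metadata:
  name: minimal                # illustrative name
  namespace: istio-system      # assumed control-plane namespace
spec:
  addons:
    grafana:
      enabled: false           # illustrative: add-ons trimmed for a minimal mesh
    kiali:
      enabled: false
```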
There are three ways to install the KServe/Caikit/TGIS stack; each method includes the installation of the required operators listed above.