Primary language: Shell. License: Apache License 2.0 (Apache-2.0).

Caikit-TGIS-Serving

Caikit-TGIS-Serving is a combined image that allows data scientists to perform Large Language Model (LLM) inference.

The Caikit-TGIS-Serving stack consists of these components:

  • Text Generation Inference Server (TGIS): The serving backend that loads the models and provides the inference engine.
  • Caikit: A wrapper layer that handles the lifecycle of the TGIS process, provides the inference endpoints, and has modules to handle different model types.
  • Caikit-nlp: The Caikit module that handles Natural Language Processing (NLP)-style models.
  • KServe: A Kubernetes Custom Resource Definition that orchestrates model serving for all types of models. It includes serving runtimes that implement the loading of given types of model servers. KServe handles the lifecycle of the deployment object, storage access, and networking setup.
  • Service Mesh (Istio): The service mesh networking layer that manages traffic flows and enforces access policies.
  • Serverless (Knative): A cloud-native development model that allows for serverless deployments of models.

Architecture of the stack

[Diagram: KServe + Knative + Istio + Caikit-TGIS]

Installation

The procedures for installing and deploying the Caikit-TGIS-Serving stack have been tested with self-managed Red Hat OpenShift Data Science on Red Hat OpenShift Service on AWS (ROSA) and OpenShift Dedicated clusters. They have not been tested with the OpenShift Data Science managed cloud service.

Prerequisites

  • To support inferencing, your cluster needs a node with at least 4 CPUs and 8 GB of memory. You can adjust these settings in the spec.resources.requests section of the Serving Runtime custom resource.
  • You need cluster administrator permissions for many of the procedures (such as installing operators, setting the service mesh configuration, and enabling HTTP/2).
  • You have installed the OpenShift CLI (oc).
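
The node-size prerequisite can be checked from the command line before you start. The following is a minimal sketch, not part of the official procedures. The helper names node_meets_minimum and check_cluster_nodes are hypothetical, and the script assumes that "8 GB" means 8 * 10^9 bytes and that the cluster reports node memory capacity in Ki (for example, 16074368Ki). Adjust MIN_CPUS and MIN_MEM_KI if your environment differs.

```shell
#!/bin/sh
# Sketch only: helper names and thresholds are illustrative assumptions.

# Minimums from the prerequisites; override through the environment if needed.
MIN_CPUS="${MIN_CPUS:-4}"
# Assumption: 8 GB = 8 * 10^9 bytes, expressed in Ki (8000000000 / 1024).
MIN_MEM_KI="${MIN_MEM_KI:-7812500}"

# node_meets_minimum CPUS MEMORY
#   CPUS   whole number of CPUs, for example "4"
#   MEMORY memory quantity in Ki, for example "16074368Ki"
# Returns 0 when both values meet the minimums, non-zero otherwise.
node_meets_minimum() {
    cpus="$1"
    mem_ki="${2%Ki}"   # strip the trailing "Ki" unit
    [ "$cpus" -ge "$MIN_CPUS" ] && [ "$mem_ki" -ge "$MIN_MEM_KI" ]
}

# check_cluster_nodes
#   Lists each node's CPU and memory capacity with oc and reports whether
#   it meets the minimums. Requires a logged-in oc session.
check_cluster_nodes() {
    if ! command -v oc >/dev/null 2>&1; then
        echo "oc not found: install the OpenShift CLI first" >&2
        return 1
    fi
    oc get nodes -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.capacity.cpu}{" "}{.status.capacity.memory}{"\n"}{end}' |
    while read -r name cpus mem; do
        if node_meets_minimum "$cpus" "$mem"; then
            echo "$name: meets minimum"
        else
            echo "$name: below minimum"
        fi
    done
}
```

To use the sketch, source the file in a shell where you are logged in to the cluster and run check_cluster_nodes. The check compares raw node capacity only. It does not account for resources already consumed by other workloads, so a node that passes may still lack free capacity.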

Procedures

There are two ways to install the KServe/Caikit/TGIS stack:

Demos

After you install the KServe/Caikit/TGIS stack, you can try these demos: