k8sgpt

Giving Kubernetes Superpowers to everyone

Primary language: Go. License: Apache License 2.0 (Apache-2.0).



k8sgpt is a tool for scanning your Kubernetes clusters, diagnosing, and triaging issues in simple English.

It has SRE experience codified into its analyzers and helps pull out the most relevant information, which it then enriches with AI.

Out-of-the-box integration with OpenAI, Azure, Cohere, Amazon Bedrock, and local models.


CLI Installation

Linux/Mac via brew

brew tap k8sgpt-ai/k8sgpt
brew install k8sgpt

RPM-based installation (RedHat/CentOS/Fedora)

32 bit:

curl -LO https://github.com/k8sgpt-ai/k8sgpt/releases/download/v0.3.24/k8sgpt_386.rpm
sudo rpm -ivh k8sgpt_386.rpm

64 bit:

curl -LO https://github.com/k8sgpt-ai/k8sgpt/releases/download/v0.3.24/k8sgpt_amd64.rpm
sudo rpm -ivh k8sgpt_amd64.rpm

DEB-based installation (Ubuntu/Debian)

32 bit:

curl -LO https://github.com/k8sgpt-ai/k8sgpt/releases/download/v0.3.24/k8sgpt_386.deb
sudo dpkg -i k8sgpt_386.deb

64 bit:

curl -LO https://github.com/k8sgpt-ai/k8sgpt/releases/download/v0.3.24/k8sgpt_amd64.deb
sudo dpkg -i k8sgpt_amd64.deb

APK-based installation (Alpine)

32 bit:

curl -LO https://github.com/k8sgpt-ai/k8sgpt/releases/download/v0.3.24/k8sgpt_386.apk
apk add --allow-untrusted k8sgpt_386.apk

64 bit:

curl -LO https://github.com/k8sgpt-ai/k8sgpt/releases/download/v0.3.24/k8sgpt_amd64.apk
apk add --allow-untrusted k8sgpt_amd64.apk
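If you script one of the package-based installations above, the asset name can be derived from the machine architecture. The bash sketch below is not an official installer: it assumes only the 386 and amd64 assets and the v0.3.24 release shown above exist, and the pkg_arch and pkg_url helpers and the K8SGPT_VERSION override are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch only: choose the k8sgpt package asset for this machine.
# Assumption: only the 386/amd64 assets and the release shown above exist;
# K8SGPT_VERSION is a hypothetical override, defaulting to that release.
K8SGPT_VERSION=${K8SGPT_VERSION:-v0.3.24}

# pkg_arch MACHINE: map `uname -m` output to the asset suffix (386 or amd64)
pkg_arch() {
  case "$1" in
    x86_64 | amd64) echo amd64 ;;
    i386 | i486 | i586 | i686) echo 386 ;;
    *)
      echo "unsupported architecture: $1" >&2
      return 1
      ;;
  esac
}

# pkg_url FORMAT [MACHINE]: print the download URL for an rpm, deb or apk
pkg_url() {
  local format=$1 arch
  case "$format" in
    rpm | deb | apk) ;;
    *)
      echo "unknown package format: $format" >&2
      return 1
      ;;
  esac
  # MACHINE defaults to the current machine
  arch=$(pkg_arch "${2:-$(uname -m)}") || return 1
  printf 'https://github.com/k8sgpt-ai/k8sgpt/releases/download/%s/k8sgpt_%s.%s\n' \
    "$K8SGPT_VERSION" "$arch" "$format"
}
```

For example, on an amd64 Debian or Ubuntu machine:

   curl -LO "$(pkg_url deb)"
   sudo dpkg -i "k8sgpt_$(pkg_arch "$(uname -m)").deb"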

Failing Installation on WSL or Linux (missing gcc)

When installing k8sgpt with Homebrew on WSL or Linux, you may encounter the following error:

   ==> Installing k8sgpt from k8sgpt-ai/k8sgpt
   Error: The following formula cannot be installed from a bottle and must be
   built from the source.
     k8sgpt
   Install Clang or run brew install gcc.

Installing gcc as suggested does not fix the problem. Instead, install the build-essential package:

   sudo apt-get update
   sudo apt-get install build-essential

Windows

  • Download the latest Windows binaries of k8sgpt from the Releases page, choosing the one that matches your system architecture.
  • Extract the downloaded package to your desired location, then add the binary's location to the system PATH environment variable.

Operator Installation

To install k8sgpt within a Kubernetes cluster, please use our k8sgpt-operator; installation instructions are available in the k8sgpt-operator repository.

This mode of operation is ideal for continuous monitoring of your cluster and can integrate with your existing monitoring such as Prometheus and Alertmanager.

Quick Start

  • Currently the default AI provider is OpenAI, so you will need to generate an API key from OpenAI.
    • You can do this by running k8sgpt generate, which opens a browser link where you can generate the key.
  • Run k8sgpt auth add to register the key with k8sgpt.
    • You can provide the key directly using the --password flag.
  • Run k8sgpt filters to manage the active filters used by the analyzer. By default, all filters are executed during analysis.
  • Run k8sgpt analyze to run a scan.
  • Use k8sgpt analyze --explain to get a more detailed explanation of the issues.
  • You can also run k8sgpt analyze --with-doc (with or without the --explain flag) to include the official Kubernetes documentation in the output.

Analyzers

K8sGPT uses analyzers to triage and diagnose issues in your cluster. It has a set of built-in analyzers, and you will also be able to write your own.

Built-in analyzers

Enabled by default

  • podAnalyzer
  • pvcAnalyzer
  • rsAnalyzer
  • serviceAnalyzer
  • eventAnalyzer
  • ingressAnalyzer
  • statefulSetAnalyzer
  • deploymentAnalyzer
  • cronJobAnalyzer
  • nodeAnalyzer
  • mutatingWebhookAnalyzer
  • validatingWebhookAnalyzer

Optional

  • hpaAnalyzer
  • pdbAnalyzer
  • networkPolicyAnalyzer
  • gatewayClass
  • gateway
  • httproute

Examples

Run a scan with the default analyzers

k8sgpt generate
k8sgpt auth add
k8sgpt analyze --explain
k8sgpt analyze --explain --with-doc

Filter on resource

k8sgpt analyze --explain --filter=Service

Filter by namespace

k8sgpt analyze --explain --filter=Pod --namespace=default

Output to JSON

k8sgpt analyze --explain --filter=Service --output=json

Anonymize during explain

k8sgpt analyze --explain --filter=Service --output=json --anonymize

Using filters

List filters

k8sgpt filters list

Add default filters

k8sgpt filters add [filter(s)]

Examples:

  • Single filter: k8sgpt filters add Service
  • Multiple filters: k8sgpt filters add Ingress,Pod

Remove default filters

k8sgpt filters remove [filter(s)]

Examples:

  • Single filter: k8sgpt filters remove Service
  • Multiple filters: k8sgpt filters remove Ingress,Pod

Additional commands

List configured backends

k8sgpt auth list

Update configured backends

k8sgpt auth update $MY_BACKEND1,$MY_BACKEND2..

Remove configured backends

k8sgpt auth remove -b $MY_BACKEND1,$MY_BACKEND2..

List integrations

k8sgpt integrations list

Activate integrations

k8sgpt integrations activate [integration(s)]

Use integration

k8sgpt analyze --filter=[integration(s)]

Deactivate integrations

k8sgpt integrations deactivate [integration(s)]

Serve mode

k8sgpt serve

Analysis with serve mode

grpcurl -plaintext -d '{"namespace": "k8sgpt", "explain": false}' localhost:8080 schema.v1.ServerService/Analyze
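To change the namespace or the explain flag without hand-editing JSON, the request body can be generated. A minimal bash sketch, assuming only the two fields used in the call above are needed; analyze_request is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch only: build the Analyze request body used with grpcurl above.
# Assumption: only the "namespace" and "explain" fields are needed.

# analyze_request NAMESPACE [EXPLAIN]: print the JSON request on stdout
analyze_request() {
  local ns=$1 explain=${2:-false}
  # Namespace names are RFC 1123 labels; rejecting anything else also keeps
  # quotes and backslashes out of the hand-built JSON below.
  local label_re='^[a-z0-9]([-a-z0-9]{0,61}[a-z0-9])?$'
  if [[ ! $ns =~ $label_re ]]; then
    echo "invalid namespace: $ns" >&2
    return 1
  fi
  case "$explain" in
    true | false) ;;
    *)
      echo "explain must be true or false, got: $explain" >&2
      return 1
      ;;
  esac
  printf '{"namespace": "%s", "explain": %s}\n' "$ns" "$explain"
}
```

grpcurl reads the request from stdin when given -d @:

   analyze_request default true | grpcurl -plaintext -d @ localhost:8080 schema.v1.ServerService/Analyze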

Key Features

LocalAI provider

To run local models, you can use an OpenAI-compatible API such as LocalAI, which uses llama.cpp and ggml to run inference on consumer-grade hardware. Models supported by LocalAI include Vicuna, Alpaca, LLaMA, Cerebras, GPT4ALL, GPT4ALL-J, and Koala.

To run local inference, you need to download the models first. For instance, you can find ggml-compatible models on Hugging Face (huggingface.co), for example Vicuna, Alpaca, and Koala.

Start the API server

To start the API server, follow the instructions in the LocalAI documentation.

Run k8sgpt

To run k8sgpt, run k8sgpt auth add with the localai backend:

k8sgpt auth add --backend localai --model <model_name> --baseurl http://localhost:8080/v1 --temperature 0.7

Now you can analyze with the localai backend:

k8sgpt analyze --explain --backend localai
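Because k8sgpt talks to LocalAI through its OpenAI-compatible API, a quick reachability check before analysis can turn a confusing failure into a clear one. A bash sketch, assuming the server exposes the standard OpenAI-style GET <baseurl>/models listing and that curl is installed; models_url and probe_backend are hypothetical helper names:

```shell
#!/usr/bin/env bash
# Sketch only: check that an OpenAI-compatible server (for example LocalAI)
# answers before using it with k8sgpt.
# Assumptions: the server implements the standard GET <baseurl>/models
# listing, and curl is installed.

# models_url BASEURL: append /models, tolerating one trailing slash
models_url() {
  printf '%s/models\n' "${1%/}"
}

# probe_backend BASEURL: succeed only if the models endpoint answers with 2xx
probe_backend() {
  # -f: fail on HTTP errors, -sS: quiet but still print errors
  curl -fsS --max-time 5 "$(models_url "$1")" >/dev/null
}
```

For example:

   probe_backend http://localhost:8080/v1 && k8sgpt analyze --explain --backend localai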

AzureOpenAI provider

Prerequisites: an Azure OpenAI deployment is needed; please see the official Microsoft documentation to create your own.

To authenticate with k8sgpt, you will need the Azure OpenAI endpoint of your tenant (https://<your Azure OpenAI endpoint>), the API key to access your deployment, the deployment name of your model, and the model name itself.

To run k8sgpt, run k8sgpt auth add with the azureopenai backend:

k8sgpt auth add --backend azureopenai --baseurl https://<your Azure OpenAI endpoint> --engine <deployment_name> --model <model_name>

Lastly, enter your Azure API key when prompted.

Now you are ready to analyze with the Azure OpenAI backend:

k8sgpt analyze --explain --backend azureopenai

Cohere provider

Prerequisites: a Cohere API key is needed; please visit the Cohere dashboard to create one.

To run k8sgpt, run k8sgpt auth add with the cohere backend:

k8sgpt auth add --backend cohere --model command-nightly

Lastly, enter your Cohere API key when prompted.

Now you are ready to analyze with the Cohere backend:

k8sgpt analyze --explain --backend cohere

Amazon Bedrock provider

Prerequisites: Bedrock API access is needed.

You will need to enable this in the AWS Console.

In addition, you will need to set the following local environment variables:

- AWS_ACCESS_KEY
- AWS_SECRET_ACCESS_KEY
- AWS_DEFAULT_REGION

Then add the backend:

k8sgpt auth add --backend amazonbedrock --model anthropic.claude-v2
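A missing variable from the list above is easy to overlook, so a preflight check can fail early with a clear message. A bash sketch; require_env and BEDROCK_ENV_VARS are hypothetical names, and the variable list is copied from above:

```shell
#!/usr/bin/env bash
# Sketch only: fail early when a variable required by the amazonbedrock
# backend is unset or empty. The list mirrors the variables named above.
BEDROCK_ENV_VARS=(AWS_ACCESS_KEY AWS_SECRET_ACCESS_KEY AWS_DEFAULT_REGION)

# require_env VAR...: report every missing variable on stderr;
# return 1 if at least one is missing, 0 otherwise
require_env() {
  local var status=0
  for var in "$@"; do
    # ${!var} is bash indirect expansion: the value of the variable named $var
    if [ -z "${!var:-}" ]; then
      echo "missing environment variable: $var" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example:

   require_env "${BEDROCK_ENV_VARS[@]}" && k8sgpt analyze -e -b amazonbedrock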

Usage

k8sgpt analyze -e -b amazonbedrock

0 argocd/argocd-application-controller(argocd-application-controller)
- Error: StatefulSet uses the service argocd/argocd-application-controller which does not exist.

 You're right, I don't have enough context to determine if a StatefulSet is correctly configured to use a non-existent service. A StatefulSet manages Pods with persistent storage, and the Pods are created from the same spec. The service name referenced in the StatefulSet configuration would need to match an existing Kubernetes service for the Pods to connect to. Without more details on the specific StatefulSet and environment, I can't confirm whether the configuration is valid or not.

Amazon SageMaker Provider

Prerequisites

  1. AWS CLI Configuration: Make sure you have the AWS Command Line Interface (CLI) configured on your machine. If you haven't already configured the AWS CLI, you can follow the official AWS documentation for instructions on how to do it: AWS CLI Configuration Guide.

  2. SageMaker Instance: You need to have an Amazon SageMaker instance set up. If you don't have one already, you can follow the step-by-step instructions provided in this repository for creating a SageMaker instance: llm-sagemaker-jumpstart-cdk.

Backend Configuration

To add the amazonsagemaker backend, two parameters are required:

  • --endpointname: the Amazon SageMaker endpoint name.
  • --providerRegion: the AWS region where the SageMaker instance is created. k8sgpt uses this region to connect to SageMaker (not the one defined with the AWS CLI or environment variables).

To add amazonsagemaker as a backend run:

k8sgpt auth add --backend amazonsagemaker --providerRegion eu-west-1 --endpointname endpoint-xxxxxxxxxx

Optional params

You can optionally set the following parameters when adding the backend, or later by editing the configuration file:

-l, --maxtokens int Specify a maximum output length. Adjust (1-...) to control text length. Higher values produce longer output, lower values limit length (default 2048)

-t, --temperature float32 The sampling temperature, value ranges between 0 (output is more deterministic) and 1 (more random) (default 0.7)

-c, --topp float32 Probability Cutoff: Set a threshold (0.0-1.0) to limit word choices. Higher values add randomness, lower values increase predictability. (default 0.5)
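To catch an out-of-range value before it reaches the backend, the three settings can be checked against the ranges described above. A bash sketch; check_sampling and in_range are hypothetical helpers, the lower bound of 1 for maxtokens comes from the "(1-...)" hint, and no upper bound is assumed:

```shell
#!/usr/bin/env bash
# Sketch only: check --maxtokens, --temperature and --topp against the
# ranges described above before passing them to `k8sgpt auth add`.

# in_range VALUE MIN MAX: true if VALUE is a non-negative decimal in [MIN, MAX]
# (awk does the comparison because bash arithmetic is integer-only)
in_range() {
  awk -v v="$1" -v lo="$2" -v hi="$3" \
    'BEGIN { exit !(v ~ /^[0-9]+(\.[0-9]+)?$/ && v + 0 >= lo + 0 && v + 0 <= hi + 0) }'
}

# check_sampling MAXTOKENS TEMPERATURE TOPP: report every bad value on stderr
check_sampling() {
  local status=0 int_re='^[1-9][0-9]*$'
  # maxtokens: positive integer; no upper bound is documented above
  if [[ ! $1 =~ $int_re ]]; then
    echo "maxtokens must be a positive integer, got: $1" >&2
    status=1
  fi
  if ! in_range "$2" 0 1; then
    echo "temperature must be between 0 and 1, got: $2" >&2
    status=1
  fi
  if ! in_range "$3" 0 1; then
    echo "topp must be between 0 and 1, got: $3" >&2
    status=1
  fi
  return "$status"
}
```

For example:

   check_sampling 1024 0.2 0.5 && k8sgpt auth add --backend amazonsagemaker --providerRegion eu-west-1 --endpointname endpoint-xxxxxxxxxx --maxtokens 1024 --temperature 0.2 --topp 0.5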

To make amazonsagemaker the default backend, run:

k8sgpt auth default -p amazonsagemaker

AmazonSageMaker Usage

./k8sgpt analyze -e -b amazonsagemaker
 100% |███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| (1/1, 14 it/min)
AI Provider: amazonsagemaker

0 default/nginx(nginx)
- Error: Back-off pulling image "nginxx"
 Error: Back-off pulling image "nginxx"

Solution:

1. Check if the image exists in the registry by running `docker image ls nginxx`.
2. If the image is not found, try pulling it by running `docker pull nginxx`.
3. If the image is still not available, check if there are any network issues by running `docker network inspect` and `docker network list`.
4. If the issue persists, try restarting the Docker daemon by running `sudo service docker restart`.

Setting a new default AI provider

There may be scenarios where you wish to have K8sGPT plugged into several AI providers. In this case, you may wish to make one of them, rather than OpenAI (the project default), the new default.

To view available providers

k8sgpt auth list
Default:
> openai
Active:
> openai
> azureopenai
Unused:
> localai
> noopai
> amazonbedrock
> cohere

To set a new default provider

k8sgpt auth default -p azureopenai
Default provider set to azureopenai

Anonymization

With the --anonymize flag, the data is anonymized before being sent to the AI backend. During the analysis, k8sgpt retrieves sensitive data (Kubernetes object names, labels, etc.). This data is masked when sent to the AI backend and replaced by a key that can be used to de-anonymize the data when the solution is returned to the user. For example:

  1. Error reported during analysis:
Error: HorizontalPodAutoscaler uses StatefulSet/fake-deployment as ScaleTargetRef which does not exist.
  2. Payload sent to the AI backend:
Error: HorizontalPodAutoscaler uses StatefulSet/tGLcCRcHa1Ce5Rs as ScaleTargetRef which does not exist.
  3. Payload returned by the AI:
The Kubernetes system is trying to scale a StatefulSet named tGLcCRcHa1Ce5Rs using the HorizontalPodAutoscaler, but it cannot find the StatefulSet. The solution is to verify that the StatefulSet name is spelled correctly and exists in the same namespace as the HorizontalPodAutoscaler.
  4. Payload returned to the user:
The Kubernetes system is trying to scale a StatefulSet named fake-deployment using the HorizontalPodAutoscaler, but it cannot find the StatefulSet. The solution is to verify that the StatefulSet name is spelled correctly and exists in the same namespace as the HorizontalPodAutoscaler.
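To make the round trip above concrete, here is a toy bash sketch of mask-and-restore: each sensitive name is swapped for a random key before the text leaves the machine, and the keys are swapped back in the reply. It only illustrates the idea and is not k8sgpt's implementation; the key format (16 hex characters) and the mask and unmask helpers are assumptions.

```shell
#!/usr/bin/env bash
# Toy illustration of mask-and-restore anonymization; NOT k8sgpt's code.
# Assumptions: keys are 16 random hex characters, and the pairs live in two
# parallel arrays (bash 3 compatible, so no associative arrays).
MASK_NAMES=()
MASK_KEYS=()

# random_key: 8 random bytes from /dev/urandom, printed as 16 hex characters
random_key() {
  od -An -N8 -tx1 /dev/urandom | tr -d ' \n'
}

# mask TEXT NAME...: replace each NAME with a fresh key, remember the pair,
# and leave the result in MASKED_TEXT (a global, so no subshell loses state)
mask() {
  local text=$1 name key
  shift
  for name in "$@"; do
    key=$(random_key)
    MASK_NAMES+=("$name")
    MASK_KEYS+=("$key")
    # quoted pattern and replacement are used literally, not as globs
    text=${text//"$name"/"$key"}
  done
  MASKED_TEXT=$text
}

# unmask TEXT: swap every remembered key back to its original name
unmask() {
  local text=$1 i
  for i in "${!MASK_KEYS[@]}"; do
    text=${text//"${MASK_KEYS[$i]}"/"${MASK_NAMES[$i]}"}
  done
  printf '%s\n' "$text"
}

# Demo with the HPA error above; the key differs on every run.
mask 'HorizontalPodAutoscaler uses StatefulSet/fake-deployment as ScaleTargetRef which does not exist.' fake-deployment
printf 'sent to backend: %s\n' "$MASKED_TEXT"
unmask "The Kubernetes system is trying to scale a StatefulSet named ${MASK_KEYS[0]}."
```

The key list must stay on the client side: only the masked text is sent, and only the holder of the list can restore the names.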

Note: Anonymization does not currently apply to events.

Further Details

In a few analyzers, such as Pod, we send event messages to the AI backend. Their content is not known beforehand, so we are not masking them for the time being.

  • Data is masked in the following analyzers:

    • StatefulSet
    • Service
    • PodDisruptionBudget
    • Node
    • NetworkPolicy
    • Ingress
    • HPA
    • Deployment
    • CronJob
  • Data is not masked in the following analyzers:

    • ReplicaSet
    • PersistentVolumeClaim
    • Pod
    • *Events

*Note:

  • k8sgpt does not mask the analyzers above because, except for the Events analyzer, they do not send any identifying information.

  • Masking for the Events analyzer is planned for the near future and is tracked in a GitHub issue. Further research is needed to understand the patterns and to mask the sensitive parts of an event, such as the pod name or namespace.

  • The following fields are not masked:

    • Describe
    • ObjectStatus
    • Replicas
    • ContainerStatus
    • *Event Message
    • ReplicaStatus
    • Count (Pod)

*Note:

  • An event message may well contain something like "super-secret-project-pod-X crashed", which we do not currently redact (redaction is planned for the near future and tracked in a GitHub issue).

Proceed with care

  • The K8sGPT team recommends using an entirely different backend (a local model) in critical production environments. With a local model, everything stays within your DMZ and nothing is leaked.
  • If you are uncertain whether sending data to a public LLM (OpenAI, Azure AI) poses a risk to business-critical operations, avoid public LLMs, based on your own assessment of the risks and the jurisdiction involved.

Configuration management

k8sgpt stores config data in the $XDG_CONFIG_HOME/k8sgpt/k8sgpt.yaml file. The data is stored in plain text, including your OpenAI key.

Config file locations:

OS        Path
macOS     ~/Library/Application Support/k8sgpt/k8sgpt.yaml
Linux     ~/.config/k8sgpt/k8sgpt.yaml
Windows   %LOCALAPPDATA%/k8sgpt/k8sgpt.yaml
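Since the file holds your API key in plain text, it can be worth locating it and tightening its permissions. A bash sketch that resolves the path from the table above, under two assumptions: a non-empty XDG_CONFIG_HOME takes precedence on every OS (as the $XDG_CONFIG_HOME path mentioned above suggests), and Windows is left out. k8sgpt_config_path is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: resolve the k8sgpt config file path from the table above.
# Assumptions: a non-empty XDG_CONFIG_HOME wins on every OS, and Windows
# is not handled here.

# k8sgpt_config_path [OS]: OS defaults to `uname -s` (Darwin or Linux)
k8sgpt_config_path() {
  local os=${1:-$(uname -s)} base
  if [ -n "${XDG_CONFIG_HOME:-}" ]; then
    base=$XDG_CONFIG_HOME
  else
    case "$os" in
      Darwin) base="$HOME/Library/Application Support" ;;
      Linux) base="$HOME/.config" ;;
      *)
        echo "unsupported OS: $os" >&2
        return 1
        ;;
    esac
  fi
  printf '%s/k8sgpt/k8sgpt.yaml\n' "$base"
}
```

For example, to make the file readable and writable only by your user:

   chmod 600 "$(k8sgpt_config_path)"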

Remote caching

There may be scenarios where caching remotely is preferred. In these scenarios, K8sGPT supports AWS S3, Azure Blob Storage, and Google Cloud Storage integration.

Note: You can configure and use only one remote cache at a time.

Adding a remote cache

  • AWS S3
    • As a prerequisite, the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables are required.
    • Configuration: k8sgpt cache add s3 --region <aws region> --bucket <name>
      • K8sGPT will create the bucket if it does not exist
  • Azure Storage
    • We support a number of techniques to authenticate against Azure.
    • Configuration: k8sgpt cache add azure --storageacc <storage account name> --container <container name>
      • K8sGPT assumes that the storage account already exists; it will create the container if it does not exist.
      • It is the user's responsibility to grant their identity the permissions needed to upload blob files and create storage account containers (e.g. Storage Blob Data Contributor).
  • Google Cloud Storage
    • As a prerequisite, the GOOGLE_APPLICATION_CREDENTIALS environment variable is required.
    • Configuration: k8sgpt cache add gcs --region <gcp region> --bucket <name> --projectid <project id>
      • K8sGPT will create the bucket if it does not exist
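The prerequisites above can be checked before the cache is added, so a missing credential fails fast. A bash sketch; cache_preflight is a hypothetical helper, it only checks the variables named in this section, and it does not verify the credentials themselves:

```shell
#!/usr/bin/env bash
# Sketch only: check the prerequisites listed above before running
# `k8sgpt cache add`. Azure is accepted without checks because the section
# names no single required variable for it.

# cache_preflight PROVIDER: PROVIDER is s3, azure or gcs (as in cache add)
cache_preflight() {
  local status=0 var
  case "$1" in
    s3)
      for var in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY; do
        if [ -z "${!var:-}" ]; then
          echo "missing environment variable: $var" >&2
          status=1
        fi
      done
      ;;
    gcs)
      # GOOGLE_APPLICATION_CREDENTIALS holds the path to a credentials file
      if [ -z "${GOOGLE_APPLICATION_CREDENTIALS:-}" ]; then
        echo "missing environment variable: GOOGLE_APPLICATION_CREDENTIALS" >&2
        status=1
      elif [ ! -r "$GOOGLE_APPLICATION_CREDENTIALS" ]; then
        echo "credentials file not readable: $GOOGLE_APPLICATION_CREDENTIALS" >&2
        status=1
      fi
      ;;
    azure) ;;
    *)
      echo "unknown cache provider: $1" >&2
      return 1
      ;;
  esac
  return "$status"
}
```

For example:

   cache_preflight s3 && k8sgpt cache add s3 --region <aws region> --bucket <name>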

Listing cache items

k8sgpt cache list

Purging an object from the cache

Note: purging an object using this command will delete upstream files, so it requires appropriate permissions.

k8sgpt cache purge $OBJECT_NAME

Removing the remote cache

Note: this will not delete the upstream S3 bucket or Azure storage container.

k8sgpt cache remove

Documentation

Find our official documentation available here

Contributing

Please read our contributing guide.

Community

Find us on Slack

License

k8sgpt is licensed under the Apache License 2.0.