/holmesgpt

The Open Source DevOps Assistant - solve problems twice as fast with an AI teammate

Primary LanguagePythonMIT LicenseMIT

HolmesGPT - The Open Source DevOps Assistant

Solve Problems Twice as Fast with an AI Teammate

Use Cases | Examples | Key Features | Installation

The only DevOps assistant that solves problems like a human does - by looking at problems and fetching missing data repeatedly until the problem can be solved. Powered by OpenAI or any tool-calling LLM of your choice, including open source models.

Use Cases:

  • Kubernetes Troubleshooting: Ask questions about your cloud, identify problems, and troubleshoot them
  • Incident Response: Investigate firing alerts by gathering data and determining the root cause
  • Ticket Management: Analyze and resolve tickets related to DevOps tasks
  • Automated Investigation and Triage: Prioritize critical alerts and resolve the highest impact issues first.
  • Runbook Automation in Plain English: No more defining runbooks as YAML or complicated workflows. Just describe tasks in plain English and the AI will follow the instructions

See it in Action

AI Alert Analysis

Examples

Investigate a Kubernetes Problem
holmes ask "what pods are unhealthy in my cluster and why?"
Ask Questions About Your Cloud
holmes ask "what services does my cluster expose externally?"
Investigate a Firing Prometheus alert
kubectl port-forward alertmanager-robusta-kube-prometheus-st-alertmanager-0 9093:9093 &
holmes investigate alertmanager --alertmanager-url http://localhost:9093

Note - if on Mac OS and using the Docker image, you will need to use http://docker.for.mac.localhost:9093 instead of http://localhost:9093

Investigate a Jira Ticket
holmes investigate jira --jira-url https://<PLACEDHOLDER>.atlassian.net --jira-username <PLACEHOLDER_EMAIL> --jira-api-key <PLACEHOLDER_API_KEY>

Like what you see? Checkout more examples or get started by installing HolmesGPT.

Key Features

  • Connects to Existing Observability Data: Find correlations you didn’t know about. No need to gather new data or add instrumentation.
  • Compliance Friendly: Can be run on-premise with your own LLM (or in the cloud with OpenAI or Azure)
  • Transparent Results: See a log of the AI’s actions and what data it gathered to understand how it reached conclusions
  • Extensible Data Sources: Connect the AI to custom data by providing your own tool definitions
  • Runbook Automation: Optionally provide runbooks in plain English and the AI will follow them automatically
  • Integrates with Existing Workflows: Connect Slack and Jira to get results inside your existing tools

Installation

First you will need an OpenAI API key, or the equivalent for another model. Then install with one of the below methods:

Brew (Mac/Linux)
  1. Add our tap:
brew tap robusta-dev/homebrew-holmesgpt
  1. Install holmesgpt:
brew install holmesgpt
  1. Check that installation was successful. This will take a few seconds on the first run - wait patiently.:
holmes --help
  1. Run holmesgpt:
holmes ask "what issues do I have in my cluster"
Prebuilt Docker Container

Run the below command, replacing <VERSION_PLACEHOLDER> with the latest HolmesGPT version - e.g. 0.1.

docker run -it --net=host -v $(pwd)/config.yaml:/app/config.yaml -v ~/.aws:/root/.aws -v ~/.config/gcloud:/root/.config/gcloud -v $HOME/.kube/config:/root/.kube/config us-central1-docker.pkg.dev/genuine-flight-317411/devel/holmes:<VERSION_PLACEHOLDER> ask "what pods are unhealthy and why?"
From Source (Python Poetry)

First install poetry (the python package manager)

Clone the project from github, and then run:

cd holmesgpt
poetry install --no-root
poetry run python3 holmes.py ask "what pods are unhealthy and why?"
From Source (Docker)

Clone the project from github, and then run:

cd holmesgpt
docker build -t holmes .
docker run -it --net=host -v $(pwd)/config.yaml:/app/config.yaml -v ~/.aws:/root/.aws -v ~/.config/gcloud:/root/.config/gcloud -v $HOME/.kube/config:/root/.kube/config holmest ask "what pods are unhealthy and why?"

Getting an API Key

HolmesGPT requires an API Key to function. Follow one of the instructions below.

OpenAI

To work with OpenAI’s GPT 3.5 or GPT-4 models you need a paid OpenAI API key.

Note: This is different from being a “ChatGPT Plus” subscriber.

Add the api_key to the config.yaml or pass them via the CLI.

Azure OpenAI

To work with Azure AI, you need the Azure OpenAI.

holmes ask "what pods are unhealthy and why?" --llm=azure --api-key=<PLACEHOLDER> --azure-endpoint='<PLACEHOLDER>'
Using a self-hosted LLM

You will need an LLM with support for function-calling (tool-calling). To use it, set the OPENAI_BASE_URL environment variable and run holmes with a relevant model name set using --model.

Important: Please verify that your model and inference server support function calling! HolmesGPT is currently unable to check if the LLM it was given supports function-calling or not. Some models that lack function-calling capabilities will hallucinate answers instead of reporting that they are unable to call functions. This behaviour depends on the model.

In particular, note that vLLM does not yet support function calling, whereas llama-cpp does support it.

Setting up Config file

Customising config

Custom Toolsets

You can define your own custom toolsets to extend the functionality of your setup. These toolsets can include querying company-specific data, fetching logs from observability tools, and more.

# Add paths to your custom toolsets here
# Example: ["path/to/your/custom_toolset.yaml"]
#custom_toolsets: ["examples/custom_toolset.yaml"]

Alertmanager Configuration

Configure the URL for your Alertmanager instance to enable alert management and notifications.

# URL for the Alertmanager
#alertmanager_url: "http://localhost:9093"

Jira Integration

Integrate with Jira to automate issue tracking and project management tasks. Provide your Jira credentials and specify the query to fetch issues.

# Jira credentials and query settings
#jira_username: "user@company.com"
#jira_api_key: "..."
#jira_url: "https://your-company.atlassian.net"
#jira_query: "project = 'Natan Test Project' and Status = 'To Do'"

Slack Integration

Configure Slack to send notifications to specific channels. Provide your Slack token and the desired channel for notifications.

# Slack token and channel configuration
#slack_token: "..."
#slack_channel: "#general"

Large Language Model (LLM) Configuration

Choose between OpenAI or Azure for integrating large language models. Provide the necessary API keys and endpoints for the selected service.

OpenAI

# Configuration for OpenAI LLM
#llm: "openai"
#api_key: "..."

Azure

# Configuration for Azure LLM
#llm: "azure"
#api_key: "..."
#azure_endpoint: "..."

Custom Runbooks

Define custom runbooks to give explicit instructions to the LLM on how to investigate certain alerts. This can help in achieving better results for known alerts.

# Add paths to your custom runbooks here
# Example: ["path/to/your/custom_runbook.yaml"]
#custom_runbooks: ["examples/custom_runbooks.yaml"]

More Examples

Identify which Helm value to modify

LLM uses the built-in Helm toolset to gather information.

holmes ask "what helm value should I change to increase memory request of the my-argo-cd-argocd-server-6864949974-lzp6m pod"
Optimize Docker container size

LLM uses the built-in Docker toolset to gather information.

holmes ask "Tell me what layers of my pavangudiwada/robusta-ai docker image consume the most storage and suggest some fixes to it"
Investigate a Prometheus alert and share results in Slack

By default investigation results are displayed in the CLI itself. You can optionally get these results in a Slack channel:

holmes investigate alertmanager --alertmanager-url http://localhost:9093 --destination slack --slack-token <PLACEHOLDER_SLACK_TOKEN> --slack-channel <PLACEHOLDER_SLACK_CHANNEL>

Alternatively you can update the config.yaml with your Slack details and run:

holmes investigate alertmanager --alertmanager-url http://localhost:9093 --destination slack
Investigate and update Jira tickets with findings

By default Jira investigation results are displayed in the CLI itself. But you can use --update-ticket to get the results as a comment in the Jira ticket.

holmes investigate jira --jira-url https://<PLACEDHOLDER>.atlassian.net --jira-username <PLACEHOLDER_EMAIL> --jira-api-key <PLACEHOLDER_API_KEY> --update-ticket

Alternatively you can update the config.yaml with your Jira account details and run:

holmes investigate jira --update-ticket

Advanced Usage

Add Custom Tools

The more data you give HolmesGPT, the better it will perform. Give it access to more data by adding custom tools.

New tools are loaded using -t from custom toolset files or by adding them to the config.yaml in custom_toolsets.

Add Custom Runbooks

HolmesGPT can investigate by following runbooks written in plain English. Add your own runbooks to provided the LLM specific instructions.

New runbooks are loaded using -r from custom runbook files or by adding them to the config.yaml in custom_runbooks.

Reading settings from a config file

You can customize HolmesGPT's behaviour with command line flags, or you can save common settings in config file for re-use.

You can view an example config file with all available settings here.

By default, without specifying --config the agent will try to read config.yaml from the current directory. If a setting is specified in both in config file and cli, cli takes precedence.

More Integrations

Slack

Adding a Slack integration allows the LLM to send Prometheus Alert investigation details to a Slack channel. To do this you need the following

  1. slack-token: The Slack API key. You can generate with pip install robusta-cli && robusta integrations slack
  2. slack-channel: The Slack channel where you want to receive the findings.

Add these values to the config.yaml or pass them via the CLI.

Jira

Adding a Jira integration allows the LLM to fetch Jira tickets and investigate automatically. Optionally it can update the Jira ticked with findings too. You need the following to use this

  1. url: The URL of your workspace. For example: https://workspace.atlassian.net (Note: schema (https) is required)
  2. username: The email you use to log into your Jira account. Eg: jira-user@company.com
  3. api_key: Follow these instructions to get your API key.
  4. project: Name of the project you want the Jira tickets to be created in. Go to Project Settings -> Details -> Name.
  5. status: Status of a ticket. Example: To Do, In Progress

Add these values to the config.yaml or pass them via the CLI.

License

Distributed under the MIT License. See LICENSE.txt for more information.

Support

If you have any questions, feel free to message us on robustacommunity.slack.com