The only DevOps assistant that solves problems like a human does - by investigating the issue and repeatedly fetching missing data until the problem can be solved. Powered by OpenAI or any tool-calling LLM of your choice, including open-source models.
- Kubernetes Troubleshooting: Ask questions about your cloud, identify problems, and troubleshoot them
- Incident Response: Investigate firing alerts by gathering data and determining the root cause
- Ticket Management: Analyze and resolve tickets related to DevOps tasks
- Automated Investigation and Triage: Prioritize critical alerts and resolve the highest-impact issues first
- Runbook Automation in Plain English: No more defining runbooks as YAML or complicated workflows. Just describe tasks in plain English and the AI will follow the instructions
Investigate a Kubernetes Problem
holmes ask "what pods are unhealthy in my cluster and why?"
Ask Questions About Your Cloud
holmes ask "what services does my cluster expose externally?"
Investigate a Firing Prometheus alert
kubectl port-forward alertmanager-robusta-kube-prometheus-st-alertmanager-0 9093:9093 &
holmes investigate alertmanager --alertmanager-url http://localhost:9093
Note: if on macOS and using the Docker image, you will need to use http://docker.for.mac.localhost:9093 instead of http://localhost:9093
Investigate a Jira Ticket
holmes investigate jira --jira-url https://<PLACEHOLDER>.atlassian.net --jira-username <PLACEHOLDER_EMAIL> --jira-api-key <PLACEHOLDER_API_KEY>
Like what you see? Check out more examples or get started by installing HolmesGPT.
- Connects to Existing Observability Data: Find correlations you didn’t know about. No need to gather new data or add instrumentation.
- Compliance Friendly: Can be run on-premise with your own LLM (or in the cloud with OpenAI or Azure)
- Transparent Results: See a log of the AI’s actions and what data it gathered to understand how it reached conclusions
- Extensible Data Sources: Connect the AI to custom data by providing your own tool definitions
- Runbook Automation: Optionally provide runbooks in plain English and the AI will follow them automatically
- Integrates with Existing Workflows: Connect Slack and Jira to get results inside your existing tools
First you will need an OpenAI API key, or the equivalent for another model. Then install with one of the methods below:
Brew (Mac/Linux)
- Add our tap:
brew tap robusta-dev/homebrew-holmesgpt
- Install holmesgpt:
brew install holmesgpt
- Check that the installation was successful (the first run may take a few seconds - wait patiently):
holmes --help
- Run holmesgpt:
holmes ask "what issues do I have in my cluster"
Prebuilt Docker Container
Run the command below, replacing <VERSION_PLACEHOLDER> with the latest HolmesGPT version - e.g. 0.1.
docker run -it --net=host -v $(pwd)/config.yaml:/app/config.yaml -v ~/.aws:/root/.aws -v ~/.config/gcloud:/root/.config/gcloud -v $HOME/.kube/config:/root/.kube/config us-central1-docker.pkg.dev/genuine-flight-317411/devel/holmes:<VERSION_PLACEHOLDER> ask "what pods are unhealthy and why?"
From Source (Python Poetry)
First install poetry (the python package manager)
Clone the project from GitHub, and then run:
git clone https://github.com/robusta-dev/holmesgpt.git
cd holmesgpt
poetry install --no-root
poetry run python3 holmes.py ask "what pods are unhealthy and why?"
From Source (Docker)
Clone the project from GitHub, and then run:
git clone https://github.com/robusta-dev/holmesgpt.git
cd holmesgpt
docker build -t holmes .
docker run -it --net=host -v $(pwd)/config.yaml:/app/config.yaml -v ~/.aws:/root/.aws -v ~/.config/gcloud:/root/.config/gcloud -v $HOME/.kube/config:/root/.kube/config holmes ask "what pods are unhealthy and why?"
HolmesGPT requires an API Key to function. Follow one of the instructions below.
OpenAI
To work with OpenAI's GPT-3.5 or GPT-4 models you need a paid OpenAI API key.
Note: This is different from being a "ChatGPT Plus" subscriber.
Add the api_key to the config.yaml or pass it via the CLI.
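For example, a minimal config.yaml for OpenAI might look like this (the key shown is a placeholder - substitute your own):

```yaml
# config.yaml - minimal OpenAI setup (placeholder key)
llm: "openai"
api_key: "sk-..."
```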
Azure OpenAI
To work with Azure OpenAI, you need an Azure OpenAI endpoint and API key.
holmes ask "what pods are unhealthy and why?" --llm=azure --api-key=<PLACEHOLDER> --azure-endpoint='<PLACEHOLDER>'
Using a self-hosted LLM
You will need an LLM with support for function-calling (tool-calling). To use it, set the OPENAI_BASE_URL environment variable and run holmes with a relevant model name set using --model.
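As a sketch (the endpoint URL and model name here are assumptions - substitute the address of your own OpenAI-compatible inference server and the model it serves):

```shell
# Point HolmesGPT at a self-hosted, OpenAI-compatible endpoint
# (example URL and model name - replace with your own)
export OPENAI_BASE_URL="http://localhost:8000/v1"
holmes ask "what pods are unhealthy and why?" --model=llama3
```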
Important: Please verify that your model and inference server support function calling! HolmesGPT is currently unable to check if the LLM it was given supports function-calling or not. Some models that lack function-calling capabilities will hallucinate answers instead of reporting that they are unable to call functions. This behaviour depends on the model.
In particular, note that vLLM does not yet support function calling, whereas llama-cpp does support it.
Customising config
You can define your own custom toolsets to extend the functionality of your setup. These toolsets can include querying company-specific data, fetching logs from observability tools, and more.
# Add paths to your custom toolsets here
# Example: ["path/to/your/custom_toolset.yaml"]
#custom_toolsets: ["examples/custom_toolset.yaml"]
Configure the URL for your Alertmanager instance to enable alert management and notifications.
# URL for the Alertmanager
#alertmanager_url: "http://localhost:9093"
Integrate with Jira to automate issue tracking and project management tasks. Provide your Jira credentials and specify the query to fetch issues.
# Jira credentials and query settings
#jira_username: "user@company.com"
#jira_api_key: "..."
#jira_url: "https://your-company.atlassian.net"
#jira_query: "project = 'Natan Test Project' and Status = 'To Do'"
Configure Slack to send notifications to specific channels. Provide your Slack token and the desired channel for notifications.
# Slack token and channel configuration
#slack_token: "..."
#slack_channel: "#general"
Choose between OpenAI or Azure for integrating large language models. Provide the necessary API keys and endpoints for the selected service.
# Configuration for OpenAI LLM
#llm: "openai"
#api_key: "..."
# Configuration for Azure LLM
#llm: "azure"
#api_key: "..."
#azure_endpoint: "..."
Define custom runbooks to give explicit instructions to the LLM on how to investigate certain alerts. This can help in achieving better results for known alerts.
# Add paths to your custom runbooks here
# Example: ["path/to/your/custom_runbook.yaml"]
#custom_runbooks: ["examples/custom_runbooks.yaml"]
Identify which Helm value to modify
LLM uses the built-in Helm toolset to gather information.
holmes ask "what helm value should I change to increase memory request of the my-argo-cd-argocd-server-6864949974-lzp6m pod"
Optimize Docker container size
LLM uses the built-in Docker toolset to gather information.
holmes ask "Tell me what layers of my pavangudiwada/robusta-ai docker image consume the most storage and suggest some fixes to it"
Investigate a Prometheus alert and share results in Slack
By default investigation results are displayed in the CLI itself. You can optionally get these results in a Slack channel:
holmes investigate alertmanager --alertmanager-url http://localhost:9093 --destination slack --slack-token <PLACEHOLDER_SLACK_TOKEN> --slack-channel <PLACEHOLDER_SLACK_CHANNEL>
Alternatively you can update the config.yaml with your Slack details and run:
holmes investigate alertmanager --alertmanager-url http://localhost:9093 --destination slack
Investigate and update Jira tickets with findings
By default Jira investigation results are displayed in the CLI itself, but you can use --update-ticket to post the results as a comment on the Jira ticket.
holmes investigate jira --jira-url https://<PLACEHOLDER>.atlassian.net --jira-username <PLACEHOLDER_EMAIL> --jira-api-key <PLACEHOLDER_API_KEY> --update-ticket
Alternatively you can update the config.yaml with your Jira account details and run:
holmes investigate jira --update-ticket
Add Custom Tools
The more data you give HolmesGPT, the better it will perform. Give it access to more data by adding custom tools.
New tools are loaded using -t from custom toolset files, or by adding them to the config.yaml under custom_toolsets.
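For example, to load a custom toolset for a single invocation (the path shown is the example file referenced elsewhere in this document - adjust it to your own toolset):

```shell
# Load a custom toolset file with -t for this run only
holmes ask "what pods are unhealthy and why?" -t examples/custom_toolset.yaml
```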
Add Custom Runbooks
HolmesGPT can investigate by following runbooks written in plain English. Add your own runbooks to provide the LLM with specific instructions.
New runbooks are loaded using -r from custom runbook files, or by adding them to the config.yaml under custom_runbooks.
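For example, to load a custom runbook during an alert investigation (the path shown is the example file referenced elsewhere in this document - adjust it to your own runbook):

```shell
# Load a custom runbook file with -r for this investigation
holmes investigate alertmanager --alertmanager-url http://localhost:9093 -r examples/custom_runbooks.yaml
```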
Reading settings from a config file
You can customize HolmesGPT's behaviour with command-line flags, or you can save common settings in a config file for re-use.
You can view an example config file with all available settings here.
By default, without specifying --config, the agent will try to read config.yaml from the current directory.
If a setting is specified in both the config file and the CLI, the CLI takes precedence.
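For example (the filename my_config.yaml is hypothetical - use whatever path you saved your settings to):

```shell
# Read settings from an explicit config file instead of ./config.yaml
holmes ask "what pods are unhealthy and why?" --config my_config.yaml
```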
Slack
Adding a Slack integration allows the LLM to send Prometheus alert investigation details to a Slack channel. To do this you need the following:
- slack-token: The Slack API key. You can generate one with:
pip install robusta-cli && robusta integrations slack
- slack-channel: The Slack channel where you want to receive the findings.
Add these values to the config.yaml or pass them via the CLI.
Jira
Adding a Jira integration allows the LLM to fetch Jira tickets and investigate them automatically. Optionally it can update the Jira ticket with its findings too. You need the following to use this:
- url: The URL of your workspace. For example: https://workspace.atlassian.net (Note: the scheme (https) is required)
- username: The email you use to log into your Jira account. E.g. jira-user@company.com
- api_key: Follow these instructions to get your API key.
- project: Name of the project you want the Jira tickets to be created in. Go to Project Settings -> Details -> Name.
- status: Status of a ticket. Examples: To Do, In Progress
Add these values to the config.yaml or pass them via the CLI.
Distributed under the MIT License. See LICENSE.txt for more information.
If you have any questions, feel free to message us on robustacommunity.slack.com