The prometheus-service is a Keptn service that is responsible for:
- configuring Prometheus for monitoring services managed by Keptn, and
- receiving alerts from Prometheus Alertmanager and translating the alert payload to a cloud event that is sent to the Keptn API.

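To make the alert translation more concrete, here is a minimal Python sketch that wraps a single Alertmanager alert in a CloudEvents-style envelope. The event type, the source value, and every field name under `data` are assumptions chosen for illustration; they are not taken from the service's actual schema. Only the `status` and `labels` keys of the incoming alert follow the Alertmanager webhook format.

```python
import uuid
from datetime import datetime, timezone


def alert_to_cloud_event(alert: dict) -> dict:
    """Wrap one Alertmanager alert in a CloudEvents-style envelope.

    Assumption: the event type, source, and data field names below are
    hypothetical and only illustrate the idea of the translation.
    """
    labels = alert.get("labels", {})
    return {
        "specversion": "1.0",
        "id": str(uuid.uuid4()),
        "time": datetime.now(timezone.utc).isoformat(),
        "type": "sh.keptn.events.problem",  # assumed event type
        "source": "prometheus",  # assumed source identifier
        "datacontenttype": "application/json",
        "data": {
            # Alertmanager reports each alert as "firing" or "resolved".
            "state": "OPEN" if alert.get("status") == "firing" else "RESOLVED",
            "title": labels.get("alertname", ""),
            "severity": labels.get("severity", ""),
        },
    }


sample_alert = {
    "status": "firing",
    "labels": {"alertname": "response_time_p90", "severity": "webhook"},
}
event = alert_to_cloud_event(sample_alert)
```

The resulting dictionary could then be serialized to JSON and posted to an event endpoint; how the real service authenticates against and addresses the Keptn API is not covered here.
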
It is also used for retrieving Service Level Indicators (SLIs) from a Prometheus API endpoint. Per default, it fetches metrics from the Prometheus instance set up by Keptn (prometheus-service.monitoring.svc.cluster.local:8080), but it can also be configured to use any reachable Prometheus endpoint with basic authentication by providing the credentials via a secret in the keptn namespace of the cluster.

The supported default SLIs are:
- throughput
- error_rate
- response_time_p50
- response_time_p90
- response_time_p95
The provided SLIs are based on the RED metrics (rate, errors, duration).
Please always double-check the version of Keptn you are using compared to the version of this service, and follow the compatibility matrix below.
| Keptn Version | Prometheus Service Image |
|---|---|
| 0.5.x | solidnerd/prometheus-service:0.2.0 |
| 0.6.x | solidnerd/prometheus-service:0.3.0 |
| 0.6.1 | solidnerd/prometheus-service:0.3.2 |
| 0.6.2 | solidnerd/prometheus-service:0.3.4 |
| 0.7.0, 0.7.1 | solidnerd/prometheus-service:0.3.5 |
| 0.7.2 | solidnerd/prometheus-service:0.3.6 |
| 0.8.0-alpha | solidnerd/prometheus-service:0.4.0-alpha |
| 0.8.0 | solidnerd/prometheus-service:0.4.0 |
| 0.8.1, 0.8.2 | solidnerd/prometheus-service:0.5.0 |
| 0.8.1 - 0.8.3 | solidnerd/prometheus-service:0.6.0 |
| 0.8.4 - 0.8.7 | solidnerd/prometheus-service:0.6.1 |
| 0.9.0 | solidnerd/prometheus-service:0.6.2 |
Keptn doesn't install or manage Prometheus and its components. Users need to install Prometheus and Prometheus Alertmanager as a prerequisite.
Some environment variables have to be set up in the prometheus-service deployment:
# Prometheus installed namespace
- name: PROMETHEUS_NS
  value: 'default'
# Prometheus server configmap name
- name: PROMETHEUS_CM
  value: 'prometheus-server'
# Prometheus server app labels
- name: PROMETHEUS_LABELS
  value: 'component=server'
# Prometheus configmap data's config filename
- name: PROMETHEUS_CONFIG_FILENAME
  value: 'prometheus.yml'
# Alertmanager configmap data's config filename
- name: ALERT_MANAGER_CONFIG_FILENAME
  value: 'alertmanager.yml'
# Alertmanager configmap name
- name: ALERT_MANAGER_CM
  value: 'prometheus-alertmanager'
# Alertmanager app labels
- name: ALERT_MANAGER_LABELS
  value: 'component=alertmanager'
# Alertmanager installed namespace
- name: ALERT_MANAGER_NS
  value: 'default'
# Alertmanager template configmap name
- name: ALERT_MANAGER_TEMPLATE_CM
  value: 'alertmanager-templates'

- Download the Keptn prometheus-service manifest:
wget https://raw.githubusercontent.com/keptn-contrib/prometheus-service/release-0.6.2/deploy/service.yaml
- Replace the environment variable values according to your use case and apply the manifest:
kubectl apply -f service.yaml
- Install the Role and RoleBinding that permit Keptn's prometheus-service to perform operations in the namespace where Prometheus is installed:
kubectl apply -f https://raw.githubusercontent.com/keptn-contrib/prometheus-service/release-0.6.2/deploy/role.yaml -n <PROMETHEUS_NS>
- Execute the following command to configure Prometheus monitoring for a service and set up the rules for Prometheus Alertmanager:
keptn configure monitoring prometheus --project=sockshop --service=carts
- To verify that the Prometheus scrape jobs are correctly set up, you can access Prometheus by enabling port-forwarding for the prometheus-server:
kubectl port-forward svc/prometheus-server 8080 -n <PROMETHEUS_NS>

Prometheus is then available on localhost:8080/targets, where you can see the targets for the service.
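
Besides the web page, Prometheus exposes the same target information as JSON under /api/v1/targets. The following Python sketch filters such a response for one scrape job. The helper name `targets_for_job` and the sample payload are made up for illustration; only the `data.activeTargets[].labels.job`, `scrapeUrl`, and `health` fields follow the Prometheus HTTP API.

```python
def targets_for_job(targets_response: dict, job: str) -> list:
    """Return scrape URL and health of every active target of a scrape job."""
    active = targets_response.get("data", {}).get("activeTargets", [])
    return [
        {"scrapeUrl": target.get("scrapeUrl"), "health": target.get("health")}
        for target in active
        if target.get("labels", {}).get("job") == job
    ]


# Hypothetical, shortened response of GET /api/v1/targets.
sample_response = {
    "status": "success",
    "data": {
        "activeTargets": [
            {
                "labels": {"job": "carts-sockshop-production"},
                "scrapeUrl": "http://carts.sockshop-production:80/prometheus",
                "health": "up",
            },
            {
                "labels": {"job": "prometheus"},
                "scrapeUrl": "http://localhost:9090/metrics",
                "health": "up",
            },
        ],
        "droppedTargets": [],
    },
}

matches = targets_for_job(sample_response, "carts-sockshop-production")
```

With the port-forward from the previous step active, the real payload could be loaded with `json.load(urllib.request.urlopen("http://localhost:8080/api/v1/targets"))` and passed to the same helper. An empty result would suggest that the scrape job was not created.
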
Per default, the service works with the following assumptions regarding the setup of the Prometheus instance:
- Each service within a stage of a project has a Prometheus scrape job definition with the name <service>-<project>-<stage>. For example, if project=sockshop, stage=production and service=carts, the scrape job name would have to be carts-sockshop-production.
- Every service provides the following metrics for its corresponding scrape job:
  - http_response_time_milliseconds (Histogram)
  - http_requests_total (Counter)
    This metric has to contain the status label, indicating the HTTP response code of the requests handled by the service. It is highly recommended that this metric also provides a label to query metric values for specific endpoints, e.g. handler. An example of an entry would look like this:
    http_requests_total{method="GET",handler="VersionController.getInformation",status="200",} 4.0

Based on those metrics, the queries for the SLIs are built as follows:
- throughput:
  sum(rate(http_requests_total{job="<service>-<project>-<stage>-canary"}[<test_duration_in_seconds>s]))
- error_rate:
  sum(rate(http_requests_total{job="<service>-<project>-<stage>-canary",status!~'2..'}[<test_duration_in_seconds>s]))/sum(rate(http_requests_total{job="<service>-<project>-<stage>-canary"}[<test_duration_in_seconds>s]))
- response_time_p50:
  histogram_quantile(0.50, sum(rate(http_response_time_milliseconds_bucket{job='<service>-<project>-<stage>-canary'}[<test_duration_in_seconds>s])) by (le))
- response_time_p90:
  histogram_quantile(0.90, sum(rate(http_response_time_milliseconds_bucket{job='<service>-<project>-<stage>-canary'}[<test_duration_in_seconds>s])) by (le))
- response_time_p95:
  histogram_quantile(0.95, sum(rate(http_response_time_milliseconds_bucket{job='<service>-<project>-<stage>-canary'}[<test_duration_in_seconds>s])) by (le))

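The query templates above can be filled in mechanically. The following Python sketch shows one way to do that for a given project, stage, service, and test duration. The function name `default_sli_queries` is hypothetical and this is not the service's own implementation; only the query strings are taken from the list above.

```python
def default_sli_queries(project: str, stage: str, service: str,
                        duration_seconds: int) -> dict:
    """Build the five default SLI queries for one service in one stage."""
    job = f"{service}-{project}-{stage}-canary"
    window = f"{duration_seconds}s"

    total = 'sum(rate(http_requests_total{{job="{job}"}}[{window}]))'.format(
        job=job, window=window)
    errors = ('sum(rate(http_requests_total{{job="{job}",status!~\'2..\'}}'
              '[{window}]))').format(job=job, window=window)

    def percentile(quantile: str) -> str:
        # The quantile is passed as a string to keep the "0.50" notation.
        return ("histogram_quantile({q}, sum(rate("
                "http_response_time_milliseconds_bucket{{job='{job}'}}"
                "[{window}])) by (le))").format(q=quantile, job=job, window=window)

    return {
        "throughput": total,
        "error_rate": errors + "/" + total,
        "response_time_p50": percentile("0.50"),
        "response_time_p90": percentile("0.90"),
        "response_time_p95": percentile("0.95"),
    }


queries = default_sli_queries("sockshop", "production", "carts", 30)
```

For the carts service in the production stage of the sockshop project and a 30 second test run, `queries["throughput"]` resolves to `sum(rate(http_requests_total{job="carts-sockshop-production-canary"}[30s]))`.
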
To use a Prometheus instance other than the one that is being managed by Keptn for a certain project, a secret containing the URL and the access credentials has to be deployed into the keptn namespace. The secret must have the following format:
user: test
password: test
url: http://prometheus-service.monitoring.svc.cluster.local:8080

If this information is stored in a file, e.g. prometheus-creds.yaml, the secret can be created with the following command (don't forget to replace the <project> placeholder with the name of your project):
kubectl create secret -n keptn generic prometheus-credentials-<project> --from-file=prometheus-credentials=./prometheus-creds.yaml

Please note that there is a naming convention for the secret, because the Prometheus endpoint can be configured per project. Therefore, the secret has to have the name prometheus-credentials-<project>.
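
To illustrate how the three values of the secret fit together, here is a minimal Python sketch that builds a basic-auth request against the Prometheus instant-query endpoint (/api/v1/query). The helper name `build_query_request` is hypothetical, and the sketch only constructs the request; it does not show how the service itself reads the secret or sends the query.

```python
import base64
import urllib.parse
import urllib.request


def build_query_request(url: str, user: str, password: str,
                        query: str) -> urllib.request.Request:
    """Build an authenticated GET request for a Prometheus instant query."""
    endpoint = (url.rstrip("/") + "/api/v1/query?"
                + urllib.parse.urlencode({"query": query}))
    # HTTP basic authentication: base64 of "user:password".
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        endpoint, headers={"Authorization": "Basic " + token})


# Values taken from the example secret above.
request = build_query_request(
    "http://prometheus-service.monitoring.svc.cluster.local:8080",
    "test", "test", "up")
```

Passing `request` to `urllib.request.urlopen` would send the query, provided the endpoint is reachable from where the code runs.
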
Users can override the predefined queries, as well as add custom queries, by creating an SLI configuration.
- An SLI configuration is a YAML file as shown below:
---
spec_version: '1.0'
indicators:
  cpu_usage: avg(rate(container_cpu_usage_seconds_total{namespace="$PROJECT-$STAGE",pod_name=~"$SERVICE-primary-.*"}[5m]))
  response_time_p95: histogram_quantile(0.95, sum by(le) (rate(http_response_time_milliseconds_bucket{handler="ItemsController.addToCart",job="$SERVICE-$PROJECT-$STAGE-canary"}[$DURATION_SECONDS])))
- To store this configuration, you need to add this file to Keptn's configuration store. This is done by using the Keptn CLI with the keptn add-resource command (see SLI Provider for more information).

Within the user-defined queries, the following variables can be used to dynamically build the query, depending on the project/stage/service, and the time frame:
- $PROJECT: will be replaced with the name of the project
- $STAGE: will be replaced with the name of the stage
- $SERVICE: will be replaced with the name of the service
- $DURATION_SECONDS: will be replaced with the test run duration, e.g. 30s
For example, if an evaluation for the service carts in the stage production of the project sockshop is triggered, and the tests ran for 30s, the query is resolved as follows:
rate(my_custom_metric{job='$SERVICE-$PROJECT-$STAGE',handler=~'$handler'}[$DURATION_SECONDS])
=> rate(my_custom_metric{job='carts-sockshop-production',handler=~'$handler'}[30s])
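
The substitution in the example above amounts to plain string replacement. A minimal Python sketch, assuming a hypothetical helper named `resolve_query` (not the service's actual code), could look like this:

```python
def resolve_query(template: str, project: str, stage: str, service: str,
                  duration_seconds: int) -> str:
    """Replace the supported placeholders in a user-defined query."""
    replacements = {
        "$PROJECT": project,
        "$STAGE": stage,
        "$SERVICE": service,
        "$DURATION_SECONDS": f"{duration_seconds}s",
    }
    for placeholder, value in replacements.items():
        template = template.replace(placeholder, value)
    return template


resolved = resolve_query(
    "rate(my_custom_metric{job='$SERVICE-$PROJECT-$STAGE',"
    "handler=~'$handler'}[$DURATION_SECONDS])",
    project="sockshop", stage="production", service="carts",
    duration_seconds=30)
```

Anything that is not one of the four supported variables, such as `$handler` here, is left untouched.
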
You are welcome to contribute using Pull Requests against the master branch. Before contributing, please read our Contributing Guidelines.
Travis is configured with CI to automatically build docker images for pull requests and commits. The pipeline can be viewed at https://travis-ci.org/keptn-contrib/prometheus-service.
The Travis pipeline needs to be configured with the REGISTRY_USER and REGISTRY_PASSWORD variables.