SLIs and SLOs with Prometheus and Grafana for your APIs managed by Tyk

About

This is a demo project, running on Docker, that shows how to configure Tyk Gateway, Tyk Pump, Prometheus and Grafana OSS to set up a dashboard with SLIs and SLOs for your APIs managed by Tyk.

You can use it to explore the Prometheus metrics exposed by Tyk Pump and use them in a Grafana dashboard.

Deploy and run the demo

Clone this repository:

git clone https://github.com/TykTechnologies/demo-slo-prometheus-grafana.git

Start the services

cd ./demo-slo-prometheus-grafana/
docker compose up -d

Verify that all services are running

Tyk Gateway
- Health check runs on http://localhost:8080/hello
- httpbin API runs on http://localhost:8080/httpbin/
- httpstatus API runs on http://localhost:8080/status/
Tyk Pump
- Health check runs on http://localhost:8083/health
- Prometheus metrics endpoint runs on http://localhost:8084/metrics
Prometheus runs on http://localhost:9090/
Grafana OSS runs on http://localhost:3000/
- The default login is admin/admin; after logging in, you will be prompted to set a new password
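
To check all of the URLs above in one go, here is a minimal sketch using only the Python standard library. The URL list is copied from this README. Treating any non-5xx answer as "up" is an assumption on our side, because the bare /status/ path of the httpstatus API may not return 200. check_all() takes the fetch function as a parameter so it can be tried without the Docker stack running.

```python
from typing import Callable, Dict
from urllib.error import HTTPError
from urllib.request import urlopen

# Service URLs as listed in this README.
ENDPOINTS: Dict[str, str] = {
    "Tyk Gateway health": "http://localhost:8080/hello",
    "httpbin API": "http://localhost:8080/httpbin/",
    "httpstatus API": "http://localhost:8080/status/",
    "Tyk Pump health": "http://localhost:8083/health",
    "Tyk Pump metrics": "http://localhost:8084/metrics",
    "Prometheus": "http://localhost:9090/",
    "Grafana": "http://localhost:3000/",
}


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for url, or 0 if nothing answered."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        # The server answered, just not with a 2xx/3xx status.
        return err.code
    except OSError:
        # Connection refused, DNS failure or timeout (URLError is an OSError).
        return 0


def check_all(fetch: Callable[[str], int] = http_status) -> Dict[str, bool]:
    """Map each service to True if it answered with a non-5xx status.

    fetch is injectable so the check can be exercised without the stack.
    """
    return {name: 0 < fetch(url) < 500 for name, url in ENDPOINTS.items()}


if __name__ == "__main__":
    for name, ok in check_all().items():
        print(f"{'UP  ' if ok else 'DOWN'}  {name}")
```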

Generate traffic

k6 is used to generate traffic to the API endpoints. The load script load.js runs for 15 minutes.

docker compose run k6 run /scripts/load.js

You will see the k6 output in your terminal.

Check out the dashboard in Grafana

Go to Grafana in your browser (initial login: admin/admin) and open the dashboard named "SLOs for APIs managed by Tyk".

You should see the data coming in.

You can also filter the data per API.

Tear down

Stop the services

docker compose stop

Remove the services

docker compose down

How this works

Configuration

Tyk API Gateway is configured to expose two API endpoints:
- httpbin (see .json config)
- httpstatus (see .json config)
k6 uses the load script load.js to generate demo traffic to the API endpoints
Tyk Pump is configured to expose a metrics endpoint for Prometheus (see config) with two custom metrics, tyk_http_requests_total and tyk_http_latency. Custom metrics require Tyk Pump version 1.6 or later.
Prometheus
- prometheus.yml is configured to automatically scrape Tyk Pump's metric endpoint
- slos.rules.yml is used to calculate additional metrics needed for the remaining error budget
Grafana
- prometheus_ds.yml is configured to connect Grafana automatically to Prometheus
- SLOs-for-APIs-managed-by-Tyk.json is the dashboard definition
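
To explore the Tyk Pump metrics outside Grafana, you can ask Prometheus directly through its HTTP API (GET /api/v1/query). The sketch below builds a per-API 5xx error-ratio query over tyk_http_requests_total and parses the instant-vector answer. It assumes the counter carries labels named response_code and api_name; check the Pump config in this repository for the label names it actually uses and pass them in.

```python
import json
from typing import Dict
from urllib.parse import urlencode
from urllib.request import urlopen

PROMETHEUS_URL = "http://localhost:9090"


def error_ratio_query(window: str = "10m", code_label: str = "response_code",
                      by: str = "api_name") -> str:
    """PromQL for the share of 5xx responses per API over the given window.

    code_label and by are assumed label names; adjust them to the Pump config.
    """
    errors = (f"sum by ({by}) (rate(tyk_http_requests_total"
              f'{{{code_label}=~"5.."}}[{window}]))')
    total = f"sum by ({by}) (rate(tyk_http_requests_total[{window}]))"
    return f"{errors} / {total}"


def query(promql: str) -> dict:
    """Run an instant query via GET /api/v1/query and return the JSON body."""
    url = f"{PROMETHEUS_URL}/api/v1/query?{urlencode({'query': promql})}"
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def parse_vector(payload: dict, label: str = "api_name") -> Dict[str, float]:
    """Convert an instant-vector result into {label value: sample value}."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error')}")
    # Each sample's "value" is [unix_timestamp, "value as a string"].
    return {
        item["metric"].get(label, ""): float(item["value"][1])
        for item in payload["data"]["result"]
    }


if __name__ == "__main__":
    try:
        ratios = parse_vector(query(error_ratio_query()))
    except OSError as err:
        print(f"Prometheus is not reachable: {err}")
    else:
        for api, ratio in ratios.items():
            print(f"{api}: {ratio:.2%} of requests failed")
```

The same PromQL string can be pasted into the Prometheus UI on http://localhost:9090/ to compare against the dashboard panels.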

SLIs and SLOs

Definition and example inspired by https://sre.google/workbook/slo-document/, https://landing.google.com/sre/workbook/chapters/alerting-on-slos/ and https://github.com/google/prometheus-slo-burn-example/blob/master/prometheus/slos.rules.yml.

You will see different indicators displayed on the Grafana dashboard.

To calculate the SLO and the remaining error budget shown on the dashboard, we use the following SLI and SLO:

SLI: the proportion of successful HTTP requests, as measured by Tyk API Gateway
- Any HTTP status other than 500–599 is considered successful.
- Calculated as the count of http_requests without a 5XX status code divided by the count of all http_requests
SLO: 95% successful requests
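
The SLI above can be illustrated with a small Python sketch that works on individual (API name, status code) pairs. This is for illustration only: in the demo, Prometheus derives the value from Tyk Pump's counters rather than from single requests. The per-API grouping mirrors the dashboard's API filter, and returning 1.0 when there is no traffic is our own assumption. The sample traffic in the main block is made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

SLO_TARGET = 0.95  # 95% successful requests


def is_successful(status: int) -> bool:
    """Any status outside 500-599 counts as successful (4xx included)."""
    return not 500 <= status <= 599


def sli(statuses: Iterable[int]) -> float:
    """Proportion of successful requests; 1.0 when there was no traffic."""
    codes = list(statuses)
    if not codes:
        return 1.0
    return sum(is_successful(code) for code in codes) / len(codes)


def sli_per_api(requests: Iterable[Tuple[str, int]]) -> Dict[str, float]:
    """Group requests by API name and compute the SLI for each API."""
    by_api: Dict[str, List[int]] = defaultdict(list)
    for api, status in requests:
        by_api[api].append(status)
    return {api: sli(codes) for api, codes in by_api.items()}


if __name__ == "__main__":
    # Hypothetical traffic sample, not data from the demo.
    traffic = ([("httpbin", 200)] * 98 + [("httpbin", 502)] * 2
               + [("httpstatus", 200)] * 85 + [("httpstatus", 404)] * 5
               + [("httpstatus", 500)] * 10)
    for api, value in sli_per_api(traffic).items():
        verdict = "meets" if value >= SLO_TARGET else "misses"
        print(f"{api}: SLI {value:.0%} {verdict} the {SLO_TARGET:.0%} SLO")
```

Note that the 404 responses count as successful, exactly as the SLI definition states; only 5XX responses eat into the SLO.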

In slos.rules.yml, job:slo_errors_per_request:ratio_rate10m calculates the ratio of errors to requests over the last 10 minutes, and job:error_budget:remaining calculates the remaining error budget as a percentage. The Grafana dashboard displays this remaining error budget with a threshold of 95%: every value below 95% is shown in red.
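
The step from error ratio to remaining error budget can be sketched in Python. We assume the usual formula remaining = (1 - error_ratio / (1 - SLO)) * 100; the exact expression in slos.rules.yml may differ, so treat this as an illustration of the idea rather than a copy of the rule.

```python
SLO = 0.95                  # 95% successful requests
DASHBOARD_THRESHOLD = 95.0  # percent; the dashboard shows lower values in red


def error_budget_remaining(error_ratio: float, slo: float = SLO) -> float:
    """Percent of the error budget left; negative once the budget is spent.

    Assumes remaining = (1 - error_ratio / (1 - slo)) * 100.
    """
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be between 0 and 1 (exclusive)")
    allowed_error_ratio = 1.0 - slo  # a 95% SLO allows 5% of requests to fail
    return (1.0 - error_ratio / allowed_error_ratio) * 100.0


if __name__ == "__main__":
    for ratio in (0.0, 0.001, 0.01, 0.05):
        remaining = error_budget_remaining(ratio)
        flag = "red" if remaining < DASHBOARD_THRESHOLD else "ok"
        print(f"error ratio {ratio:.1%}: {remaining:.1f}% budget left ({flag})")
```

With a 95% SLO the budget is 5% of requests, so an error ratio of 1% over the 10-minute window uses a fifth of the budget and leaves about 80%, which is already below the dashboard's 95% threshold.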

Contribute

You are welcome to contribute by:

- asking questions, suggesting improvements or reporting issues in this GitHub project or in the Tyk Community forum
- making pull requests (see the contributing guide)

Support, questions & feedback

This is a demo project that currently uses release candidate (RC) versions of Tyk Gateway and Tyk Pump.

For questions about our products, please use the Tyk Community forum.
Clients can also use support@tyk.io.
Potential clients and evaluators, please use info@tyk.io.