This is a demo project running on Docker that shows how to configure Tyk Gateway, Tyk Pump, Prometheus and Grafana OSS to set up a dashboard with SLIs and SLOs for your APIs managed by Tyk.
You can use it to explore the Prometheus metrics exposed by Tyk Pump and use them in a Grafana dashboard.
- Clone this repository:
git clone https://github.com/TykTechnologies/demo-slo-prometheus-grafana.git
- Start the services
cd ./demo-slo-prometheus-grafana/
docker compose up -d
- Verify that all services are running
- Tyk Gateway
- Health check runs on http://localhost:8080/hello
- httpbin API runs on http://localhost:8080/httpbin/
- httpstatus API runs on http://localhost:8080/status/
- Tyk Pump
- Health check runs on http://localhost:8083/health
- Prometheus metrics endpoint runs on http://localhost:8084/metrics
- Prometheus runs on http://localhost:9090/
- Grafana OSS runs on http://localhost:3000/
- The default login is admin/admin. Once logged in, you will be prompted for a new password.
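The checks above can also be scripted. The following is a minimal sketch, not part of the repository: it requests each URL listed above and reports whether it answered with a 2xx status. The service labels, the `fetch_status`, `check_services` and `main` names, and the 3-second timeout are assumptions made for illustration.

```python
"""Poll the demo's health and UI endpoints and report which ones respond."""
import urllib.error
import urllib.request

# URLs are taken from the list above; the keys are only display labels.
ENDPOINTS = {
    "Tyk Gateway health": "http://localhost:8080/hello",
    "Tyk Pump health": "http://localhost:8083/health",
    "Tyk Pump metrics": "http://localhost:8084/metrics",
    "Prometheus": "http://localhost:9090/",
    "Grafana": "http://localhost:3000/",
}


def fetch_status(url, timeout=3.0):
    """Return the HTTP status code for url, or None if it is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None  # connection refused, timeout, DNS failure, ...


def check_services(endpoints, fetch=fetch_status):
    """Map each service label to True if its URL answered with a 2xx status."""
    results = {}
    for name, url in endpoints.items():
        status = fetch(url)
        results[name] = status is not None and 200 <= status < 300
    return results


def main():
    """Print one UP/DOWN line per service."""
    for name, ok in check_services(ENDPOINTS).items():
        print(f"{'UP  ' if ok else 'DOWN'} {name}")
```

Calling `main()` after `docker compose up -d` prints one line per service. The `fetch` parameter exists only so the check can be exercised without a running stack.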
- Generate traffic
K6 is used to generate traffic to the API endpoints. The load script load.js will run for 15 minutes.
docker compose run k6 run /scripts/load.js
You will see K6 output in your terminal.
- Check out the dashboard in Grafana
Go to Grafana in your browser (initial user/pwd: admin/admin) and open the dashboard called SLOs for APIs managed by Tyk.
You should see the data coming in.
You can also filter the data per API.
Stop the services
docker compose stop
Remove the services
docker compose down
- Tyk API Gateway is configured to expose two API endpoints:
- httpbin (see .json config)
- httpstatus (see .json config)
- K6 will use the load script load.js to generate demo traffic to the API endpoints
- Tyk Pump is configured to expose a metrics endpoint for Prometheus (see config) with two custom metrics called tyk_http_requests_total and tyk_http_latency. Tyk Pump version >= 1.6 is needed for custom metrics.
- Prometheus
- prometheus.yml is configured to automatically scrape Tyk Pump's metric endpoint
- slos.rules.yml is used to calculate additional metrics needed for the remaining error budget
- Grafana
- prometheus_ds.yml is configured to connect Grafana automatically to Prometheus
- SLOs-for-APIs-managed-by-Tyk.json is the dashboard definition
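To get a feel for what Prometheus scrapes from Tyk Pump, the text served on the metrics endpoint can be parsed by hand. The sketch below reads the Prometheus text exposition format and sums a counter by one label. The sample payload and the label names `api_name` and `response_code` are hypothetical; the real labels depend on how the custom metrics are defined in the Pump configuration.

```python
import re
from collections import defaultdict

# One sample line: metric name, optional {labels}, then the value.
SAMPLE_LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# One label pair inside the braces: key="value" (with escaped characters).
LABEL_PAIR = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

# Hypothetical payload for illustration only; label names are assumptions.
SAMPLE = """\
# HELP tyk_http_requests_total HTTP requests handled by Tyk Gateway
# TYPE tyk_http_requests_total counter
tyk_http_requests_total{api_name="httpbin",response_code="200"} 120
tyk_http_requests_total{api_name="httpstatus",response_code="200"} 30
tyk_http_requests_total{api_name="httpstatus",response_code="500"} 6
"""


def parse_samples(text):
    """Yield (metric_name, labels_dict, value) for each sample line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue  # ignore lines this simple parser does not understand
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_PAIR.findall(raw_labels or ""))
        yield name, labels, float(value)


def sum_by_label(text, metric, label):
    """Sum the values of one metric, grouped by the value of one label."""
    totals = defaultdict(float)
    for name, labels, value in parse_samples(text):
        if name == metric:
            totals[labels.get(label, "")] += value
    return dict(totals)
```

For example, `sum_by_label(SAMPLE, "tyk_http_requests_total", "response_code")` groups the sample counts by status code. Against the running demo, the same function could be fed the body fetched from http://localhost:8084/metrics.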
Definition and example inspired by https://sre.google/workbook/slo-document/, https://landing.google.com/sre/workbook/chapters/alerting-on-slos/ and https://github.com/google/prometheus-slo-burn-example/blob/master/prometheus/slos.rules.yml.
You will see different indicators displayed on the Grafana dashboard.
To calculate the SLO and the displayed error budget remaining, we use the following SLI/SLO:
- SLI: the proportion of successful HTTP requests, as measured from Tyk API Gateway
- Any HTTP status other than 500–599 is considered successful.
- count of http_requests which do not have a 5XX status code divided by count of all http_requests
- SLO: 95% successful requests
In slos.rules.yml, we calculate the ratio of errors per request over the last 10 minutes in job:slo_errors_per_request:ratio_rate10m. With job:error_budget:remaining, we calculate the error budget remaining in percent. This is what we display in the Grafana dashboard. We use a threshold of 95% in the dashboard (every value below 95% is red).
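The arithmetic behind the SLI and SLO above can be sketched in a few lines. The SLI follows the definition given here (any status outside 500–599 is a success). The exact expression used for job:error_budget:remaining in slos.rules.yml is not reproduced in this document, so `error_budget_remaining` below uses the conventional definition (share of the allowed errors not yet consumed); treat it as an assumption that may differ from the rule file. All function names are illustrative.

```python
SLO_TARGET = 0.95  # from the text: 95% successful requests


def is_success(status_code):
    """Any HTTP status outside 500-599 counts as successful."""
    return not 500 <= status_code <= 599


def sli(status_counts):
    """Proportion of successful requests; status_counts maps code -> count."""
    total = sum(status_counts.values())
    if total == 0:
        return 1.0  # assumption: no traffic means nothing failed
    good = sum(n for code, n in status_counts.items() if is_success(code))
    return good / total


def error_budget_remaining(error_ratio, slo=SLO_TARGET):
    """Percentage of the allowed error budget still unused (may go negative).

    Conventional definition, assumed here: the budget is 1 - slo, and the
    remaining share is 1 - error_ratio / budget.
    """
    budget = 1.0 - slo
    return 100.0 * (1.0 - error_ratio / budget)


# Made-up request counts per status code, for illustration only.
counts = {200: 940, 404: 40, 500: 20}
success_ratio = sli(counts)  # 980 of 1000 requests succeeded
print(f"SLI: {success_ratio:.2%}, SLO met: {success_ratio >= SLO_TARGET}")
print(f"Error budget remaining: {error_budget_remaining(1 - success_ratio):.1f}%")
```

With these counts, 2% of requests failed against an allowance of 5%, so roughly 60% of the budget is left under the assumed definition. Note that the 404 responses count as successes, as the SLI definition prescribes.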
You are welcome to contribute by
- asking questions / suggesting improvements / reporting issues in this GitHub project or in the Tyk Community forum
- making a pull request, see the contributing guide
This is a demo project. It currently uses release candidate (RC) versions of Tyk Gateway and Tyk Pump.
For questions about our products, please use the Tyk Community forum.
Clients can also use support@tyk.io.
Potential clients and evaluators, please use info@tyk.io.