Adding Observability to Nomad Applications
A recording of the demo is available on the HashiTalks 2021 website (slides).
This repository demonstrates how you can leverage the Grafana Open Source Observability Stack with Nomad workloads.
In this demonstration we will deploy an application (TNS) on Nomad along with the Grafana Stack. The TNS application is written in Go and instrumented with:
- Prometheus Metrics using client_golang.
- Logs using go-kit (output format is logfmt).
- Traces using the Jaeger Go client.
You can use the instrumentation of your choice, such as OpenTelemetry, Zipkin, or JSON logs.
We'll also deploy backends to store collected signals:
- Prometheus will scrape Metrics using the scrape endpoint.
- Loki will receive Logs collected by Promtail.
- Tempo will receive Traces and Spans directly.
Finally, we'll deploy Grafana and provision it with all our backend datasources and a dashboard to start with.
Getting Started
For simplicity, this demo runs inside Vagrant, so you'll need to install and configure Vagrant first.
To get started simply run:
vagrant up
Then you should be able to access:
- TNS app => http://127.0.0.1:8001/
- Nomad => http://127.0.0.1:4646/
- Consul => http://127.0.0.1:8500/ui
- Grafana => http://127.0.0.1:3000/
- Prometheus => http://127.0.0.1:9090/
- Promtail => http://127.0.0.1:3200/
You can go to the Nomad UI Jobs page to see all running jobs.
Nomad Client Configuration
Promtail needs to access the host logs folder (alloc/{task_id}/logs). By default, the Docker driver in Nomad doesn't allow mounting volumes. In this example we have enabled it using the plugin stanza:
plugin "docker" {
  config {
    volumes {
      enabled = true
    }
  }
}
However, you can also run the Promtail binary manually on the host, or use the Nomad host_volume feature.
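The host_volume alternative mentioned above can be sketched as follows. This is only an illustration, not the configuration used in this repository: the volume names, the job layout, the image tag, and the container path are all hypothetical, and the host path assumes a data_dir of /opt/nomad/data.

```hcl
# --- Nomad client (agent) configuration ---
# Expose the allocation directory as a named, read-only host volume.
# "nomad-alloc-logs" is a hypothetical name; the path assumes
# data_dir = "/opt/nomad/data".
client {
  host_volume "nomad-alloc-logs" {
    path      = "/opt/nomad/data/alloc"
    read_only = true
  }
}

# --- Promtail job file (separate file) ---
job "promtail" {
  datacenters = ["dc1"]
  type        = "system" # one Promtail instance per client node

  group "promtail" {
    # Claim the host volume declared in the client configuration.
    volume "alloc-logs" {
      type      = "host"
      source    = "nomad-alloc-logs"
      read_only = true
    }

    task "promtail" {
      driver = "docker"

      config {
        image = "grafana/promtail:latest" # assumed image tag
      }

      # Mount the volume inside the container; the destination is an
      # arbitrary path that the Promtail scrape config would point at.
      volume_mount {
        volume      = "alloc-logs"
        destination = "/nomad/alloc"
        read_only   = true
      }
    }
  }
}
```

With this approach the Docker plugin does not need volumes enabled, because the mount goes through Nomad's own volume mechanism rather than an arbitrary host path.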
Promtail also needs to save its tail positions in a file, and you should make sure this file stays the same between restarts. Again, in this example we're using a host path mounted in the container to persist this file.
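As a rough sketch of persisting the positions file through a host path, a task could bind-mount a host directory like this. The host directory /opt/promtail, the container path /promtail-data, and the image tag are assumptions made for illustration, not values taken from this repository.

```hcl
task "promtail" {
  driver = "docker"

  config {
    image = "grafana/promtail:latest" # assumed image tag

    # Bind-mount a host directory into the container. This requires
    # volumes { enabled = true } in the docker plugin configuration.
    # Both paths are hypothetical.
    volumes = [
      "/opt/promtail:/promtail-data",
    ]
  }
}

# In the Promtail configuration, positions.filename would then point
# inside the mounted directory, for example /promtail-data/positions.yaml,
# so the file survives container restarts.
```

Because the file lives on the host, a restarted Promtail container resumes tailing from the saved offsets rather than starting over.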
Troubleshooting
Grafana shows nothing, or TNS keeps crashing because it can't connect to Tempo
- You may have trouble with the dns configuration in the jobs. If your jobs can't talk to each other, try changing the IP to 127.0.0.1, or to the internal IP address of your server if you are using a VPC, or just remove the dns stanza. It's recommended to use Consul Connect to connect all services to each other.
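For the Consul Connect recommendation, a minimal sketch of two job groups might look like the following. The service names, the group names, and port 14268 (assumed here to be where Tempo accepts Jaeger traces) are illustrative and do not come from the jobs in this repository; task definitions are omitted.

```hcl
# Group running Tempo: registers a Connect-enabled service.
group "tempo" {
  network {
    mode = "bridge" # Connect sidecars require bridge networking
  }

  service {
    name = "tempo"
    port = "14268" # assumed trace ingestion port

    connect {
      sidecar_service {}
    }
  }

  # task "tempo" { ... } omitted
}

# Group running the application: declares Tempo as an upstream.
group "app" {
  network {
    mode = "bridge"
  }

  service {
    name = "tns-app" # hypothetical service name
    port = "80"

    connect {
      sidecar_service {
        proxy {
          upstreams {
            destination_name = "tempo"
            local_bind_port  = 14268
          }
        }
      }
    }
  }

  # task "app" { ... } omitted; the app would send traces to
  # 127.0.0.1:14268, and the sidecar proxy forwards them to Tempo.
}
```

With upstreams bound to localhost, the jobs no longer depend on the dns stanza to find each other.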
I can't see the logs in Grafana/Loki
- You may have a different data_dir setting in your nomad configuration. This example uses /opt/nomad/data, whereas it is commonly set to /opt/nomad. If that is your case, change the volume stanza of your tempo job.