
This is a set of examples showing how to add observability to Nomad applications.


Adding Observability to Nomad Applications

A recording of the demo is available on the HashiTalks 2021 website (slides).

This repository demonstrates how you can leverage the Grafana Open Source Observability Stack with Nomad workloads.

In this demonstration we will deploy an application (TNS) on Nomad along with the Grafana Stack. The TNS application is written in Go and instrumented with:

You can use the instrumentation of your choice, such as OpenTelemetry, Zipkin, or JSON logs.

We'll also deploy backends to store collected signals:

  • Prometheus will scrape Metrics using the scrape endpoint.
  • Loki will receive Logs collected by Promtail.
  • Tempo will directly receive Traces and Spans.

Finally, we'll deploy Grafana and provision it with all our backend datasources and a dashboard to start with.

Getting Started

For simplicity, you'll need to install and configure Vagrant.

To get started simply run:

vagrant up

Then you should be able to access:

You can go to the Nomad UI Jobs page to see all running jobs.


Nomad Client Configuration

Promtail needs access to the host logs folder (alloc/{task_id}/logs). By default, the Docker driver in Nomad doesn't allow mounting volumes. In this example, we have enabled it using the plugin stanza:

  plugin "docker" {
    config {
      volumes {
        enabled = true
      }
    }
  }

However, you can also run the Promtail binary manually on the host, or use Nomad's host_volume feature.
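As a rough sketch of the host_volume alternative, the client could expose its allocation directory as a named host volume, and the Promtail job could mount it read-only. The volume name `nomad-allocs`, the host path `/opt/nomad/data/alloc`, the container path `/nomad/alloc`, the datacenter name, and the job layout are all assumptions for illustration; they are not taken from this repository.

```hcl
# Nomad client configuration (fragment).
# Assumption: data_dir is /opt/nomad/data, so allocations live under its alloc/ folder.
client {
  host_volume "nomad-allocs" {
    path      = "/opt/nomad/data/alloc"
    read_only = true
  }
}
```

```hcl
# Promtail job (hypothetical layout, shown only to illustrate the volume wiring).
job "promtail" {
  datacenters = ["dc1"]
  type        = "system"

  group "promtail" {
    # Claim the host volume declared in the client configuration.
    volume "allocs" {
      type      = "host"
      source    = "nomad-allocs"
      read_only = true
    }

    task "promtail" {
      driver = "docker"

      config {
        image = "grafana/promtail"
      }

      # Make the allocation logs visible inside the container.
      volume_mount {
        volume      = "allocs"
        destination = "/nomad/alloc"
        read_only   = true
      }
    }
  }
}
```

With this approach the Docker plugin's `volumes` option would not need to be enabled for the log mount, since the mount goes through Nomad's own volume mechanism.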

Promtail also needs to save tail positions in a file. Make sure this file stays the same across restarts. In this example, we again use a host path mounted in the container to persist this file.
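A minimal sketch of such a mount with the Docker driver might look like the fragment below. The host directory `/opt/nomad/data/promtail` and the container directory `/var/lib/promtail` are hypothetical; the repository's actual job file may use different paths.

```hcl
# Task fragment: persist Promtail's positions file on the host.
task "promtail" {
  driver = "docker"

  config {
    image = "grafana/promtail"

    # Bind mount in "host path:container path" form.
    # This only works when volumes are enabled in the docker plugin stanza.
    volumes = [
      "/opt/nomad/data/promtail:/var/lib/promtail",
    ]
  }
}
```

For this to have any effect, the `positions.filename` setting in Promtail's own configuration would need to point to a file inside the mounted directory, for example `/var/lib/promtail/positions.yaml` under the assumptions above.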

Troubleshooting

Grafana shows nothing, or TNS keeps crashing because it can't connect to Tempo

  • You may have trouble with the DNS configuration in the jobs. If your jobs can't talk to each other, try changing the IP to 127.0.0.1, or to the internal IP address of your server if you are using a VPC, or simply remove the dns stanza. It's recommended to use Consul Connect to connect services to each other.
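For orientation, a DNS override in a Nomad job can be written as a `dns` block inside the group's `network` block, as sketched below. The group name is hypothetical, and the jobs in this repository may declare their DNS settings differently (for example through Docker driver options), so treat this as an illustration of where the address lives rather than a copy of the actual job files.

```hcl
# Group fragment: point tasks at a specific DNS server.
group "tns" {
  network {
    dns {
      # Replace with 127.0.0.1 or your server's internal IP address,
      # or delete the whole dns block to fall back to the default resolver.
      servers = ["127.0.0.1"]
    }
  }
}
```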

I can't see the logs in Grafana/Loki

  • You may have a different data_dir in your Nomad configuration. This example uses /opt/nomad/data, whereas /opt/nomad is a more common setting. If that is your case, change the volume stanza of your tempo job.
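To illustrate the relationship, allocation logs are stored under the `alloc` folder of the agent's `data_dir`, so any host path mounted into a job has to start with the same prefix. The fragments below are a sketch; the container path `/alloc` is an assumption.

```hcl
# Nomad agent configuration (fragment): the value used in this example.
data_dir = "/opt/nomad/data"
```

```hcl
# Docker task config (fragment): the host side of the mount must match data_dir.
config {
  volumes = [
    # With data_dir = "/opt/nomad", this would become "/opt/nomad/alloc:/alloc".
    "/opt/nomad/data/alloc:/alloc",
  ]
}
```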