Instrumenting Go applications with Prometheus Demo

This repo contains a demo of code instrumentation using the official Prometheus Go client.

Introduction

The application is a single-endpoint API that fetches the weather for a given city using the openweathermap API. The code responsible for fetching the weather information lives in the weather package.

The application also uses a Redis server to store the number of requests made for each city. This logic lives in the redis package and uses a channel to serialize the calls to the Redis backend; this is done purely to illustrate different instrumentation patterns.
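The serialization pattern described above can be sketched in plain Go. This is an illustrative stand-in, not the demo's actual redis package (the function names and a map in place of the Redis INCR call are assumptions): handlers write keys to a buffered channel, and a single worker goroutine drains it, so only one goroutine ever talks to the backend.

```go
package main

import "fmt"

// startWorker drains keys into counts; in the demo this is where the
// redis INCR call would happen. It signals done when keys is closed.
func startWorker(keys <-chan string, counts map[string]int, done chan<- struct{}) {
	go func() {
		for key := range keys {
			counts[key]++ // stand-in for client.Incr(key)
		}
		close(done)
	}()
}

func main() {
	keys := make(chan string, 16) // buffered so callers rarely block
	counts := map[string]int{}
	done := make(chan struct{})
	startWorker(keys, counts, done)

	// What a call like redis.Increment("lisbon") boils down to:
	keys <- "lisbon"
	keys <- "lisbon"
	keys <- "porto"

	close(keys) // no more writes; let the worker drain and exit
	<-done
	fmt.Println(counts["lisbon"], counts["porto"]) // prints: 2 1
}
```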

Timeline

Use the different branches to navigate the code as it evolves from completely uninstrumented code to code with reasonable instrumentation coverage.

Branch 01

This is the base code that implements the logic described in the introduction section.

Branch 02

In this step we introduce the Prometheus client to our vendored dependencies and include its HTTP handler in our server:

r.Handle("/metrics", promhttp.Handler())

This single line exposes several Go runtime and process metrics on our /metrics endpoint.

Branch 03

We begin this branch by adding instrumentation to our own handler.

Registering the new metric:

var (
	handlerDuration = prometheus.NewHistogramVec(prometheus.HistogramOpts{
		Name: "myapp_handlers_duration_seconds",
		Help: "Handlers request duration in seconds",
	}, []string{"path"})
)

func init() {
	prometheus.MustRegister(handlerDuration)
}

We then create a wrapper handler that calls the underlying handler and measures how long it took:

func instrumentHandler(pattern string, next http.HandlerFunc) http.HandlerFunc {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		now := time.Now()
		next(w, r)
		handlerDuration.WithLabelValues(pattern).Observe(time.Since(now).Seconds())
	})
}

We use pattern as a label so we can distinguish the durations of different handlers.

The router is updated to use this handler on top of the old one:

r.HandleFunc("/city/{name}/temp", instrumentHandler("/city/{name}/temp", cityTemp))

We can also instrument our server to keep track of open connections. First, register the new metric:

var (
	openConnections = prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "myapp_open_connections",
		Help: "The current number of open connections.",
	})
)

func init() {
	prometheus.MustRegister(openConnections)
}

And then updating our http.Server:

s := &http.Server{
	Addr:         ":8080",
	Handler:      router(),
	ReadTimeout:  10 * time.Second,
	WriteTimeout: 30 * time.Second,
	ConnState: func(c net.Conn, s http.ConnState) {
		switch s {
		case http.StateNew:
			openConnections.Inc()
		case http.StateHijacked, http.StateClosed:
			openConnections.Dec()
		}
	},
}

The ConnState callback is invoked every time a connection changes state.

Branch 04

In this step we add instrumentation to our weather package. The instrumentation in this package follows the RED pattern:

  • Request rate
  • Error rate
  • Duration

This is done by adding the following metrics and initializing them:

var (
	requestsDuration = prometheus.NewHistogram(prometheus.HistogramOpts{
		Name: "myapp_weather_request_duration_seconds",
		Help: "The duration of the requests to the weather service.",
	})

	requestsCurrent = prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "myapp_weather_requests_current",
		Help: "The current number of requests to the weather service.",
	})

	requestsStatus = prometheus.NewCounterVec(prometheus.CounterOpts{
		Name: "myapp_weather_requests_total",
		Help: "The total number of requests to the weather service by status.",
	}, []string{"status"})

	clientErrors = prometheus.NewCounter(prometheus.CounterOpts{
		Name: "myapp_weather_errors",
		Help: "The total number of weather client errors",
	})
)

func init() {
	prometheus.MustRegister(requestsDuration)
	prometheus.MustRegister(requestsCurrent)
	prometheus.MustRegister(requestsStatus)
	prometheus.MustRegister(clientErrors)
}

And updating the (w *weatherClient) do func to keep those metrics up to date, by adding the following snippet at the beginning of the function (resp and err are the function's named return values, so the deferred closure sees their final values):

now := time.Now()
requestsCurrent.Inc()
defer func() {
    requestsDuration.Observe(time.Since(now).Seconds())
    requestsCurrent.Dec()
    if resp != nil {
        requestsStatus.WithLabelValues(strconv.Itoa(resp.StatusCode)).Inc()
    }
    if err != nil {
        clientErrors.Inc()
    }
}()
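The defer-based pattern above can be shown end to end in plain Go. This sketch (illustrative names, plain variables in place of the Prometheus metrics) makes the key point explicit: the deferred closure reads the function's named return values, so it observes the final resp and err whichever path the function takes.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Stand-ins for the Prometheus metrics in the snippet above.
var (
	requestsCurrent int
	statusCounts    = map[int]int{}
	clientErrors    int
	lastDuration    float64
)

type response struct{ StatusCode int }

// do mimics the shape of the weather client's request method; the
// named returns (resp, err) are what the deferred closure inspects.
func do(fail bool) (resp *response, err error) {
	now := time.Now()
	requestsCurrent++ // requestsCurrent.Inc()
	defer func() {
		lastDuration = time.Since(now).Seconds() // requestsDuration.Observe(...)
		requestsCurrent--                        // requestsCurrent.Dec()
		if resp != nil {
			statusCounts[resp.StatusCode]++ // requestsStatus.WithLabelValues(...).Inc()
		}
		if err != nil {
			clientErrors++ // clientErrors.Inc()
		}
	}()

	if fail {
		return nil, errors.New("connection refused")
	}
	return &response{StatusCode: 200}, nil
}

func main() {
	do(false)
	do(true)
	fmt.Println(statusCounts[200], clientErrors, requestsCurrent) // prints: 1 1 0
}
```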

Branch 05

In this step we add instrumentation to the redis package by implementing the prometheus.Collector interface to expose some of the metrics already available from the redis client.

The collector is registered after the client is created, in the init function.

prometheus.MustRegister(&redisCollector{client: client})

The collector is a simple struct that implements two methods:

var (
	requestsDesc  = prometheus.NewDesc("myapp_redis_connections_requests_total", "The total number of connections requests to redis pool.", []string{}, nil)
	hitsDesc      = prometheus.NewDesc("myapp_redis_connections_hits_total", "The total number of times a free connection was found in redis pool.", []string{}, nil)
	waitsDesc     = prometheus.NewDesc("myapp_redis_connections_waits_total", "The total number of times the redis pool had to wait for a connection.", []string{}, nil)
	timeoutsDesc  = prometheus.NewDesc("myapp_redis_connections_timeouts_total", "The total number of wait timeouts in redis pool.", []string{}, nil)
	connsDesc     = prometheus.NewDesc("myapp_redis_connections_current", "The current number of connections in redis pool.", []string{}, nil)
	freeConnsDesc = prometheus.NewDesc("myapp_redis_connections_free_current", "The current number of free connections in redis pool.", []string{}, nil)
)

type redisCollector struct {
	client *redis.Client
}

func (c *redisCollector) Describe(ch chan<- *prometheus.Desc) {
	ch <- requestsDesc
	ch <- hitsDesc
	ch <- waitsDesc
	ch <- timeoutsDesc
	ch <- connsDesc
	ch <- freeConnsDesc
}

func (c *redisCollector) Collect(ch chan<- prometheus.Metric) {
	stats := c.client.PoolStats()
	ch <- prometheus.MustNewConstMetric(requestsDesc, prometheus.CounterValue, float64(stats.Requests))
	ch <- prometheus.MustNewConstMetric(hitsDesc, prometheus.CounterValue, float64(stats.Hits))
	ch <- prometheus.MustNewConstMetric(waitsDesc, prometheus.CounterValue, float64(stats.Waits))
	ch <- prometheus.MustNewConstMetric(timeoutsDesc, prometheus.CounterValue, float64(stats.Timeouts))
	ch <- prometheus.MustNewConstMetric(connsDesc, prometheus.GaugeValue, float64(stats.TotalConns))
	ch <- prometheus.MustNewConstMetric(freeConnsDesc, prometheus.GaugeValue, float64(stats.FreeConns))
}

Every time Prometheus scrapes the /metrics endpoint of our API, the Collect method is called and sends the metrics to the provided channel.

Branch 06

Our handlers interact with the redis package by calling redis.Increment, which writes to a channel. Another goroutine in the redis package listens on this channel and talks to the Redis server every time a request comes through it.

It's useful to instrument the length of the channel to understand whether we need to increase its buffer size.

prometheus.MustRegister(prometheus.NewGaugeFunc(prometheus.GaugeOpts{
    Name: "myapp_redis_queue_current_length",
    Help: "The current number of items on the redis queue.",
}, func() float64 {
    return float64(len(keys))
}))

This snippet registers a new metric using one of the New*Func family of functions. The function (which returns the length of the keys channel) is called every time Prometheus scrapes our service. This instruments our channel and can be used to detect saturation and tune the buffer size.
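What the gauge function reports is simply len on a buffered channel, which returns the number of queued elements at that instant. A minimal plain-Go illustration:

```go
package main

import "fmt"

func main() {
	// A buffered channel like the redis queue (capacity is illustrative).
	keys := make(chan string, 8)
	keys <- "lisbon"
	keys <- "porto"

	// len reports the current fill level, cap the buffer size; the
	// GaugeFunc closure re-evaluates len(keys) on every scrape.
	fmt.Println(len(keys), cap(keys)) // prints: 2 8

	<-keys
	fmt.Println(len(keys)) // prints: 1
}
```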

Branch 07

Our call to redis.Increment will block if the channel's buffer is full. We can instrument the channel to discover whether we are indeed blocking (saturation). This is done in this step.

We begin by introducing a new metric:

var (
	queueWaitDuration = prometheus.NewHistogram(prometheus.HistogramOpts{
		Name: "myapp_redis_queue_wait_duration_seconds",
		Help: "The wait duration when trying to write to the redis queue",
	})
)

And changing our redis.Increment implementation to:

func Increment(key string) {
	select {
	case keys <- key:
	default:
		now := time.Now()
		keys <- key
		queueWaitDuration.Observe(time.Since(now).Seconds())
	}
}

This means that if the channel has free buffer space, the send in the first case succeeds immediately: the key is added to the channel and the function returns. If the send would block (the buffer is full), the default case runs instead: we perform the blocking send and record how long we spent blocked.

This metric can be used to measure our saturation and to tune the channel buffer size for our requirements.

Branch 08

In this step we add some instrumentation to our redis backend to measure the number of operations we are applying.

We start by registering a new metric:

var (
	redisOps = prometheus.NewCounterVec(prometheus.CounterOpts{
		Name: "myapp_redis_worker_operations",
		Help: "The total operations performed by redis worker.",
	}, []string{"operation", "result"})
)

And then updating our background worker:

func startWorker(client *redis.Client) {
	fmt.Printf("Starting background worker...\n")
	for {
		opLabel := "ping"
		var err error
		select {
		case key := <-keys:
			result := client.Incr(key)
			opLabel = "incr"
			err = result.Err()
		case <-time.After(time.Second * 10):
			result := client.Ping()
			err = result.Err()
		}
		resultLabel := "ok"
		if err != nil {
			resultLabel = "error"
		}
		redisOps.WithLabelValues(opLabel, resultLabel).Inc()
	}
}

We use labels to distinguish the operation type and the result.