mittwald/kube-httpcache

varnish-exporter resource consumption issues

timkante opened this issue · 19 comments

Describe the bug

We are running your latest image as a StatefulSet (without Helm) and manually rebuilt the current master state, which includes the varnish-exporter.
We did the following steps to get there:

  • Regular deployment of kube-httpcache (see config below)
  • We pushed a Docker image that bundles the varnish exporter, which is a 1:1 copy of your version (also see the Dockerfile below)

Everything seems to work fine, but the memory and CPU consumption of the exporter container rises steadily.
We increased its resource limits several times without success; after a couple of days it always hits its limits and stops working.
It also does not produce any suspicious logs.

CPU Consumption (we had a short metrics outage on the system, and the drop at the end is a restart of the STS):
[screenshot]

Memory Consumption:
[screenshot]

Expected behavior
A reasonably constant usage of resources.

Configuration

our StatefulSet-Config:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: varnish
  namespace: gatekeeper
  labels:
    app: varnish
spec:
  serviceName: varnish
  # replicas: <KUSTOMIZE>
  updateStrategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app: varnish
  template:
    metadata:
      labels:
        app: varnish
    spec:
      imagePullSecrets:
        - name: staffbase-registry
      containers:
        - name: varnish
          image: quay.io/mittwald/kube-httpcache:v0.4.2
          imagePullPolicy: IfNotPresent
          args:
            - -admin-addr=0.0.0.0
            - -admin-port=6083
            - -signaller-enable
            - -signaller-port=8090
            - -frontend-watch
            - -frontend-namespace=$(NAMESPACE)
            - -frontend-service=varnish
            - -frontend-port=8080
            - -backend-watch
            - -backend-namespace=$(NAMESPACE)
            - -backend-service=frontend-cache-service
            - -varnish-secret-file=/etc/varnish/k8s-secret/secret
            - -varnish-vcl-template=/etc/varnish/tmpl/default.vcl.tmpl
            - -varnish-storage=malloc,256M
          ports:
            - containerPort: 8080
              name: http
            - containerPort: 8090
              name: signaller
          # resources:
          #   requests:
          #     memory: <KUSTOMIZE>
          #     cpu: <KUSTOMIZE>
          #   limits:
          #     memory: <KUSTOMIZE>
          #     cpu: <KUSTOMIZE>
          env:
            - name: NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: CACHE_INFO_HEADERS
              # value: <KUSTOMIZE>
            - name: VSM_NOPID
              value: "true"
          readinessProbe:
            httpGet:
              path: /healthz
              port: http
            initialDelaySeconds: 30
          volumeMounts:
            - name: template
              mountPath: /etc/varnish/tmpl
            - name: secret
              mountPath: /etc/varnish/k8s-secret
            - name: var
              mountPath: /var/lib/varnish
          securityContext:
            runAsUser: 100
            runAsGroup: 101
        - name: exporter
          securityContext:
            runAsUser: 100
            runAsGroup: 101
          image: registry.staffbase.com/private/prometheus_varnish_exporter:1.0.0
          imagePullPolicy: IfNotPresent
          args:
            - -web.health-path=/healthz
            - -no-exit
            - -verbose
          env:
            - name: VSM_NOPID
              value: "true"
          ports:
            - name: metrics
              containerPort: 9131
          resources:
            requests:
              memory: 128Mi
              cpu: 200m
            limits:
              memory: 256Mi
              cpu: 400m
          volumeMounts:
            - name: var
              mountPath: /var/lib/varnish
          readinessProbe:
            httpGet:
              path: /healthz
              port: metrics
            initialDelaySeconds: 30
      serviceAccountName: varnish
      restartPolicy: Always
      volumes:
        - name: template
          configMap:
            name: vcl-template
        - name: secret
          secret:
            secretName: varnish-secret
        - name: var
          emptyDir: { }

The exact docker image we build for the exporter:

FROM        debian:stretch-slim

ENV         EXPORTER_VERSION=1.6

WORKDIR     /
RUN         set -x \
            && \
            apt-get -qq update && apt-get -qq upgrade \
            && \
            apt-get -qq install \
                debian-archive-keyring \
                curl \
                gnupg \
                apt-transport-https \
            && \
            curl -Ss -L https://packagecloud.io/varnishcache/varnish60lts/gpgkey | apt-key add - \
            && \
            printf "%s\n%s" \
                "deb https://packagecloud.io/varnishcache/varnish60lts/debian/ stretch main" \
                "deb-src https://packagecloud.io/varnishcache/varnish60lts/debian/ stretch main" \
            > "/etc/apt/sources.list.d/varnishcache_varnish60lts.list" \
            && \
            apt-get -qq update && apt-get -qq install varnish \
            && \
            apt-get -qq purge curl gnupg apt-transport-https && \
            apt-get -qq autoremove && apt-get -qq autoclean && \
            rm -rf /var/cache/*

RUN         mkdir /exporter && \
            chown varnish /exporter

ADD         --chown=varnish https://github.com/jonnenauha/prometheus_varnish_exporter/releases/download/${EXPORTER_VERSION}/prometheus_varnish_exporter-${EXPORTER_VERSION}.linux-amd64.tar.gz /tmp

RUN         cd /exporter && \
            tar -xzf /tmp/prometheus_varnish_exporter-${EXPORTER_VERSION}.linux-amd64.tar.gz && \
            ln -sf /exporter/prometheus_varnish_exporter-${EXPORTER_VERSION}.linux-amd64/prometheus_varnish_exporter prometheus_varnish_exporter

ENTRYPOINT  [ "/exporter/prometheus_varnish_exporter" ]

Kubernetes Version: 1.18.15

Thanks for looking into this 🚀


Sorry for the late response. The varnish exporter is a third-party binary that TBH I'm not particularly familiar with.

From other prometheus exporters, I know that metrics churn (i.e. lots of new time series added over time) is often a problem and suspect a similar issue in this case (especially since backend names are updated/rotated frequently with most VCL configurations). Could you have a look at the metrics exported by the exporter (especially after it's been running a while and memory consumption has risen by a significant degree)?
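
If it helps, something like this throwaway snippet (the exporter address is just an assumption for a local port-forward, not part of kube-httpcache) can count the series per metric name and make churn visible:

// Throwaway sketch: count time series per metric name as exposed by the
// exporter. The address assumes a local port-forward to the metrics port.
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"strings"
)

func main() {
	resp, err := http.Get("http://localhost:9131/metrics")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	counts := map[string]int{}
	total := 0

	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		line := scanner.Text()
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip comments and blank lines
		}
		// The metric name is everything before the first '{' or space.
		name := line
		if i := strings.IndexAny(line, "{ "); i >= 0 {
			name = line[:i]
		}
		counts[name]++
		total++
	}
	if err := scanner.Err(); err != nil {
		panic(err)
	}

	fmt.Println("total series:", total)
	for name, n := range counts {
		if n > 100 { // print only metrics with suspiciously many series
			fmt.Println(name, n)
		}
	}
}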

@martin-helmich I did a quick check and noticed the following two things:
Between two consecutive scrapes, the backend label updates from (one specific entry picked here):
varnish_backend_bereq_bodybytes{backend="k8s-upstreamcfg-1096.be-frontend-cache-service-68598f57cd-ksfl2",server="unknown"} 0
to:
varnish_backend_bereq_bodybytes{backend="k8s-upstreamcfg-1100.be-frontend-cache-service-68598f57cd-ksfl2",server="unknown"} 0

This already indicates that the exporter keeps creating new label values, as the backend names constantly change due to the config reloader.

I also noticed that varnish_main_n_backend is at 6606, which is way higher than it should be, so I backed that up with a quick Grafana query, which shows the rise of "known backends":

[screenshot]

So I guess the number of labels could indeed be a problem here (not only for the exporter, but also for Prometheus), what do you think?

Could we, as a solution, change the names that the config reloader puts into the config so they are not unique on every config reload?

So I guess the number of labels could indeed be a problem here (not only for the exporter, but also for Prometheus), what do you think?

Yes -- memory consumption of the exporter will increase (approximately) linearly with the number of metrics. AFAIK, Prometheus itself is a bit better optimized for high metric churn than it used to be.

Could we, as a solution, change the names that the config reloader puts into the config so they are not unique on every config reload?

IIRC, the k8s-upstreamcfg-1096 part of k8s-upstreamcfg-1096.be-frontend-cache-service-68598f57cd-ksfl2 (from your example above) is the VCL config name, not the backend name (which is just the Pod name). When reloading the config, the name simply increments, since this is the easiest way to assert that a "new" config is defined and loaded.

I'm just going to dump some thoughts that come into my mind:

A possible "solution" might be to re-use previous VCL config names (although, off the top of my head, I don't know if the vcl.inline command supports overwriting an existing VCL config); in theory, it might be enough to have one active and one inactive config that are updated and loaded alternately. Alternatively, one might be able to convince the maintainer of the exporter to add a configuration option for not including the VCL config name in the backend label at all.

However, Pod names will also change frequently and still cause metrics churn, even if the VCL names stay constant. This might (!) be solved by naming VCL backends deterministically (for example, using indices instead of their names):

    {{ range $i, $b := .Backends }}
    backend be-{{ $i }} {
        .host = "{{ .Host }}";
        .port = "{{ .Port }}";
    }
    {{- end }}

I'll give this issue some more thought, but cannot make any promises as to when I'll find the time to actually put some work into this. In the meantime, any help is welcome. 🙂

As another follow-up: is it normal that the config gets reloaded every other minute?

[screenshot]

is it normal that the config gets reloaded every other minute?

From the looks of your logs, the VCL template is updated quite frequently, which seems to trigger a configuration reload. IIRC, template changes are determined by watching the respective file using inotify.

Can you check if there are actual changes being made to said file? Alternatively, it might be worth adding a check to the controller to only trigger a config reload if the actual contents of the VCL template changed.
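
Off the top of my head, a minimal sketch of such a content check (hypothetical names, not the actual controller code) might look like this:

// Sketch: only forward a template update when the file content actually
// changed. In a real implementation, lastHash would live on the watcher
// struct; the names here are made up for illustration.
package watcher

import (
	"crypto/sha256"
	"io/ioutil"
)

type contentFilter struct {
	lastHash [sha256.Size]byte
}

// changedContent reads the file and reports whether its content differs from
// the previously seen version, updating the stored hash as a side effect.
func (f *contentFilter) changedContent(filename string) ([]byte, bool, error) {
	content, err := ioutil.ReadFile(filename)
	if err != nil {
		return nil, false, err
	}

	hash := sha256.Sum256(content)
	if hash == f.lastHash {
		return content, false, nil
	}

	f.lastHash = hash
	return content, true, nil
}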

@martin-helmich I assume that in the code there should be a check for the kind of event fsnotify is streaming.

The current implementation is: (https://github.com/mittwald/kube-httpcache/blob/master/pkg/watcher/template_watch.go#L17)

func (t *fsnotifyTemplateWatcher) watch(updates chan []byte, errors chan error) {
	for ev := range t.watcher.Events {
		glog.V(6).Infof("observed %s event on %s", ev.String(), ev.Name)

		content, err := ioutil.ReadFile(t.filename)
		if err != nil {
			glog.Warningf("error while reading file %s: %s", t.filename, err.Error())

			errors <- err
			continue
		}

		updates <- content
	}
}

So the file gets reloaded regardless of the event type. This can cause issues if multiple processes access the file and, for example, the access time gets updated.

Maybe it would make sense to add a check like this (from the fsnotify docs: https://github.com/fsnotify/fsnotify#usage):

if ev.Op&fsnotify.Write == fsnotify.Write {
    log.Println("modified file:", ev.Name)
}
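
Applied to the watch loop above, that could look roughly like this (just a sketch, not a tested patch; note that ConfigMap updates in Kubernetes arrive as symlink swaps, so the exact set of accepted ops may need tuning):

// Sketch mirroring the existing watcher type, with event filtering added.
package watcher

import (
	"io/ioutil"

	"github.com/fsnotify/fsnotify"
)

type fsnotifyTemplateWatcher struct {
	filename string
	watcher  *fsnotify.Watcher
}

func (t *fsnotifyTemplateWatcher) watch(updates chan []byte, errors chan error) {
	for ev := range t.watcher.Events {
		// Ignore events that do not modify the file content, e.g. pure
		// chmod/access-time updates.
		if ev.Op&(fsnotify.Write|fsnotify.Create) == 0 {
			continue
		}

		content, err := ioutil.ReadFile(t.filename)
		if err != nil {
			errors <- err
			continue
		}

		updates <- content
	}
}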

For our case I switched to your poll implementation (by adding -varnish-vcl-template-poll as a program argument), as this explicitly calls os.Stat, only checks the modTime of the file, and reloads it if it changed. This results in the expected behaviour: the file is only reloaded when it actually changes.

And regarding the metrics-churn, here is what I found out so far:

The prometheus_varnish_exporter has special handling for the behaviour of varnishreload when reloading VCLs: it names the VCL reload_<TIMESTAMP>. Here is the function of the exporter that discards the VCL name if it is a reloaded version: https://github.com/jonnenauha/prometheus_varnish_exporter/blob/86fc1b025dd41eaf43a583a262daec6cad69b561/prometheus.go#L219
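
Roughly speaking, the effect is that a leading reload_<timestamp> VCL name gets stripped from the backend label; here is a small approximation for illustration (not the exporter's actual code):

package main

import (
	"fmt"
	"regexp"
)

// reloadPrefix approximates how the exporter recognizes VCLs created by
// varnishreload, which are named "reload_<timestamp>".
var reloadPrefix = regexp.MustCompile(`^reload_[0-9]+\.`)

func scrubBackend(name string) string {
	return reloadPrefix.ReplaceAllString(name, "")
}

func main() {
	fmt.Println(scrubBackend("reload_1620000000000000000.be-frontend-cache-service-68598f57cd-ksfl2"))
	// prints: be-frontend-cache-service-68598f57cd-ksfl2
}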

However, to get this running, we cloned your repo and changed https://github.com/mittwald/kube-httpcache/blob/master/pkg/controller/watch.go#L80 to

	configname := fmt.Sprintf("reload_%d", time.Now().UnixNano())

This will then correctly be discarded by the exporter:

Before: [screenshot]

After: [screenshot]

In addition to that, defining a lot of inline VCLs can be memory-consuming, as Varnish keeps the compiled VCLs and never discards them. You can inspect this by running varnishadm vcl.list - for us, it looked like this: [screenshot]

So I changed the run command of the STS container to:

      containers:
        - name: varnish
          image: registry.staffbase.com/private/kube-httpcache:0.0.3
          imagePullPolicy: IfNotPresent
          command:
            - /bin/bash
            - -cxe
            - |
              set -eo pipefail

              while true; do
                varnishadm vcl.list \
                  | tr -s ' ' \
                  | cut -d ' ' -f 4 \
                  | head -n -6 \
                  | while read in; do varnishadm vcl.discard "$in" &>/dev/null; done
                sleep 60;
              done &

              ./kube-httpcache \
                -admin-addr=0.0.0.0 \
                -admin-port=6083 \
                -signaller-enable \
                -signaller-port=8090 \
                -frontend-watch \
                -frontend-namespace=$(NAMESPACE) \
                -frontend-service=varnish-headless \
                -frontend-port=8080 \
                -backend-watch \
                -backend-namespace=$(NAMESPACE) \
                -backend-service=frontend-cache-service \
                -varnish-secret-file=/etc/varnish/k8s-secret/secret \
                -varnish-vcl-template=/etc/varnish/tmpl/default.vcl.tmpl \
                -varnish-vcl-template-poll \
                -varnish-storage=malloc,256M
          ports:
            - containerPort: 8080
              name: http
            - containerPort: 8090
              name: signaller

This launches a background loop that discards every VCL older than the most recent 5 VCLs.

Let's see if those steps already resolve our resource issue. 🤞🏼

Hey Tim,

thanks for your research in this matter -- that's tremendously helpful and much appreciated. 👍

I think renaming the VCL configs to match the exporter's automatic cleanup mechanism should be a possible change to make without any nasty side effects. And maybe we can also adjust the controller to regularly discard old configurations... 🤔
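
Off the top of my head, such a periodic cleanup could look roughly like this sketch (it shells out to varnishadm for illustration only; the actual controller would use its existing admin connection, and the names and retention count here are made up):

package main

import (
	"fmt"
	"os/exec"
	"strings"
	"time"
)

// discardOldVCLs keeps the newest `keep` configurations and discards the rest.
// It assumes that `varnishadm vcl.list` prints one VCL per line, oldest first,
// with the VCL name in the fourth whitespace-separated column (as the shell
// loop earlier in this thread does, too).
func discardOldVCLs(keep int) error {
	out, err := exec.Command("varnishadm", "vcl.list").Output()
	if err != nil {
		return err
	}

	var names []string
	for _, line := range strings.Split(strings.TrimSpace(string(out)), "\n") {
		fields := strings.Fields(line)
		// Never touch the active configuration.
		if len(fields) >= 4 && fields[0] != "active" {
			names = append(names, fields[3])
		}
	}

	if len(names) <= keep {
		return nil
	}
	for _, name := range names[:len(names)-keep] {
		if err := exec.Command("varnishadm", "vcl.discard", name).Run(); err != nil {
			return fmt.Errorf("discarding %s: %w", name, err)
		}
	}
	return nil
}

func main() {
	for range time.Tick(time.Minute) {
		if err := discardOldVCLs(5); err != nil {
			fmt.Println("vcl cleanup:", err)
		}
	}
}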

Awesome! I'm in to help you with the changes if you want 🚀

Just some last concerns before we can start:

In case you still want to bundle the metrics-exporter in the docker image, we have to consider an issue that I came across: jonnenauha/prometheus_varnish_exporter#57

It has been fixed on their master (which is why I had to fork the repo and build an image by hand), so if we make the change so that the exporter auto-discards the reload_ prefix, the last published release of the exporter will produce the errors described in that issue.

What is your suggested way forward here? Extend the documentation of your service with a section on how to run it with the exporter as a sidecar (and require users to build their own image from its master), or build it from source in the Dockerfile?

Or do you even have an alternative option in mind?


Hey everyone!

Since the Helm option to include the exporter is publicly available, do you believe, @martin-helmich, that it's possible to publish a new Docker image version that contains the exporter referenced in the Dockerfile? 🙏

For now, enabling the exporter option results in the following error:

Warning  Failed     0s (x3 over 23s)   kubelet            Error: failed to create containerd task: OCI runtime create failed: container_linux.go:370: starting container process caused: exec: "/exporter/prometheus_varnish_exporter": stat /exporter/prometheus_varnish_exporter: no such file or directory: unknown

Indeed it's not in the image:

docker run --entrypoint=bash -it quay.io/mittwald/kube-httpcache:stable
root@c14a649e7891:/# ls /
bin  boot  dev	etc  home  kube-httpcache  lib	lib64  media  mnt  opt	proc  root  run  sbin  srv  sys  tmp  usr  var

Neither the stable nor the v0.5.0 tag contains it (despite the release notes mentioning the addition).

I wouldn't recommend publishing the current state of the Docker image due to the issues I mentioned above. Those should be addressed before releasing the cache + exporter combo into the wild, as I wouldn't consider the current implementation production-ready.

While I share your concerns about introducing a not-yet-ready feature in the stable version of this project, I tend to believe that if the content of the public Docker image is not as advertised in the release notes and on the master branch, it's even more confusing to newcomers to this project 😬

Following your earlier message, I wonder what the best way to move forward on this would be.
I'm willing to put effort into this as well if necessary 🙂

@thylong I think finding a solution to the question I had above would enable us to move forward here:

In case you still want to bundle the metrics-exporter in the docker image, we have to consider an issue that I came across: jonnenauha/prometheus_varnish_exporter#57

It has been fixed on their master (which is why I had to fork the repo and build an image by hand), so if we make the change so that the exporter auto-discards the reload_ prefix, the last published release of the exporter will produce the errors described in that issue.

What is your suggested way forward here? Extend the documentation of your service with a section on how to run it with the exporter as a sidecar (and require users to build their own image from its master), or build it from source in the Dockerfile?

Do you have some thoughts on this?

VCL reload naming fix

IMHO, we should use your suggested fix for line 80 (https://github.com/mittwald/kube-httpcache/blob/master/pkg/controller/watch.go#L80):

configname := fmt.Sprintf("reload_%d", time.Now().UnixNano())

It keeps things simple, doesn't alter the current behavior of kube-httpcache at all, and it just works like a charm.

VCL list increasing over time

I tend to agree with @martin-helmich on the discard.
Having the watcher discard previous configurations progressively is preferable to running background scripts for that duty, as we're in a containerized environment.

Here is a PR suggesting a fix: #95

Prometheus_varnish_exporter outdated

Do you even have an alternative option in mind?

I was about to suggest reaching out to the maintainer of prometheus_varnish_exporter to request a release that includes jonnenauha/prometheus_varnish_exporter#57, but it seems that has already been tried a while ago with no success... 😞

A second option would be to fork the original repo ourselves and publish a release until the maintainer does so.
While I'm not usually a fan of forks, this would avoid degrading Docker build times and kube-httpcache's maintainability.

Preferred solutions

IMHO, a prometheus_varnish_exporter fork under the mittwald account plus a GitHub release with the fix would keep the governance of this project coherent and help keep it simple.

If we don't pick this option, I tend to prefer building the exporter from source, as that ensures it works out of the box for people wanting to try this project. It's a must-have feature for production environments.

WDYT @timkante @martin-helmich ?

Hey @thylong, thanks for sharing your thoughts on this ✌🏼

I agree with both fixes which target this project's source (the config renaming and wiring the VCL removal into the application code).

Regarding the exporter release issue, I think we should make another attempt to reach out to the owners and get them to create a release. Maybe we can comment on the existing open issue, or create a new issue from scratch with a bit more context. 💬

I think we should give this another shot, as the owner of the exporter seems to still be active (looking at his GitHub history).

If there is no reaction, I'd agree with you and prefer the mittwald exporter fork, backed by a from-source build if the owners don't like this idea 💯

Hey all! 👋 Thanks for all your thoughts on this issue -- it took me a while to catch up on this one. 😉

Regarding the VCL naming fix (using fmt.Sprintf("reload_%d", time.Now().UnixNano()) as VCL name)

Full ACK from me. Happy to accept PRs on this in case anyone wants full attribution in the commit log; otherwise, I might also pick up the proposed change from the comments myself.

Regarding the VCL discarding

Thanks for the PR, @thylong! I'll get on reviewing it just after submitting this comment. 👍

Regarding the excessive VCL reloading mentioned in #81 (comment)

I was happy to note that this was already fixed by another contributor in #90. 🥳

Regarding the exporter release

If at all possible, I'd prefer not maintaining our own fork of a 3rd-party repo. 😕 Maybe we can take another shot at pinging the varnish-exporter maintainer. As a last resort, I'll look into forking the respective repo.

Regarding the missing exporter binary

I noticed that the "regular" Dockerfile contains the necessary steps for obtaining the exporter binary:

ADD --chown=varnish https://github.com/jonnenauha/prometheus_varnish_exporter/releases/download/${EXPORTER_VERSION}/prometheus_varnish_exporter-${EXPORTER_VERSION}.linux-amd64.tar.gz /tmp

RUN cd /exporter && \
    tar -xzf /tmp/prometheus_varnish_exporter-${EXPORTER_VERSION}.linux-amd64.tar.gz && \
    ln -sf /exporter/prometheus_varnish_exporter-${EXPORTER_VERSION}.linux-amd64/prometheus_varnish_exporter prometheus_varnish_exporter

It's only the Dockerfile used for Goreleaser that's missing the exporter binary (I'm going to have to refresh my memory on why we're using two different Dockerfiles in the first place 🤔). I have created #96 as a follow-up issue for adjusting the Dockerfile accordingly (or better, to see if we can get away with using a single Dockerfile instead of two separate ones).