influxdata/telegraf

docker.sock access error

Closed this issue · 13 comments

Relevant telegraf.conf

[[inputs.docker]]
   endpoint = "unix:///var/run/docker.sock"
   container_names = []
   timeout = "5s"
   perdevice = false
   perdevice_include = ["cpu"]
   total = true
   total_include = ["cpu"]

System info

Telegraf 1.20.3, Debian 11

Docker

telegraf:
  image: telegraf:latest
  container_name: telegraf
  depends_on:
    - influxdb
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock

Steps to reproduce

  1. Start the Docker container.
  2. Check the error log.

...

Expected behavior

No errors are expected.

Actual behavior

2021-11-03T12:40:00Z E! [inputs.docker] Error in plugin: Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.21/info": dial unix /var/run/docker.sock: connect: permission denied,
2021-11-03T12:40:00Z E! [inputs.docker] Error in plugin: Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.21/containers/json?filters=%7B%22status%22%3A%5B%22running%22%5D%7D&limit=0": dial unix /var/run/docker.sock: connect: permission denied

Additional info

My setup worked perfectly until the update on 29 October.
All other services (Portainer, Traefik, Watchtower) have no issues accessing docker.sock.
I don't log in as root; I use sudo. Telegraf is started with docker-compose up -d from a "sudo su" shell.

Same as #10031?

Hi,

We recently made a change to our Telegraf container images to run the telegraf process as the telegraf user and group. This means that when running Telegraf in a container, you will need to make sure that the telegraf user is in the group that owns the Docker socket (usually the docker group).

Docker's --user option can accomplish this by adding:

--user telegraf:$(stat -c '%g' /var/run/docker.sock)
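A minimal sketch of how that value could be computed and passed to docker run, assuming GNU coreutils stat (as on Debian) and the usual socket path. The helper name telegraf_user_arg is hypothetical and exists only for this example:

```shell
#!/bin/sh
# Minimal sketch (assumes GNU coreutils `stat`, as on Debian):
# print the value to pass to `docker run --user`, i.e. the telegraf user
# plus the numeric group ID that owns the Docker socket.
# telegraf_user_arg is a hypothetical helper name.
telegraf_user_arg() {
  sock="${1:-/var/run/docker.sock}"   # socket path, defaulting to the usual location
  gid=$(stat -c '%g' "$sock") || return 1
  printf 'telegraf:%s\n' "$gid"
}

# Example use on a Docker host:
#   docker run -d --name telegraf \
#     --user "$(telegraf_user_arg)" \
#     -v /var/run/docker.sock:/var/run/docker.sock \
#     telegraf:latest
```

Because the group ID is read from the socket at run time, the same command works on hosts where the docker group has different IDs.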

We have a full post on why we made this change and the impact to users.

Thanks!

pbek commented


Of course, if Telegraf is run with docker-compose, you need to either set the group statically, by running stat -c '%g' /var/run/docker.sock on the host and using that group ID in the user property of your docker-compose.yaml, like:

  telegraf:
...
    # "1000" is the group ID that owns the Docker socket on the host; find it with: stat -c '%g' /var/run/docker.sock
    # see: https://www.influxdata.com/blog/docker-run-telegraf-as-non-root/
    user: telegraf:1000
...

Or you can adapt the approach in docker/compose#1532 (comment) to get the group ID into docker-compose.yaml through an environment variable.
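One way to do the environment-variable route is sketched below. It relies on docker-compose substituting ${VAR} references in the compose file from the shell environment it runs in. The names DOCKER_GID and export_docker_gid are assumptions chosen for this example:

```shell
#!/bin/sh
# Minimal sketch: export the Docker socket's group ID so docker-compose can
# substitute it into a line such as
#     user: "telegraf:${DOCKER_GID}"
# in docker-compose.yaml. DOCKER_GID and export_docker_gid are example names.
export_docker_gid() {
  sock="${1:-/var/run/docker.sock}"
  DOCKER_GID=$(stat -c '%g' "$sock") || return 1   # GNU stat: numeric group ID
  export DOCKER_GID
}

# Example use on a Docker host:
#   export_docker_gid && docker-compose up -d
```

This avoids hardcoding a group ID in the compose file, at the cost of having to start the stack from a shell where the variable has been exported.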

Run this command, then restart the Telegraf container:

docker exec -it TELEGRAF_CONTAINER_NAME /bin/bash -c "chmod 666 /var/run/docker.sock"

PS: I'm running the telegraf:1.21 Docker image.
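Because the socket is bind-mounted, the container and the host share the same socket file. Running chmod 666 inside the container therefore also makes the host's /var/run/docker.sock readable and writable by every local user. The Docker daemon typically recreates the socket with its default permissions when it restarts, which would undo the change.

A minimal sketch for checking the socket's mode and ownership before and after, assuming GNU stat. The function name socket_perms is hypothetical:

```shell
#!/bin/sh
# Minimal sketch (GNU `stat`): print a file's octal mode and owner:group,
# e.g. to compare /var/run/docker.sock before and after the chmod above.
# socket_perms is a hypothetical helper name.
socket_perms() {
  stat -c '%a %U:%G' "${1:-/var/run/docker.sock}"
}

# Example use on the host:
#   socket_perms   # typically something like "660 root:docker" by default
#   socket_perms   # "666 ..." after chmod 666
```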


The issue with this comes when you update Telegraf (though at this point I'm keeping it at the same version for the foreseeable future).

Sigh. I'm running Telegraf as an edge stack through Portainer on all Docker hosts in my home(lab), and the docker GID is different on every host. Is there any way to get the old behavior back? Otherwise, it breaks my whole home host monitoring.


I'm in the same boat. @sgofferj, did you find a workaround?

kykc commented

@sgofferj @rodrigogonegit

If you're using docker-compose, you can use a .env file to give each host its own local environment. For example:

Next to my docker-compose.yml I have the following script:

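#!/bin/bash

# Directory containing this script (and docker-compose.yml)
SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )

# Numeric group ID that owns the Docker socket on this host, plus the hostname
DOCKER_GID=$(stat -c '%g' /var/run/docker.sock)
REPORT_HOSTNAME=$(hostname)

# Write both values to the .env file that docker-compose reads automatically
echo "DOCKER_GID=${DOCKER_GID}" > "${SCRIPT_DIR}/.env"
echo "REPORT_HOSTNAME=${REPORT_HOSTNAME}" >> "${SCRIPT_DIR}/.env"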

It creates a .env file like this:

DOCKER_GID=998
REPORT_HOSTNAME=halo

Then, in your docker-compose.yml, you can use:

  hostname: ${REPORT_HOSTNAME}
  user: "telegraf:${DOCKER_GID}"

You may also want to add .env to .gitignore if you keep your docker-compose.yml and the other files next to it in git.

@kykc
That's unfortunately not how portainer edge stacks work.

I just ran into this issue. My solution for docker-compose:

services:
  telegraf:
    image: telegraf
    container_name: telegraf
    entrypoint: /bin/bash -c "chmod 666 /var/run/docker.sock && /entrypoint.sh telegraf"
    volumes:
        - './telegraf/telegraf.conf:/etc/telegraf/telegraf.conf'
        - '/var/run/docker.sock:/var/run/docker.sock'

I had to replace bash with sh:
entrypoint: /bin/sh -c "chmod 666 /var/run/docker.sock && /entrypoint.sh telegraf"


That worked for me! Thanks

@kykc's approach does work in Portainer if you do the following:

Open a terminal on the host (for me, that's Unraid) and get the output of the following commands:

stat -c '%g' /var/run/docker.sock (gives you your DOCKER_GID)
hostname (gives you, as the name suggests, your hostname)

Then edit your stack in Portainer and add two new environment variables:

name: DOCKER_GID value: output from the first command.
name: REPORT_HOSTNAME value: output from second command.

Then, in the compose file for Telegraf, make sure you include the following:

user: 'telegraf:${DOCKER_GID}'
hostname: ${REPORT_HOSTNAME}
env_file:
  - stack.env

(In Portainer the file is called stack.env. Use correct YAML spacing for the "- stack.env" line: the dash starts under the "v" of env_file.)

I'm late to this topic because I only ran into this issue myself tonight, and those instructions fixed it.