pi-hole/docker-pi-hole

Docker on Synology keeps stopping with Failed to set capabilities for pihole-FTL. Cannot run as non-root.

imro2 opened this issue · 117 comments

imro2 commented

This is a: Run Issue (running Pi-hole container failing),

Details

After watchtower pulled the latest image, the Pi-hole Docker container will not start.

Related Issues

  • I have searched this repository/Pi-hole forums for existing issues and pull requests that look similar

How to reproduce the issue

  1. Environment data
  • Operating System: Linux Synology 4.4.180+ GNU/Linux synology_geminilake_920+
  • Hardware: Synology DS920+
  • Kernel Architecture: x86_64
  • Docker Install Info and version:
    • Software source: official docker
    • Supplementary Software: synology
  • Hardware architecture: amd64
  2. docker-compose.yml contents, docker run shell command, or paste a screenshot of any UI based configuration of containers here
version: "2"
services:
  pihole:
    container_name: pihole
    domainname: docker
    hostname: pihole
    image: pihole/pihole:latest
    ports:
      - '53:53/tcp'
      - '53:53/udp'
    expose:
      - 80
      - 443
    networks:
      - proxied
    restart: unless-stopped
    volumes:
      - ${BASEDIR}/pihole:/etc/pihole
      - ${BASEDIR}/pihole.log:/var/log/pihole.log
      - ${BASEDIR}/dnsmasq.d:/etc/dnsmasq.d
    environment:
      - ServerIP=${SERVER_IP}
      - PROXY_LOCATION=pihole
      - VIRTUAL_HOST=pihole.${DOMAINNAME}
      - VIRTUAL_PORT=80
      - TZ=${TZ}
      - DNSMASQ_LISTENING=all
      - WEBPASSWORD=${WEBUIPASS}
      - DNS1=8.8.8.8
      - DNS2=1.1.1.1
      - DNSMASQ_USER:pihole
    dns:
      - 1.1.1.1
      - 1.0.0.1
    labels:
      - "traefik.enable=true"
      - "traefik.backend=pihole"
      ...
  3. any additional info to help reproduce
[s6-init] making user provided files available at /var/run/s6/etc...exited 0.
[s6-init] ensuring user provided files have correct perms...exited 0.
[fix-attrs.d] applying ownership & permissions fixes...
[fix-attrs.d] 01-resolver-resolv: applying...
[fix-attrs.d] 01-resolver-resolv: exited 1.
[fix-attrs.d] done.
[cont-init.d] executing container initialization scripts...
[cont-init.d] 20-start.sh: executing...
 ::: Starting docker specific checks & setup for docker pihole/pihole
Failed to set capabilities on file `/usr/bin/pihole-FTL' (Operation not supported)
The value of the capability argument is not permitted for a file. Or the file is not a regular (non-symlink) file
ERROR: Failed to set capabilities for pihole-FTL. Cannot run as non-root.
[cont-init.d] 20-start.sh: exited 1.
[cont-finish.d] executing container finish scripts...
[cont-finish.d] done.
[s6-finish] waiting for services.
[s6-finish] sending all processes the TERM signal.
[s6-finish] sending all processes the KILL signal and exiting.

These common fixes didn't work for my issue

  • I have tried removing/destroying my container, and re-creating a new container
  • I have tried fresh volume data by backing up and moving/removing the old volume data
  • I have tried running the stock docker run example(s) in the readme (removing any customizations I added)
  • I have tried a newer or older version of Docker Pi-hole (depending what version the issue started in for me)
  • I have tried running without my volume data mounts to eliminate volumes as the cause

Dropping back to version 2021.12.1 resolved the issue

I have the same issue since upgrading to 2022.01
Rolled back to 2021.12.1 and it starts correctly.
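For anyone wanting the same rollback, it amounts to pinning the image tag in the compose file instead of tracking `latest`; a minimal fragment:

```yaml
services:
  pihole:
    # Pin a known-good release so watchtower does not
    # pull the broken 2022.01 image again.
    image: pihole/pihole:2021.12.1
```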

Same issue here,
I fixed this by adding DNSMASQ_USER=root to my docker-compose.yml file.
The default for this was changed from root to pihole
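In compose list syntax the workaround looks like this (a minimal fragment; note the `VAR=value` form):

```yaml
services:
  pihole:
    environment:
      # Work around the 2022.01 setcap failure by running FTL as root,
      # as it did before the default was changed to the pihole user.
      - DNSMASQ_USER=root
```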

Enn0 commented

the "DNSMASQ_USER=root" fixed it for me.

Does it make a difference if you add the capabilities to the container (without setting the DNSMASQ_USER to root)?

https://github.com/pi-hole/docker-pi-hole#note-on-capabilities

for compose yml add:

 cap_add:
      - NET_ADMIN
      - SYS_NICE
      - CHOWN


I tried removing the DNSMASQ_USER=root and adding the above cap_add, but then the 2022.01 image again does not start.

Interesting to know, thanks. I wonder what is different there... I have a Synology NAS here that I can try to spin something up on later to play... DSM7 and Docker installed via the package manager?

I'm on DSM 6.2.4-25556 Update 2 (which is the latest DSM 6 version)
Docker installed via package manager

Enn0 commented

I'm on DSM 6.2.4-25556 Update 2 (which is the latest DSM 6 version) Docker installed via package manager

Same for me.

This might be different for me, as I am on DSM7... don't currently have a way of testing DSM6

but with a very basic compose file of:

version: "3"

# More info at https://github.com/pi-hole/docker-pi-hole/ and https://docs.pi-hole.net/
services:
  pihole:
    container_name: pihole
    image: pihole/pihole
    volumes:
      - './etc-pihole/:/etc/pihole/'
      - './etc-dnsmasq.d/:/etc/dnsmasq.d/'
    # Recommended but not required (DHCP needs NET_ADMIN)
    #   https://github.com/pi-hole/docker-pi-hole#note-on-capabilities
    # cap_add:
    #   - NET_ADMIN
    #   - SYS_NICE
    #   - CHOWN
    restart: unless-stopped

It starts, with or without the cap_add section

(screenshot)

For me, 2022.01 is failing at

dnsmasq: cannot access directory /etc/dnsmasq.d: Permission denied
::: Testing pihole-FTL DNS: [cont-init.d] 20-start.sh: exited 1.
[cont-finish.d] executing container finish scripts...
[cont-finish.d] done.
[s6-finish] waiting for services.
[s6-finish] sending all processes the TERM signal.
[s6-finish] sending all processes the KILL signal and exiting.

I did not find any chown in bash_functions.sh for this directory and the pihole user?

I'm using bindmounts not volumes. OS Gentoo with Docker Version: 20.10.9 amd64

dnsmasq.d directory has 0700 while normally it should have 0755, strange. (Owned by root:root)

For me, the startup failure is caused by the fix_capabilities function in bash_functions.sh.
I tried running the code manually from a bash shell in the docker container:

root@pihole:~# setcap CAP_NET_BIND_SERVICE,CAP_NET_RAW,CAP_NET_ADMIN,CAP_SYS_NICE,CAP_CHOWN+ei /usr/bin/pihole-FTL 
Failed to set capabilities on file `/usr/bin/pihole-FTL' (Operation not supported)
The value of the capability argument is not permitted for a file. Or the file is not a regular (non-symlink) file
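One way to narrow this down (a diagnostic sketch, not part of the Pi-hole scripts; the scratch file is arbitrary) is to run the same setcap against a throwaway file. If that also fails with "Operation not supported", the problem is the underlying filesystem/storage driver rather than anything about the pihole-FTL binary:

```shell
# Try setting a capability on a scratch file; file capabilities are stored
# as extended attributes, so an unsupported filesystem fails regardless of
# which file is targeted.
tmpfile=$(mktemp)
if setcap CAP_NET_BIND_SERVICE+ei "$tmpfile" 2>/dev/null; then
    echo "setcap ok: this filesystem supports file capabilities"
else
    echo "setcap failed: filesystem/storage-driver (or missing privilege) issue"
fi
rm -f "$tmpfile"
```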

What does your compose file look like?

version: '3'
services:
  pihole:
    image: pihole/pihole:latest
    hostname: pihole
    domainname: redacted
    networks:
      macvlan_network:
        ipv4_address: xxx.xxx.xxx.xxx
    ports:
      - 53/tcp
      - 53/udp
      - 67/udp
      - 80/tcp
      - 443/tcp
    environment:
      - DNS1=1.1.1.1
      - DNS2=1.0.0.1
      - WEBPASSWORD=redacted
      - ServerIP=xxx.xxx.xxx.xxx
      - VIRTUAL_HOST=pihole.redacted
      - TZ=Europe/Amsterdam
      - DNSMASQ_USER=root
    volumes:
      - /volume2/docker/pihole/dnsmasq.d:/etc/dnsmasq.d
      - /volume2/docker/pihole/pihole:/etc/pihole

networks:
  macvlan_network:
    external: true

the above compose file works with 2022.01
when the DNSMASQ_USER=root line is removed it does not work with 2022.01

In both cases, the setcap error is logged, but when DNSMASQ_USER=root, the error is ignored by the script.
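That behaviour can be sketched as follows — a simplified reconstruction of the gist of fix_capabilities, not the verbatim bash_functions.sh (the echo standing in for the script's exit 1 is mine):

```shell
DNSMASQ_USER="${DNSMASQ_USER:-pihole}"

# Try to grant FTL its file capabilities; on an unsupported storage driver
# this setcap fails with "Operation not supported".
if ! setcap CAP_NET_BIND_SERVICE,CAP_NET_RAW,CAP_NET_ADMIN,CAP_SYS_NICE,CAP_CHOWN+ei /usr/bin/pihole-FTL 2>/dev/null; then
    if [ "$DNSMASQ_USER" = "root" ]; then
        # root does not need file capabilities, so the failure is ignored
        echo "setcap failed, continuing because DNSMASQ_USER=root"
    else
        # the real script aborts container startup at this point
        echo "ERROR: Failed to set capabilities for pihole-FTL. Cannot run as non-root."
    fi
fi
```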

Can you try setting each cap individually to see if it is a particular one that is throwing the error?

e.g:

setcap CAP_NET_BIND_SERVICE+ei /usr/bin/pihole-FTL 
setcap CAP_NET_RAW+ei /usr/bin/pihole-FTL 
setcap CAP_NET_ADMIN+ei /usr/bin/pihole-FTL 
setcap CAP_SYS_NICE+ei /usr/bin/pihole-FTL 
setcap CAP_CHOWN+ei /usr/bin/pihole-FTL 
setcap CAP_IPC_LOCK+ei /usr/bin/pihole-FTL 

I suspect it's one of the latter three, as they were added in the latest version https://github.com/pi-hole/docker-pi-hole/blob/master/bash_functions.sh#L6

(although I thought I had subsequently removed the CAP_IPC_LOCK check, as it is not required by FTL anyway)

@grmbl99 I see you mapped folders on Synology. What permissions do the folders have? I had to add [everybody] because Pi-hole could not save the settings. Would you mind checking for me?

they all fail with the same error:

Failed to set capabilities on file `/usr/bin/pihole-FTL' (Operation not supported)
The value of the capability argument is not permitted for a file. Or the file is not a regular (non-symlink) file

@grmbl99 I see you mapped folders on Synology. What permissions do the folders have? I had to add [everybody] because Pi-hole could not save the settings. Would you mind checking for me?

I wonder if that is relevant, as the setcap error is on an executable which is not in any of the mapped folders.

At this stage I have to assume this is a docker on DSM6 thing, because I am not seeing this same error on docker on DSM7.

I suppose the workaround is that DNSMASQ_USER is set to root as it used to be (though ideally pihole-FTL should not be running as root - hence the change). Do you see the setcap error on previous versions of the container?

as the setcap error is on an executable which is not in any of the mapped folders.

Yeah, I had wondered if you were doing some odd mapping of /usr/bin, which was what prompted me to ask for your compose file.

Don't suppose you feel like being brave and upping to DSM7 to see if you still see the issue? 😉

Side Note: I actually ended up moving all of my docker containers off of my Synology and onto a Rpi4 as I always felt that docker on Synology was a bit... odd.

Do you see the setcap error on previous versions of the container?

Yep, they were there (at least with the 2021.12.1 version); but it did not block the starting of the container

Don't suppose you feel like being brave and upping to DSM7 to see if you still see the issue?

Nope, no proper USB support (yet) :-)

imro2 commented

@PromoFaux

At this stage I have to assume this is a docker on DSM6 thing, because I am not seeing this same error on docker on DSM7.

For what it is worth, I have the issue on DSM7, though I have not had time to test anything else but going back a version. I will try a brand new Docker with caps set later today.

@grmbl99 can you add FTLCONF_DEBUG_CAPS=true to your environment, and then look for some lines in /var/log/pihole-FTL.log that look like this:

(screenshot)

@imro2, thanks for confirming you're seeing it also on DSM7 - odd to note that I am not!

Sorry, and same thing again with the three mentioned caps explicitly set in the compose file (NET_ADMIN, SYS_NICE, IPC_LOCK)

(screenshot, 2022-01-04 15:08)

(still running as DNSMASQ_USER=root)

Interesting. Thanks. So that shows that with the caps explicitly set, FTL is able to grab the caps it needs... so the question remains - why does setcap throw that error on container start?

As an experiment, maybe make a copy of start.sh on your host filesystem, comment out the call to fix_capabilities, and then bind mount it to the container? This should then skip the check that causes everything to fall over if DNSMASQ_USER is pihole instead of root (needless to say, this experiment should be done with DNSMASQ_USER=root)

@PromoFaux I'm assuming this is because Docker is not adding the CAP_SETFCAP capability (required to use setcap) in some instances. I haven't tested it out yet but adding the SETFCAP capability to the docker-compose file might fix things.

imro2 commented

@rpthms

I haven't tested it out yet but adding the SETFCAP capability to the docker-compose file might fix things.

does not seem to work for me


same here.

Yeah, was just a guess. Must be something else then.


I'm confused, setting DNSMASQ_USER=root effectively already ignores the setcap errors, which is ok as it is running as root anyway.

whoops, my bad, I meant it should be DNSMASQ_USER=pihole!

imro2 commented

@PromoFaux Here is my attempt with modified start.sh:

pihole-test | [s6-init] making user provided files available at /var/run/s6/etc...exited 0.
pihole-test | [s6-init] ensuring user provided files have correct perms...exited 0.
pihole-test | [fix-attrs.d] applying ownership & permissions fixes...
pihole-test | [fix-attrs.d] 01-resolver-resolv: applying...
pihole-test | [fix-attrs.d] 01-resolver-resolv: exited 1.
pihole-test | [fix-attrs.d] done.
pihole-test | [cont-init.d] executing container initialization scripts...
pihole-test | [cont-init.d] 20-start.sh: executing...
pihole-test |  ::: Starting docker specific checks & setup for docker pihole/pihole
pihole-test | Assigning random password: PCB4bcLz
pihole-test |
pihole-test |   [i] Installing configs from /etc/.pihole...
pihole-test |   [i] Existing dnsmasq.conf found... it is not a Pi-hole file, leaving alone!
  [✓] Installed /etc/dnsmasq.d/01-pihole.conf
  [✓] Installed /etc/dnsmasq.d/06-rfc6761.conf
pihole-test | Applying pihole-FTL.conf setting DEBUG_CAPS=true
pihole-test | Existing DNS servers detected in setupVars.conf. Leaving them alone
pihole-test | ::: Pre existing WEBPASSWORD found
pihole-test | DNSMasq binding to default interface: eth0
pihole-test | Added ENV to php:
pihole-test |                   "PIHOLE_DOCKER_TAG" => "2022.01",
pihole-test |                   "PHP_ERROR_LOG" => "/var/log/lighttpd/error.log",
pihole-test |                   "ServerIP" => "0.0.0.0",
pihole-test |                   "CORS_HOSTS" => "",
pihole-test |                   "VIRTUAL_HOST" => "0.0.0.0",
pihole-test | Using IPv4 and IPv6
pihole-test | ::: Preexisting ad list /etc/pihole/adlists.list detected ((exiting setup_blocklists early))
pihole-test | https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts
pihole-test | ::: Testing pihole-FTL DNS:
pihole-test | dnsmasq: failed to create listening socket for port 53: Permission denied
pihole-test | [cont-init.d] 20-start.sh: exited 1.
pihole-test | [cont-finish.d] executing container finish scripts...
pihole-test | [cont-finish.d] done.
pihole-test | [s6-finish] waiting for services.
pihole-test | [s6-finish] sending all processes the TERM signal.
pihole-test | [s6-finish] sending all processes the KILL signal and exiting.
pihole-test exited with code 1

Have you explicitly set the required caps in the compose file?

imro2 commented

This is my compose file:


version: "3"

# More info at https://github.com/pi-hole/docker-pi-hole/ and https://docs.pi-hole.net/
services:
  pihole:
    container_name: pihole-test
    image: pihole/pihole:latest
    #ports:
    #  - "53:53/tcp"
    #  - "53:53/udp"
    #  - "67:67/udp"
    #  - "80:80/tcp"
    environment:
      TZ: 'America/Chicago'
      FTLCONF_DEBUG_CAPS: ${boolean_true}
      # DNSMASQ_USER: root
      # WEBPASSWORD: 'set a secure password here or it will be random'
    # Volumes store your data between container upgrades
    volumes:
      - './etc-pihole:/etc/pihole'
      - './etc-dnsmasq.d:/etc/dnsmasq.d'
      - './start.sh:/start.sh'
    # Recommended but not required (DHCP needs NET_ADMIN)
    #   https://github.com/pi-hole/docker-pi-hole#note-on-capabilities
    cap_add:
      - NET_ADMIN
      - SYS_NICE
      - IPC_LOCK
      - SETFCAP

imro2 commented

This is my pihole-FTL.log when running with DNSMASQ_USER=pihole and above docker-compose

[2022-01-04 09:08:24.674 318M] ***************************************
[2022-01-04 09:08:24.674 318M] * Linux capability debugging enabled  *
[2022-01-04 09:08:24.674 318M] * CAP_CHOWN                (00) = -I- *
[2022-01-04 09:08:24.674 318M] * CAP_DAC_OVERRIDE         (01) = -I- *
[2022-01-04 09:08:24.674 318M] * CAP_DAC_READ_SEARCH      (02) = --- *
[2022-01-04 09:08:24.674 318M] * CAP_FOWNER               (03) = -I- *
[2022-01-04 09:08:24.674 318M] * CAP_FSETID               (04) = -I- *
[2022-01-04 09:08:24.674 318M] * CAP_KILL                 (05) = -I- *
[2022-01-04 09:08:24.674 318M] * CAP_SETGID               (06) = -I- *
[2022-01-04 09:08:24.674 318M] * CAP_SETUID               (07) = -I- *
[2022-01-04 09:08:24.674 318M] * CAP_SETPCAP              (08) = -I- *
[2022-01-04 09:08:24.674 318M] * CAP_LINUX_IMMUTABLE      (09) = --- *
[2022-01-04 09:08:24.674 318M] * CAP_NET_BIND_SERVICE     (10) = -I- *
[2022-01-04 09:08:24.674 318M] * CAP_NET_BROADCAST        (11) = --- *
[2022-01-04 09:08:24.674 318M] * CAP_NET_ADMIN            (12) = -I- *
[2022-01-04 09:08:24.674 318M] * CAP_NET_RAW              (13) = -I- *
[2022-01-04 09:08:24.674 318M] * CAP_IPC_LOCK             (14) = -I- *
[2022-01-04 09:08:24.674 318M] * CAP_IPC_OWNER            (15) = --- *
[2022-01-04 09:08:24.674 318M] * CAP_SYS_MODULE           (16) = --- *
[2022-01-04 09:08:24.674 318M] * CAP_SYS_RAWIO            (17) = --- *
[2022-01-04 09:08:24.674 318M] * CAP_SYS_CHROOT           (18) = -I- *
[2022-01-04 09:08:24.674 318M] * CAP_SYS_PTRACE           (19) = --- *
[2022-01-04 09:08:24.674 318M] * CAP_SYS_PACCT            (20) = --- *
[2022-01-04 09:08:24.674 318M] * CAP_SYS_ADMIN            (21) = --- *
[2022-01-04 09:08:24.674 318M] * CAP_SYS_BOOT             (22) = --- *
[2022-01-04 09:08:24.674 318M] * CAP_SYS_NICE             (23) = -I- *
[2022-01-04 09:08:24.674 318M] * CAP_SYS_RESOURCE         (24) = --- *
[2022-01-04 09:08:24.674 318M] * CAP_SYS_TIME             (25) = --- *
[2022-01-04 09:08:24.674 318M] * CAP_SYS_TTY_CONFIG       (26) = --- *
[2022-01-04 09:08:24.674 318M] * CAP_MKNOD                (27) = -I- *
[2022-01-04 09:08:24.674 318M] * CAP_LEASE                (28) = --- *
[2022-01-04 09:08:24.674 318M] * CAP_AUDIT_WRITE          (29) = -I- *
[2022-01-04 09:08:24.674 318M] * CAP_AUDIT_CONTROL        (30) = --- *
[2022-01-04 09:08:24.674 318M] * CAP_SETFCAP              (31) = -I- *
[2022-01-04 09:08:24.674 318M] ***************************************
[2022-01-04 09:08:24.674 318M] WARNING: Required Linux capability CAP_NET_ADMIN not available
[2022-01-04 09:08:24.674 318M] WARNING: Required Linux capability CAP_NET_RAW not available
[2022-01-04 09:08:24.674 318M] WARNING: Required Linux capability CAP_NET_BIND_SERVICE not available
[2022-01-04 09:08:24.674 318M] WARNING: Required Linux capability CAP_SYS_NICE not available
[2022-01-04 09:08:24.675 318M] WARNING: Required Linux capability CAP_IPC_LOCK not available
[2022-01-04 09:08:24.675 318M] WARNING: Required Linux capability CAP_CHOWN not available
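Independent of FTL's own debug output, the capability sets Docker actually grants can be read from /proc inside the container (e.g. via docker exec); a quick diagnostic:

```shell
# CapBnd is the bounding set Docker imposes; CapPrm/CapEff are what this
# process currently holds. A process can never gain a capability that is
# missing from its bounding set.
grep '^Cap' /proc/self/status
```

The hex masks can be turned into capability names with `capsh --decode=<mask>` where capsh (part of libcap) is installed.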

Someone has just pointed me at this... https://serverfault.com/a/874053.

What storage driver are people's docker systems using? docker system info @imro2 @grmbl99


Storage Driver: aufs
  Root Dir: /volume2/@docker/aufs
  Backing Filesystem: extfs
  Dirs: 984
  Dirperm1 Supported: true

And therein lies the answer. aufs (my synology is overlay2 - so it works)

This might be of use: https://meta.discourse.org/t/how-to-change-storage-backend-in-docker/75352

Probably better to read this though: moby/moby#30557
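This fits the symptom: file capabilities are stored in the security.capability extended attribute, and aufs has historically had patchy xattr support, which surfaces as exactly this "Operation not supported" error. A rough probe (a sketch; setfattr ships in the attr package and may need installing) for whether the container's writable layer accepts xattrs at all:

```shell
# Writing any user.* xattr exercises the same filesystem code path that
# setcap needs for security.capability (setcap additionally requires the
# CAP_SETFCAP capability).
probe=$(mktemp)
if setfattr -n user.captest -v 1 "$probe" 2>/dev/null; then
    echo "xattrs writable here"
else
    echo "xattrs not writable here (or setfattr missing)"
fi
rm -f "$probe"
```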

imro2 commented


Can I change the storage driver to overlay2 even though I am using BTRFS? Why is yours set to overlay2? Was this the factory default? Are you using BTRFS for your Synology volumes?

... Turns out my storage driver is btrfs on the syno, I was SSH'd into a different machine when I looked before 🙈

Complete docker system info from my syno:

admin@fappotron:~$ docker system info
Client:
 Context:    default
 Debug Mode: false

Server:
 Containers: 1
  Running: 1
  Paused: 0
  Stopped: 0
 Images: 46
 Server Version: 20.10.3
 Storage Driver: btrfs
  Build Version: Btrfs v4.0
  Library Version: 101
 Logging Driver: db
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs db fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: ea3508454ff2268c32720eb4d2fc9816d6f75f88
 runc version: 31cc25f16f5eba4d0f53e35374532873744f4b31
 init version: ed96d00 (expected: de40ad0)
 Security Options:
  apparmor
 Kernel Version: 4.4.180+
 OSType: linux
 Architecture: x86_64
 CPUs: 2
 Total Memory: 9.576GiB
 Name: fappotron
 ID: QGEQ:65ER:5KYD:A5PW:IQO2:HED3:P7S4:SQWT:YS6O:GLSS:IECP:TLLD
 Docker Root Dir: /volume1/@docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No kernel memory TCP limit support
WARNING: No cpu cfs quota support
WARNING: No cpu cfs period support
WARNING: No blkio weight support
WARNING: No blkio weight_device support
WARNING: No blkio throttle.read_bps_device support
WARNING: No blkio throttle.write_bps_device support
WARNING: No blkio throttle.read_iops_device support
WARNING: No blkio throttle.write_iops_device support

But mine IS allowing setcap to run, so ... back to the drawing board perhaps...

Not sure if this is of any use:

(screenshot)

imro2 commented

But mine IS allowing setcap to run, so ... back to the drawing board perhaps...

my storage driver is aufs, so I think you are still onto something. I was just asking whether I can change the Docker storage driver to something else, even though my Synology is using BTRFS for the actual storage. I just don't understand what the Docker storage driver does.


 Storage Driver: aufs
  Root Dir: /volume1/@docker/aufs
  Backing Filesystem: extfs
  Dirs: 625
  Dirperm1 Supported: true

Looking at the docker documentation:
https://docs.docker.com/storage/storagedriver/select-storage-driver/

It seems that overlay2 is only supported with ext4 or xfs as 'backing' filesystem.

I'm no expert, but reading from the official docker documentation may shed some light on this: https://docs.docker.com/storage/storagedriver/select-storage-driver/

btrfs backing file system requires the btrfs storage driver and neither can be used with anything else. However, btrfs says it "allows for advanced options" but doesn't exactly list what (maybe the additional caps?).

aufs will be deprecated in future docker releases and it is recommended users migrate to overlay2

migrating is a simple as modifying the /etc/docker/daemon.json (see the link below for the correct location and name of the file) file BUT doing so will make any existing image and container inaccessible and broken, so you have to back them up and recreate them after restarting the service with the new storage driver.

EDIT: See the comments here regarding editing the storage driver on synology. Seems as if the docker package or synology or both may not properly support overlay2 which is why the default is aufs except for in cases where the docker package is installed on a BTRFS volume, then the default is btrfs. https://forums.docker.com/t/overlay2-driver-problems/89374
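For reference, the switch is made in Docker's daemon.json (the path may differ on Synology, see the links above); a minimal fragment — and again, existing images and containers must be backed up first, since changing the driver orphans them:

```json
{
  "storage-driver": "overlay2"
}
```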

@imro2 Does it work for you when you set DNSMASQ_USER to root?

What do the capability debug lines look like in /var/log/pihole-FTL.log when you set DNSMASQ_USER to root? Are capabilities magically assigned, despite aufs, as in @grmbl99's case? (I'm not sure if FTL does something different to a standard setcap or not)

imro2 commented

@PromoFaux yes it works with DNSMASQ_USER: root.

[2022-01-04 11:29:57.559 319M] ***************************************
[2022-01-04 11:29:57.559 319M] * Linux capability debugging enabled  *
[2022-01-04 11:29:57.559 319M] * CAP_CHOWN                (00) = PIE *
[2022-01-04 11:29:57.559 319M] * CAP_DAC_OVERRIDE         (01) = PIE *
[2022-01-04 11:29:57.559 319M] * CAP_DAC_READ_SEARCH      (02) = --- *
[2022-01-04 11:29:57.559 319M] * CAP_FOWNER               (03) = PIE *
[2022-01-04 11:29:57.559 319M] * CAP_FSETID               (04) = PIE *
[2022-01-04 11:29:57.559 319M] * CAP_KILL                 (05) = PIE *
[2022-01-04 11:29:57.559 319M] * CAP_SETGID               (06) = PIE *
[2022-01-04 11:29:57.559 319M] * CAP_SETUID               (07) = PIE *
[2022-01-04 11:29:57.559 319M] * CAP_SETPCAP              (08) = PIE *
[2022-01-04 11:29:57.559 319M] * CAP_LINUX_IMMUTABLE      (09) = --- *
[2022-01-04 11:29:57.559 319M] * CAP_NET_BIND_SERVICE     (10) = PIE *
[2022-01-04 11:29:57.559 319M] * CAP_NET_BROADCAST        (11) = --- *
[2022-01-04 11:29:57.559 319M] * CAP_NET_ADMIN            (12) = PIE *
[2022-01-04 11:29:57.559 319M] * CAP_NET_RAW              (13) = PIE *
[2022-01-04 11:29:57.559 319M] * CAP_IPC_LOCK             (14) = PIE *
[2022-01-04 11:29:57.559 319M] * CAP_IPC_OWNER            (15) = --- *
[2022-01-04 11:29:57.559 319M] * CAP_SYS_MODULE           (16) = --- *
[2022-01-04 11:29:57.559 319M] * CAP_SYS_RAWIO            (17) = --- *
[2022-01-04 11:29:57.559 319M] * CAP_SYS_CHROOT           (18) = PIE *
[2022-01-04 11:29:57.559 319M] * CAP_SYS_PTRACE           (19) = --- *
[2022-01-04 11:29:57.559 319M] * CAP_SYS_PACCT            (20) = --- *
[2022-01-04 11:29:57.559 319M] * CAP_SYS_ADMIN            (21) = --- *
[2022-01-04 11:29:57.560 319M] * CAP_SYS_BOOT             (22) = --- *
[2022-01-04 11:29:57.560 319M] * CAP_SYS_NICE             (23) = PIE *
[2022-01-04 11:29:57.560 319M] * CAP_SYS_RESOURCE         (24) = --- *
[2022-01-04 11:29:57.560 319M] * CAP_SYS_TIME             (25) = --- *
[2022-01-04 11:29:57.560 319M] * CAP_SYS_TTY_CONFIG       (26) = --- *
[2022-01-04 11:29:57.560 319M] * CAP_MKNOD                (27) = PIE *
[2022-01-04 11:29:57.560 319M] * CAP_LEASE                (28) = --- *
[2022-01-04 11:29:57.560 319M] * CAP_AUDIT_WRITE          (29) = PIE *
[2022-01-04 11:29:57.560 319M] * CAP_AUDIT_CONTROL        (30) = --- *
[2022-01-04 11:29:57.560 319M] * CAP_SETFCAP              (31) = PIE *
[2022-01-04 11:29:57.560 319M] ***************************************

I think this might be an issue for "early" adopters of Docker on Synology. I started using it back in 2018, kept upgrading the same system, and even migrated the drives to a new one, so it might have latched onto aufs. I will try to upgrade to btrfs tonight and let you know what I find out.

OK. So, the TL;DR of this thread is:

  • Issues with synology? set DNSMASQ_USER to root.

I'll drop a note in the Readme and maybe make the error message a bit more verbose

Also seeing this issue on Debian/testing. Setting DNSMASQ_USER to root also fixed it for me. So not 100% a Synology issue.

Docker version 20.10.11+dfsg1, build dea9396 running on a btrfs partition.

Sure thing, the Readme will list Synology as an example - but the error message will provide a more generic "If you are seeing this error, please set the environment variable DNSMASQ_USER=root"

Out of interest, what is the Storage Driver: displayed in docker system info? Is btrfs standard for Debian/testing, or something you have set yourself?

Storage driver is overlay2. Looks like the docker images are sitting on one of my ext4 partitions rather than the btrfs pool, but I have the config data volumes mounted to the btrfs pool. I think Debian still ships with ext4, the btrfs pool was something I setup myself.

lsblk -f /dev/md2
NAME FSTYPE FSVER LABEL UUID                                 FSAVAIL FSUSE% MOUNTPOINTS
md2  ext4   1.0         040eba36-fe7e-4ed1-8bbe-d32fe98b5175     19G    34% /

lsblk -f /dev/sdb5
NAME FSTYPE FSVER LABEL UUID                                 FSAVAIL FSUSE% MOUNTPOINTS
sdb5 btrfs        home  1f5c392a-c894-49fa-b36a-cde73b795703    6.1T    35% /home

ls -l /config
lrwxrwxrwx 1 root root 12 Oct 25 06:15 /config -> /home/config

Hmm, then unless I've completely misunderstood the problem, you shouldn't be seeing this issue... But, if the workaround works... :)

Well if you feel like poking at it further and need any other specific config or system info, let me know, but I agree, if the workaround works... save the time and energy for other issues :)

DL6ER commented

@Kline- Do you have anything special (read as: custom) on top of the btrfs pool? Maybe something that could limit the abilities of containers to set/add capabilities to executables within?

No, it's just a pool set to single mode data and raid1 system/metadata. No subvolumes, snapshots, or any of the other fancy features. My ext4 volumes (that the container images are on) are on top of a md raid array, but that's nothing too special.

Earlier I spun the container back up with the capabilities debug set out of curiosity. pihole-FTL.log shows FTL keeps starting then dying but here was the dump of capabilities:

[2022-01-04 14:01:56.007 447M] ***************************************
[2022-01-04 14:01:56.007 447M] * Linux capability debugging enabled  *
[2022-01-04 14:01:56.007 447M] * CAP_CHOWN                (00) = PIE *
[2022-01-04 14:01:56.007 447M] * CAP_DAC_OVERRIDE         (01) = PIE *
[2022-01-04 14:01:56.007 447M] * CAP_DAC_READ_SEARCH      (02) = --- *
[2022-01-04 14:01:56.007 447M] * CAP_FOWNER               (03) = PIE *
[2022-01-04 14:01:56.007 447M] * CAP_FSETID               (04) = PIE *
[2022-01-04 14:01:56.007 447M] * CAP_KILL                 (05) = PIE *
[2022-01-04 14:01:56.007 447M] * CAP_SETGID               (06) = PIE *
[2022-01-04 14:01:56.007 447M] * CAP_SETUID               (07) = PIE *
[2022-01-04 14:01:56.007 447M] * CAP_SETPCAP              (08) = PIE *
[2022-01-04 14:01:56.007 447M] * CAP_LINUX_IMMUTABLE      (09) = --- *
[2022-01-04 14:01:56.007 447M] * CAP_NET_BIND_SERVICE     (10) = PIE *
[2022-01-04 14:01:56.007 447M] * CAP_NET_BROADCAST        (11) = --- *
[2022-01-04 14:01:56.008 447M] * CAP_NET_ADMIN            (12) = --- *
[2022-01-04 14:01:56.008 447M] * CAP_NET_RAW              (13) = PIE *
[2022-01-04 14:01:56.008 447M] * CAP_IPC_LOCK             (14) = --- *
[2022-01-04 14:01:56.008 447M] * CAP_IPC_OWNER            (15) = --- *
[2022-01-04 14:01:56.008 447M] * CAP_SYS_MODULE           (16) = --- *
[2022-01-04 14:01:56.008 447M] * CAP_SYS_RAWIO            (17) = --- *
[2022-01-04 14:01:56.008 447M] * CAP_SYS_CHROOT           (18) = PIE *
[2022-01-04 14:01:56.008 447M] * CAP_SYS_PTRACE           (19) = --- *
[2022-01-04 14:01:56.008 447M] * CAP_SYS_PACCT            (20) = --- *
[2022-01-04 14:01:56.008 447M] * CAP_SYS_ADMIN            (21) = --- *
[2022-01-04 14:01:56.008 447M] * CAP_SYS_BOOT             (22) = --- *
[2022-01-04 14:01:56.008 447M] * CAP_SYS_NICE             (23) = --- *
[2022-01-04 14:01:56.008 447M] * CAP_SYS_RESOURCE         (24) = --- *
[2022-01-04 14:01:56.008 447M] * CAP_SYS_TIME             (25) = --- *
[2022-01-04 14:01:56.008 447M] * CAP_SYS_TTY_CONFIG       (26) = --- *
[2022-01-04 14:01:56.008 447M] * CAP_MKNOD                (27) = PIE *
[2022-01-04 14:01:56.008 447M] * CAP_LEASE                (28) = --- *
[2022-01-04 14:01:56.008 447M] * CAP_AUDIT_WRITE          (29) = PIE *
[2022-01-04 14:01:56.008 447M] * CAP_AUDIT_CONTROL        (30) = --- *
[2022-01-04 14:01:56.008 447M] * CAP_SETFCAP              (31) = PIE *
[2022-01-04 14:01:56.008 447M] ***************************************
[2022-01-04 14:01:56.008 447M] WARNING: Required Linux capability CAP_NET_ADMIN not available
[2022-01-04 14:01:56.008 447M] WARNING: Required Linux capability CAP_SYS_NICE not available
[2022-01-04 14:01:56.008 447M] WARNING: Required Linux capability CAP_IPC_LOCK not available
DL6ER commented

keeps starting then dying

Do you have a log line showing its last words?

Attached are logs from docker logs and pihole-FTL.log, both when setting DNSMASQ_USER=root and when leaving it at the default (now pihole):

docker-pihole.log
docker-root.log
pihole-FTL-pihole.log
pihole-FTL-root.log

I have tried to change the Docker storage driver on my DSM 6 from aufs to overlay2, but this resulted in an error when starting Docker:

2022-01-04T21:36:32+01:00 nas2 docker[1863]: ERRO[0000] failed to mount overlay: no such device       storage-driver=overlay2

For now I am sticking with aufs and the DNSMASQ_USER=root workaround.

It took me a while to figure out how to resolve this until I came across this thread. I'm not sure exactly what is going on, but I do want to understand it. I run two Synology NAS systems, a DS920+ and a DS218play, both with Docker and Pi-hole installed. Everything seemed to work, but something felt off: after applying the fix I stopped seeing image errors on websites, so something was indeed broken. Hopefully this was also the reason I had matchmaking issues in one game.
What is this storage-driver talk all about, and what should it be set to? One of my systems is on btrfs, the other on vfs; I have no idea whether this affects Docker performance or comes with issues.
I usually deploy my containers in the shell. I might grab one or two Raspberry Pis and run DNS off those for performance reasons, which is why I currently run Pi-hole on both NAS systems rather than on one as usual.

Not sure if this helps - but I had a similar problem when the telegraf package was switched to not run as root.

influxdata/telegraf#10302
shows the fix I used there successfully

BUT the fix that worked for telegraf did not work in this case
(i.e. adding to the 'docker run' command
-u "pihole"
did not result in success in this case.)

I can try the
DNSMASQ_USER=root
workaround - but doesn't that defeat the purpose of not running everything as root?

For the short term - I'm going to keep running 2021.12.1

imro2 commented

So I did some destructive testing and here is what I found;

  • It turns out I do have btrfs, but not on the volume where Docker resides. That volume is ext4, and installing Docker on it automatically defaults to the aufs storage driver.
  • Trying to change the Docker storage driver to either btrfs or overlay2 resulted in Docker not being able to start.
  • I uninstalled Docker and reinstalled it on the btrfs volume; the storage driver was then automatically set to btrfs. The Pi-hole container starts with the default user without an issue.

Also seeing this issue on Debian/testing. Setting DNSMASQ_USER to root also fixed it for me. So not 100% a Synology issue.

Docker version 20.10.11+dfsg1, build dea9396 running on a btrfs partition.

Seeing this on Ubuntu 20.04 with ext4. The workaround worked for me.

imro2 commented

Also seeing this issue on Debian/testing. Setting DNSMASQ_USER to root also fixed it for me. So not 100% a Synology issue.
Docker version 20.10.11+dfsg1, build dea9396 running on a btrfs partition.

Seeing this on Ubuntu 20.04 with ext4. The workaround worked for me.

To clarify, it is not the file system of the partition that prevents the Docker container from setting capabilities; it is the Docker storage driver. On Synology this is automatically set to aufs if the partition is ext4, and from my testing and what I found online, that is the only option for Synology in this scenario. On other Linux flavors, however, it should be possible to set the driver to overlay2.

Docker documentation states:

The aufs storage driver is deprecated, and will be removed in a future release. It is recommended that users of the aufs storage driver migrate to overlay2.

More about docker storage drivers here: https://docs.docker.com/storage/storagedriver/select-storage-driver/
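On a generic Linux host (this does not apply to Synology's packaged Docker, where the driver choice is constrained as described above), the storage driver can typically be changed via `/etc/docker/daemon.json`. A sketch, assuming systemd and a kernel with overlay support; note that switching drivers hides existing images and containers until you switch back, so re-pull or back them up first:

```shell
# Check which storage driver the daemon is currently using:
docker info --format '{{.Driver}}'

# Switch to overlay2 and restart the daemon. If the kernel lacks the
# overlay module, this fails with "failed to mount overlay: no such
# device", as seen earlier in this thread.
echo '{ "storage-driver": "overlay2" }' | sudo tee /etc/docker/daemon.json
sudo systemctl restart docker
```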

I am using TrueNAS SCALE, which uses the ZFS storage driver and I added these:

  • CAP_IPC_LOCK
  • CAP_SYS_NICE
  • CAP_CHOWN
  • CAP_SYS_PTRACE

Without CAP_SYS_PTRACE I kept getting "DNS service not running", even though it was responding to DNS requests.

Without CAP_SYS_PTRACE I kept getting "DNS service not running",

That won't be needed any more in the next release (hopefully in the next few hours), as we no longer check the DNS status with lsof but with ss, which does not need that capability. The same goes for CAP_IPC_LOCK; the need for that should be gone in the next release.
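For reference, the difference between the two kinds of check looks roughly like this (a sketch; the exact invocations used in the image may differ):

```shell
# Old-style check: lsof inspects other processes' open files, which
# requires CAP_SYS_PTRACE inside the container.
lsof -i :53

# New-style check: ss reads socket state from the kernel via netlink
# and needs no extra capability to see listening sockets.
ss -tuln 'sport = :53'
```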

nxadm commented

DNSMASQ_USER=root did the trick for me, but I needed to recreate the container. Changing docker-compose.yml and restart the container was not enough. (I run the container without additional CAPs).

workaround - but doesn't that defeat the purpose of not running everything as root?

You can use Docker's userns-remap: https://docs.docker.com/engine/security/userns-remap/
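Remapping is a daemon-wide setting; a minimal sketch, assuming /etc/subuid and /etc/subgid are already configured as the linked documentation describes:

```shell
# Enable user-namespace remapping for all containers started by this
# daemon; container root then maps to an unprivileged host UID range.
echo '{ "userns-remap": "default" }' | sudo tee /etc/docker/daemon.json
sudo systemctl restart docker
```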

DNSMASQ_USER=root did the trick for me, but I needed to recreate the container. Changing docker-compose.yml and restart the container was not enough. (I run the container without additional CAPs).

I'm pretty new to Docker and Pi-hole, but I believe docker-compose.yml is only used at build time. You shouldn't need to recreate the container. You just need to stop the container, change the DNSMASQ_USER environment variable to root, and restart the container. This allowed me to get past the issue with DNS stopping every time it was started. The DNS status now remains active.

imro2 commented

I'm pretty new to Docker and Pi-hole, but believe docker-compose.yml is only used at build time. You shouldn't need to recreate the container. You just need to stop the container, change the DNSMASQ_USER environment variable to root, and restart the container.

The reason you want to change your compose file rather than changing it at the container level is that once your compose file is correct, you can take it to another host and recreate the container easily. A container is disposable: you can spin up and throw away as many containers as you wish. The only things that aren't disposable are your compose file and your data, which should never reside inside your containers. Isolation is but one of the features, not the whole idea.

This allowed me to get past the issue with DNS stopping every time it was started. The DNS status now remains active.

The DNS stopping was a different problem, or a very small subset of this one that as noted here #963 (comment) could have been addressed by adding a specific capability.

nxadm commented

I'm pretty new to Docker and Pi-hole, but believe docker-compose.yml is only used at build time.

That's an incorrect assumption.

I have the same problem and have also applied the root workaround... Is there anything wrong with running Pi-hole as root?

I'm just going to leave this here in case anyone finds this looking for a fix:

The dnsmasq service runs as the pihole user in the container now. You can make it run as root again, but that's not ideal.
Just chown the dnsmasq.d directory in the volume on the host to 999:999, which is the uid/gid of the pihole user in the container.

i.e.

$ sudo chown -R 999:999 /path/to/volume/mount/dnsmasq.d

and restart the container.

@mrdaemon can you try the :dev tag and see if you still need to do this? We've added some additional points in to ensure directories etc belong to the correct user.

This issue is stale because it has been open 30 days with no activity. Please comment or update this issue or it will be closed in 5 days.

This issue has been mentioned on Pi-hole Userspace. There might be relevant details there:

https://discourse.pi-hole.net/t/dhcp-server-issues/54464/2

the "DNSMASQ_USER=root" fixed it for me.

Really late now, but I had this issue after an upgrade. I was getting errors about pihole-FTL essentially being in a reboot loop and being unable to access /etc/dnsmasq.d. All it needed was a line in the environment variables of docker-compose.yml, DNSMASQ_USER: root, and everything was fixed!

This issue does not only happen on Synology (Debian-based DSM) but also on other distros. I experienced it on Ubuntu LTS aarch64 running Pi-hole via Docker Compose; a recent image pull broke what was working. A bit of research indicated that the January 2022 change in how the Docker image is built (Dockerfile) was the cause.

The cheapest option to make things work again is to change DNSMASQ_USER to root (definitely not ideal).

It would be nice if the owner, or whoever has permission, could change the issue title to something more generic.
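For anyone applying the workaround with plain docker run rather than Compose, the container must be recreated with the variable set; a sketch, with all other flags as in your existing setup:

```shell
# Workaround only: run dnsmasq/FTL as root again. Revert to the
# default user (pihole) once a fixed image is released.
docker rm -f pihole
docker run -d --name pihole \
  -e DNSMASQ_USER=root \
  -p 53:53/tcp -p 53:53/udp \
  pihole/pihole:latest
```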

Yeah -- for clarification on mine above, that was on Xubuntu (Ubuntu) 20.04 LTS amd64 via Docker-Compose.

For anyone coming across this thread due to issue upgrading around April 1, this is likely not your problem (though it looks similar).

Check out: #1026

The DNSMASQ_USER may not be the only issue you're facing.

Have reworked capability setting a little for 2022.04.1:

https://github.com/pi-hole/docker-pi-hole/releases/tag/2022.04.1

Well, with the latest version 2022.04.1, DSM6 does not start, does not matter if
DNSMASQ_USER=pihole
or
DNSMASQ_USER=root

(screenshot attached: captura 2022-04-02 at 23 30 48)

Try 2022.04.2beta

It works, thank you!

I have version 2022.04.2beta and set the DNSMASQ_USER to both pihole and root.
Still I am getting the error message
WARNING: Unable to set capabilities for pihole-FTL.
Please ensure that the container has the required capabilities.

I have updated directly from the 2022.02 release - is there anything else I need to adapt?

I have version 2022.04.2beta and set the DNSMASQ_USER to both pihole and root. Still I am getting the error message WARNING: Unable to set capabilities for pihole-FTL. Please ensure that the container has the required capabilities.

I have updated directly from the 2022.02 release - is there anything else I need to adapt?

I deleted all the configuration, installed Pi-hole from scratch, and it works.

This did not help. I deleted the container and all the files - still getting the same error after reinstalling :-(
For now I have restored version 2022.02.1 and my old files, which works fine.

Install with sudo

I am facing the same issue on DSM6. Watchtower pulled the 2022.02.4beta tonight, since then I get the same error and DNS stopped working.
Fallback to 2022.02.1 is working currently without any modifications.

I am facing the same issue on DSM6. Watchtower pulled the 2022.02.4beta tonight, since then I get the same error and DNS stopped working. Fallback to 2022.02.1 is working currently without any modifications.

Hi, same issue with watchtower here.

Please, I'm a bit of a newbie with Pi-hole. How did you fall back to the previous version?

Please, I'm a bit of a newbie with Pi-hole. How did you fall back to the previous version?

First you need to stop the currently running pihole container (if you have the same problem, it is probably stopped anyway, hehe). Then go to 'Registry' and search for pihole. Re-download the image, and after double-clicking it, be sure not to select the 'latest' version but 2022.02.1. Once the image is downloaded, set it up just like you did before; since 2022.04 did not start at all, you can probably re-use the existing config file. Because you downloaded a specific version, watchtower will not run updates on this one. As soon as the bug has been fixed, you can switch back to the 'latest' version (which is why I would not necessarily delete it).

Please, I'm a bit of a newbie with Pi-hole. How did you fall back to the previous version?

First you need to stop the currently running pihole container (if you have the same problem, it is probably stopped anyway, hehe). Then go to 'Registry' and search for pihole. Re-download the image, and after double-clicking it, be sure not to select the 'latest' version but 2022.02.1. Once the image is downloaded, set it up just like you did before; since 2022.04 did not start at all, you can probably re-use the existing config file. Because you downloaded a specific version, watchtower will not run updates on this one. As soon as the bug has been fixed, you can switch back to the 'latest' version (which is why I would not necessarily delete it).

I did it and it's working again with same config!

Thanks a lot!

I suspected it was that way, but you know, the first time you hit this kind of issue you're pretty scared of breaking something... and I have already stopped Watchtower; I think I prefer to update it myself after reading the change list.

The last issue with the pihole/root user was one warning sign, this one is the second; I will not wait for a third one ;-)

Same deal here with me. Watchtower pulled in the latest and it kept failing to start the container.

If using Portainer, simply edit the container, change the image to the one shown in the screenshot, and redeploy the container. Back to business :)
(screenshot attached: Screen Shot 2022-04-04 at 10 24 08 AM)

Watchtower pulled in the latest and it kept failing to start the container.

Please review the notes here: https://github.com/pi-hole/docker-pi-hole/releases/tag/2021.09

Personally I wouldn't touch watchtower with a barge pole (for Pi-hole, anyway)

Please review the notes here: https://github.com/pi-hole/docker-pi-hole/releases/tag/2021.09

Thank you for this hint. While I completely understand the reasons for updating manually, watchtower really saves time when you run a bunch of containers. Well, not in this case, hehe. I am using the --run-once option with watchtower so I can see right away when there is a problem.

In my case removing the pihole container incl. all config files and reinstalling from scratch did not help. Is it just me?

Back to 2022.02.1. Works.

loral commented

Please review the notes here: https://github.com/pi-hole/docker-pi-hole/releases/tag/2021.09

Thank you for this hint. While I completely understand the reasons for updating manually, watchtower really saves time when you run a bunch of containers. Well, not in this case, hehe. I am using the --run-once option with watchtower so I can see right away when there is a problem.

In my case removing the pihole container incl. all config files and reinstalling from scratch did not help. Is it just me?

Not just you. No matter what I try, I'm still having issues with 2022.04.x as well.

  • WARNING: Unable to set capabilities for pihole-FTL.
  • Please ensure that the container has the required capabilities.
imro2 commented

You can exclude specific containers from being updated by watchtower by adding a label

LABEL com.centurylinklabs.watchtower.enable="false"

Container selection

I am running DSM 7.0, and both 2022.04.2beta and 2022.04.2 work without an issue for me. I am going to exclude Pi-hole from watchtower, because although my router falls back to a public DNS when Pi-hole isn't available, some Linux services, like Docker itself, will not wait long enough for DNS resolution and just plain fail.
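At the container level (rather than baking the label into an image with a Dockerfile LABEL line), the same exclusion can be set at run time; a sketch:

```shell
# Tell watchtower to skip this container when checking for updates:
docker run -d --name pihole \
  --label com.centurylinklabs.watchtower.enable=false \
  pihole/pihole:2022.02.1
```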

Just wanted to throw some info out there in case it helps anyone.

I am running Pi-hole on an Unraid system in Docker and couldn't get it to boot because of the same fix_capabilities bash function check, even though I added all the required capabilities via --cap-add parameters.

It turns out that if you leave the privileged flag on, or pass --privileged, the startup script errors out in fix_capabilities, because the capsh commands checking for the capabilities do not return the expected "Current:" capabilities. I'm not sure why capsh doesn't list the capabilities explicitly when privileged mode is enabled, but it breaks the fix_capabilities check in the recent 2022.04.2 release, so be sure to disable "privileged" mode.
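The behaviour is easy to observe by hand (a sketch; it assumes capsh is available inside the container, which it should be if the startup script itself calls it):

```shell
# In a normal container, the "Current:" line lists capabilities
# explicitly (cap_chown, cap_net_bind_service, ...):
docker exec pihole capsh --print | grep '^Current:'

# In a --privileged container, the same line can collapse to "=ep"
# (shorthand for all capabilities), so a script grepping for
# individual capability names finds no match even though the
# capability is actually present.
```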

I also updated all of my volume mount permissions to change ownership to pihole (UID=999 GID=999) and ensured that the "DNSMASQ_USER" env var is not set to "root" and it's working fine for me now since it defaults to pihole.

Maybe an update to the README.md clarifying not to use --privileged mode would help with the recent releases?

Cheers!