containers/podman

Rootless Podman Wireguard container fails to configure iptables

scottsweb opened this issue · 19 comments

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

I am attempting to set up WireGuard in a rootless Podman container on Fedora Silverblue 36 using https://hub.docker.com/r/linuxserver/wireguard. The container starts but either fails due to permission errors, or fails silently (depending on which caps are added). I am not sure how the container can modify iptables when my user (on the host) cannot.

Steps to reproduce the issue:

  1. Install Fedora Silverblue
  2. Overlay wireguard-tools with rpm-ostree install wireguard-tools
  3. sudo modprobe wireguard to load the wireguard module
  4. Create a docker-compose file:
version: "3.3"

services:
  wireguard:
    image: lscr.io/linuxserver/wireguard
    container_name: wireguard
    hostname: wireguard
    restart: always
    env_file: ./.settings.env
    sysctls:
      - net.ipv4.conf.all.src_valid_mark=1
    ports:
      - 51820:51820/udp
    cap_add:
      - NET_ADMIN
#      - NET_RAW
      - SYS_MODULE
    volumes:
      - ./data/wireguard:/config:Z
      - /lib/modules:/lib/modules:ro
  5. Start the container with docker-compose up

Describe the results you received:

Without the cap NET_RAW the container fails to start with the following error:

wireguard    | [#] iptables -A FORWARD -i wg0 -j ACCEPT; iptables -A FORWARD -o wg0 -j ACCEPT; iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
wireguard    | iptables v1.8.4 (legacy): can't initialize iptables table `filter': Permission denied (you must be root)

With the cap NET_RAW added (I found this as a recommendation in this repo), the container simply hangs on the iptables step:

iptables -A FORWARD -i wg0 -j ACCEPT; iptables -A FORWARD -o wg0 -j ACCEPT; iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE

My user on the host does not have permission to mess around with iptables:

iptables v1.8.7 (nf_tables): Could not fetch rule set generation id: Permission denied (you must be root)

I have a feeling it might be stalling with NET_RAW due to SELinux, but I am not too familiar with it or how I would debug it.

Describe the results you expected:

I would expect that with the correct caps the container would start.

Additional information you deem important (e.g. issue happens only occasionally):

Output of podman version:

Client:       Podman Engine
Version:      4.1.1
API Version:  4.1.1
Go Version:   go1.18.4
Built:        Fri Jul 22 21:05:59 2022
OS/Arch:      linux/amd64

Output of podman info --debug:

host:
  arch: amd64
  buildahVersion: 1.26.1
  cgroupControllers:
  - cpu
  - io
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.0-2.fc36.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.0, commit: '
  cpuUtilization:
    idlePercent: 96.46
    systemPercent: 1.51
    userPercent: 2.03
  cpus: 8
  distribution:
    distribution: fedora
    variant: silverblue
    version: "36"
  eventLogger: journald
  hostname: bones
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 5.18.13-200.fc36.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 3536007168
  memTotal: 15641862144
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: crun-1.5-1.fc36.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.5
      commit: 54ebb8ca8bf7e6ddae2eb919f5b82d1d96863dea
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.0-0.2.beta.0.fc36.x86_64
    version: |-
      slirp4netns version 1.2.0-beta.0
      commit: 477db14a24ff1a3de3a705e51ca2c4c1fe3dda64
      libslirp: 4.6.1
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.3
  swapFree: 8589930496
  swapTotal: 8589930496
  uptime: 6h 9m 50.42s (Approximately 0.25 days)
plugins:
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /var/home/bones/.config/containers/storage.conf
  containerStore:
    number: 23
    paused: 0
    running: 18
    stopped: 5
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /var/home/bones/.local/share/containers/storage
  graphRootAllocated: 1998678130688
  graphRootUsed: 45631213568
  graphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 31
  runRoot: /run/user/1000/containers
  volumePath: /var/home/bones/.local/share/containers/storage/volumes
version:
  APIVersion: 4.1.1
  Built: 1658516759
  BuiltTime: Fri Jul 22 21:05:59 2022
  GitCommit: ""
  GoVersion: go1.18.4
  Os: linux
  OsArch: linux/amd64
  Version: 4.1.1

Package info (e.g. output of rpm -q podman or apt list podman):

podman-4.1.1-3.fc36.x86_64

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/main/troubleshooting.md)

Yes

Additional environment details (AWS, VirtualBox, physical, etc.):

mheon commented

The answer to your question as to how rootless Podman can alter iptables rules when your user cannot is simple: Podman cannot do anything your user cannot do, so it can't make changes to the host's firewall rules. We can make limited changes inside the rootless network namespace, but any changes to the host's firewall config are not possible.

If you need to make changes to the host's configuration, you will need a root container. I cannot recall having seen a working Wireguard (or other VPN) setup on rootless Podman, for reference (but I personally have a root container with an openvpn connection).

@mheon thanks for clarifying... I am in the process of moving 20 containers from Docker and have managed to find workarounds for most things, but am failing with this one. I had seen reports on a couple of forums that people had got this container working rootless, but I have yet to find any real details on how that might be possible.

I think my options are:

  • Figure out how to give my user access to iptables (doesn't seem ideal)
  • Run this one container as root (will make it hard for me to manage from Home Assistant running rootless)
  • Run Wireguard on the host as root directly (also harder to manage from Home Assistant)
  • Run it on a dedicated Pi with docker (can manage docker remotely via Home Assistant)

If anyone ever gets this working then please drop a comment below. Closing.

You can run wireguard and iptables in a rootless container if the kernel modules are already loaded, compare https://github.com/containers/podman/blob/main/contrib/modules-load.d/podman-iptables.conf
If you use iptables-nft in the container, you don't even need the module loaded.

Also, you need to give your container extra permissions with cap_add, but it looks like you already take care of that.

But as @mheon said, you can only change the firewall rules in the container network namespace; it will not work if you use --network host, for example.

@Luap99 you have given me some hope... but I have yet to find a path forwards.

On the host I ran sudo modprobe ip_tables and then checked with lsmod | grep ip, which shows that ip_tables, ip6_tables and nf_tables are all present:

iptable_nat            16384  0
iptable_filter         16384  0
ip6_udp_tunnel         16384  1 wireguard
nft_fib_ipv4           16384  1 nft_fib_inet
nft_fib_ipv6           16384  1 nft_fib_inet
nft_fib                16384  3 nft_fib_ipv6,nft_fib_ipv4,nft_fib_inet
nf_reject_ipv4         16384  1 nft_reject_inet
nf_reject_ipv6         20480  1 nft_reject_inet
nf_nat                 57344  4 xt_nat,nft_chain_nat,iptable_nat,xt_MASQUERADE
nf_defrag_ipv6         24576  1 nf_conntrack
nf_defrag_ipv4         16384  1 nf_conntrack
ip_set                 61440  0
nf_tables             270336  695 nft_ct,nft_compat,nft_reject_inet,nft_fib_ipv6,nft_objref,nft_fib_ipv4,nft_chain_nat,nft_reject,nft_fib,nft_fib_inet
nfnetlink              20480  4 nft_compat,nf_tables,ip_set
ipmi_devintf           20480  0
ipmi_msghandler       122880  1 ipmi_devintf
ip6_tables             36864  0
ip_tables              36864  2 iptable_filter,iptable_nat

Running iptables --list, iptables-nft --list or ip6tables --list all gets permission denied from my user on the host.
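For what it's worth, the denial on the host is expected: iptables needs CAP_NET_ADMIN in the namespace that owns the tables, and an ordinary user's effective capability set is empty. A quick, generic way to confirm this (nothing Podman-specific, works in any Linux shell):

```shell
# Show the effective capability bitmask of the current shell.
# For a regular, non-root user this is all zeros, which is why
# iptables answers "Permission denied (you must be root)".
grep CapEff /proc/self/status
```

Inside the rootless container the same check shows a non-empty mask, because the container's root is uid 0 within its own user namespace and holds capabilities there.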

Switching into the container with the NET_ADMIN, NET_RAW and SYS_MODULE caps, I get the following for lsmod:

iptable_nat            16384  1
iptable_filter         16384  1
ip6_udp_tunnel         16384  1 wireguard
nft_fib_ipv4           16384  1 nft_fib_inet
nft_fib_ipv6           16384  1 nft_fib_inet
nft_fib                16384  3 nft_fib_ipv6,nft_fib_ipv4,nft_fib_inet
nf_reject_ipv4         16384  1 nft_reject_inet
nf_reject_ipv6         20480  1 nft_reject_inet
nf_nat                 57344  4 xt_nat,nft_chain_nat,iptable_nat,xt_MASQUERADE
nf_defrag_ipv6         24576  1 nf_conntrack
nf_defrag_ipv4         16384  1 nf_conntrack
ip_set                 61440  0
nf_tables             270336  734 nft_ct,nft_compat,nft_reject_inet,nft_fib_ipv6,nft_objref,nft_fib_ipv4,nft_chain_nat,nft_reject,nft_fib,nft_fib_inet
nfnetlink              20480  4 nft_compat,nf_tables,ip_set
ipmi_devintf           20480  0
ipmi_msghandler       122880  1 ipmi_devintf
ip6_tables             36864  0
ip_tables              36864  2 iptable_filter,iptable_nat

Which looks the same. iptables --list and iptables-nft --list both work, with the output being:

root@wireguard:/# iptables --list
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         
ACCEPT     all  --  anywhere             anywhere            
ACCEPT     all  --  anywhere             anywhere            

Chain OUTPUT (policy ACCEPT)
target     prot opt source 
root@wireguard:/# iptables-nft --list
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
# Warning: iptables-legacy tables present, use iptables-legacy to see them

That warning for iptables-nft is interesting to me.

The container boot still hangs, though, at this log line:

iptables -A FORWARD -i wg0 -j ACCEPT; iptables -A FORWARD -o wg0 -j ACCEPT; iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE

Running this command manually within the container gives no feedback. I also tried swapping iptables for iptables-nft. I also noticed that the iptables rule from WireGuard references eth0, but the network interface in the container is actually shown as eth0@if136. Right now I feel like I need more verbose output or logs to find out why it stalls.

Thanks for the pointers.

In #7816 (comment) we had been pointed at https://github.com/jcarrano/wg-podman

Would this help resolve the issue, if it still persists?

@almereyda I think the use case is a little different; this looks to be for container connectivity, and I was hoping just to run WireGuard as a server for other devices to connect to (which requires iptables changes I couldn't get to work). There may be some useful ideas here, though.

Does Podman on Ubuntu 22.04 vs RHEL 9 behave differently? I was also trying to run WireGuard with NET_ADMIN & NET_RAW; it works great when the host is Ubuntu 22.04, but fails on RHEL 9 with the permission issue. Btw, the arch is arm64. Why does it behave differently here?

EDIT: NVM, I had to include the iptable_filter and iptable_nat modules explicitly!

@hasan4791 can you share your run command or compose file so I can see how you got it working? Thanks

@scottsweb This is my start script,

#!/usr/bin/env bash

set -e

WORKING_DIR="$(pwd)"

if [[ ! -d "${WORKING_DIR}"/config ]]; then
  mkdir -p "${WORKING_DIR}"/config
fi

podman run -d \
  --name=wireguard \
  --cap-add=NET_ADMIN \
  -e PUID=1000 \
  -e PGID=1000 \
  -e TZ=Asia/Kolkata \
  -e SERVERURL=server.url `#optional` \
  -e SERVERPORT=51820 `#optional` \
  -e PEERS=1 `#optional` \
  -e PEERDNS=auto `#optional` \
  -e INTERNAL_SUBNET=172.32.1.0 `#optional` \
  -e ALLOWEDIPS=0.0.0.0/0 `#optional` \
  -e LOG_CONFS=true `#optional` \
  -p 51820:51820/udp \
  -v "${WORKING_DIR}"/config:/config:Z \
  --sysctl="net.ipv4.conf.all.src_valid_mark=1" \
  --restart always \
  lscr.io/linuxserver/wireguard:latest

And security info from podman debug command is

  security:
    apparmorEnabled: false
    capabilities: CAP_NET_RAW,CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true

Apart from this, there are 2 things that need to be done when running in rootless mode.

  1. Open the port in firewall
  2. Update the masquerade command inside the container to use the tap interface
    iptables -t nat -A POSTROUTING -o tap+ -j MASQUERADE

For me, with these changes, I'm still not able to get the VPN working even though it says the handshake is successful. In the end, I started using rootful mode, where everything works without the above changes.

Figured out the problem: it was the MTU size. Had to update the MTU for slirp4netns in containers.conf, which resolved my issue. Now everything is working fine in rootless mode.
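For reference, the slirp4netns MTU can be set per-user in containers.conf via network_cmd_options; the value below is illustrative (the commenter did not state the exact number they used):

```toml
# ~/.config/containers/containers.conf (rootless, per-user)
[engine]
# Extra options passed to slirp4netns; lowering the MTU leaves room
# for the WireGuard tunnel's encapsulation overhead. 1400 is an example.
network_cmd_options = ["mtu=1400"]
```

The same option can also be set for a single container with `--network slirp4netns:mtu=1400` on the podman run command line.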

Thanks for the tips @hasan4791 - I will do some testing.

Could someone throw some light on this weird behavior? In rootless mode, with the podman-restart service enabled for the non-root user, after a node reboot I'm getting this iptables error:
iptables v1.8.7 (nf_tables): Chain 'MASQUERADE' does not exist
Mysteriously, if I create & run the same container as a rootful container and then create & run it again as a rootless container, it all works. The underlying host is Oracle Linux 9.1, arm64, with SELinux enabled.

Update: Don't call me mad. This issue gets resolved if I run a container in root mode which also uses iptables. E.g. if I have openvpn-as running as a root container & WireGuard in a rootless container, everything works on reboot. Is it a bug or what?

@Luap99 Ideas?

@rhatdan Checked the audit logs and I'm getting these on reboot, but as I said, if I go to root & come back, there are no denials. What's happening in the system?

----
time->Thu Jan 19 11:14:15 2023
type=PROCTITLE msg=audit(1674126855.044:173): proctitle=69707461626C6573002D74006E6174002D4100504F5354524F5554494E47002D6F0074617030002D6A004D415351554552414445
type=SYSCALL msg=audit(1674126855.044:173): arch=c00000b7 syscall=206 success=yes exit=52 a0=4 a1=ffffddf34330 a2=34 a3=0 items=0 ppid=3347 pid=3350 auid=1000 uid=1000 gid=1000 euid=1000 suid=1000 fsuid=1000 egid=1000 sgid=1000 fsgid=1000 tty=(none) ses=3 comm="iptables" exe="/usr/sbin/xtables-nft-multi" subj=system_u:system_r:container_t:s0:c235,c731 key=(null)
type=AVC msg=audit(1674126855.044:173): avc:  denied  { module_request } for  pid=3350 comm="iptables" kmod="ipt_MASQUERADE" scontext=system_u:system_r:container_t:s0:c235,c731 tcontext=system_u:system_r:kernel_t:s0 tclass=system permissive=0
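As an aside for anyone reading the audit record: the proctitle field is the process's argv, hex-encoded with NUL separators between arguments. It can be decoded with xxd (assuming xxd is installed, e.g. via vim-common):

```shell
# Decode the hex-encoded proctitle from the AVC record above;
# NUL bytes separate the argv entries, so map them to spaces.
echo '69707461626C6573002D74006E6174002D4100504F5354524F5554494E47002D6F0074617030002D6A004D415351554552414445' \
  | xxd -r -p | tr '\0' ' '
# -> iptables -t nat -A POSTROUTING -o tap0 -j MASQUERADE
```

So the denied command is exactly the masquerade rule on the tap interface discussed earlier, and the AVC shows the container domain being refused a module_request for ipt_MASQUERADE.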

Update: If I'm not wrong, these are iptables extension modules, loaded by iptables itself at runtime. I assume on first boot no such modules were loaded, but running root containers loads them, and after that they are available to rootless containers as-is. Is there any way to make them available to rootless containers on boot without enabling "domain_kernel_load_modules"? Please correct me if I'm wrong.

Before: nf_nat                 61440  2 nft_chain_nat,iptable_nat
After:    nf_nat                 61440  3 nft_chain_nat,iptable_nat,xt_MASQUERADE

Some modules cannot be loaded by a rootless user; I know iptables-legacy needs this: https://github.com/containers/podman/blob/main/contrib/modules-load.d/podman-iptables.conf.
Maybe the nft compat layer also cannot load some modules as rootless.

@Luap99 Hey Paul, thanks for looking. Btw, I had already loaded the iptables module on boot. It's just that the extension modules which iptables loads at runtime are being denied. I thought of handling it with a custom policy. Btw, wouldn't it be great if we had a tunable parameter like "container_kernel_load_modules" to achieve this?

Final update: if anyone is trying rootless WireGuard on Fedora-based distros, the following modules need to be loaded:

ip_tables
iptable_filter
iptable_nat
wireguard
xt_MASQUERADE
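To persist that list across reboots, it can go in a modules-load.d drop-in (the filename below is arbitrary); this mirrors the podman-iptables.conf example linked earlier in the thread:

```
# /etc/modules-load.d/wireguard-rootless.conf
# Loaded by systemd-modules-load at boot, so rootless containers
# don't need to (and cannot) modprobe these at runtime.
ip_tables
iptable_filter
iptable_nat
wireguard
xt_MASQUERADE
```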

Thanks to everyone who helped here, and sorry for hijacking the thread @scottsweb. Would like to keep the discussion in one place for future reference.

@hasan4791 no problem at all. Glad you made some progress.

Just to check, you enabled these as kernel modules? and there were no changes needed to the container?


Right, and also note we need to run as the root user inside the rootless container, btw. If you would like to know more, feel free to check out the automation that I'm currently using to deploy to my servers:

https://github.com/hasan4791/x-servers/blob/main/ansible/roles/setup-instance/templates/x-server-modules.conf.j2