Rootless Podman Wireguard container fails to configure iptables
scottsweb opened this issue · 19 comments
Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)
/kind bug
Description
I am attempting to set up Wireguard in a rootless Podman container on Fedora Silverblue 36 using https://hub.docker.com/r/linuxserver/wireguard. The container starts but either fails due to permission errors or fails silently (depending on which caps are added). I am not sure how the container will be able to modify iptables when my user (on the host) is not able to.
Steps to reproduce the issue:
- Install Fedora Silverblue
- Overlay wireguard-tools with rpm-ostree install wireguard-tools
- Run sudo modprobe wireguard to load the wireguard module
- Create a docker-compose file:
version: "3.3"
services:
  wireguard:
    image: lscr.io/linuxserver/wireguard
    container_name: wireguard
    hostname: wireguard
    restart: always
    env_file: ./.settings.env
    sysctls:
      - net.ipv4.conf.all.src_valid_mark=1
    ports:
      - 51820:51820/udp
    cap_add:
      - NET_ADMIN
      # - NET_RAW
      - SYS_MODULE
    volumes:
      - ./data/wireguard:/config:Z
      - /lib/modules:/lib/modules:ro
- Start the container with docker-compose up
Describe the results you received:
Without the cap NET_RAW, the container fails to start with the following error:
wireguard | [#] iptables -A FORWARD -i wg0 -j ACCEPT; iptables -A FORWARD -o wg0 -j ACCEPT; iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
wireguard | iptables v1.8.4 (legacy): can't initialize iptables table `filter': Permission denied (you must be root)
With the cap NET_RAW added (I found this as a recommendation in this repo), the container simply hangs on the iptables step:
iptables -A FORWARD -i wg0 -j ACCEPT; iptables -A FORWARD -o wg0 -j ACCEPT; iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
My user on the host does not have permission to mess around with iptables:
iptables v1.8.7 (nf_tables): Could not fetch rule set generation id: Permission denied (you must be root)
I have a feeling it might be stalling with NET_RAW due to SELinux, but I am not too familiar with it or how I would debug it.
Describe the results you expected:
I would expect that with the correct caps the container would start.
Additional information you deem important (e.g. issue happens only occasionally):
Output of podman version:
Client: Podman Engine
Version: 4.1.1
API Version: 4.1.1
Go Version: go1.18.4
Built: Fri Jul 22 21:05:59 2022
OS/Arch: linux/amd64
Output of podman info --debug:
host:
arch: amd64
buildahVersion: 1.26.1
cgroupControllers:
- cpu
- io
- memory
- pids
cgroupManager: systemd
cgroupVersion: v2
conmon:
package: conmon-2.1.0-2.fc36.x86_64
path: /usr/bin/conmon
version: 'conmon version 2.1.0, commit: '
cpuUtilization:
idlePercent: 96.46
systemPercent: 1.51
userPercent: 2.03
cpus: 8
distribution:
distribution: fedora
variant: silverblue
version: "36"
eventLogger: journald
hostname: bones
idMappings:
gidmap:
- container_id: 0
host_id: 1000
size: 1
- container_id: 1
host_id: 100000
size: 65536
uidmap:
- container_id: 0
host_id: 1000
size: 1
- container_id: 1
host_id: 100000
size: 65536
kernel: 5.18.13-200.fc36.x86_64
linkmode: dynamic
logDriver: journald
memFree: 3536007168
memTotal: 15641862144
networkBackend: netavark
ociRuntime:
name: crun
package: crun-1.5-1.fc36.x86_64
path: /usr/bin/crun
version: |-
crun version 1.5
commit: 54ebb8ca8bf7e6ddae2eb919f5b82d1d96863dea
spec: 1.0.0
+SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
os: linux
remoteSocket:
exists: true
path: /run/user/1000/podman/podman.sock
security:
apparmorEnabled: false
capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
rootless: true
seccompEnabled: true
seccompProfilePath: /usr/share/containers/seccomp.json
selinuxEnabled: true
serviceIsRemote: false
slirp4netns:
executable: /usr/bin/slirp4netns
package: slirp4netns-1.2.0-0.2.beta.0.fc36.x86_64
version: |-
slirp4netns version 1.2.0-beta.0
commit: 477db14a24ff1a3de3a705e51ca2c4c1fe3dda64
libslirp: 4.6.1
SLIRP_CONFIG_VERSION_MAX: 3
libseccomp: 2.5.3
swapFree: 8589930496
swapTotal: 8589930496
uptime: 6h 9m 50.42s (Approximately 0.25 days)
plugins:
log:
- k8s-file
- none
- passthrough
- journald
network:
- bridge
- macvlan
volume:
- local
registries:
search:
- registry.fedoraproject.org
- registry.access.redhat.com
- docker.io
- quay.io
store:
configFile: /var/home/bones/.config/containers/storage.conf
containerStore:
number: 23
paused: 0
running: 18
stopped: 5
graphDriverName: overlay
graphOptions: {}
graphRoot: /var/home/bones/.local/share/containers/storage
graphRootAllocated: 1998678130688
graphRootUsed: 45631213568
graphStatus:
Backing Filesystem: btrfs
Native Overlay Diff: "true"
Supports d_type: "true"
Using metacopy: "false"
imageCopyTmpDir: /var/tmp
imageStore:
number: 31
runRoot: /run/user/1000/containers
volumePath: /var/home/bones/.local/share/containers/storage/volumes
version:
APIVersion: 4.1.1
Built: 1658516759
BuiltTime: Fri Jul 22 21:05:59 2022
GitCommit: ""
GoVersion: go1.18.4
Os: linux
OsArch: linux/amd64
Version: 4.1.1
Package info (e.g. output of rpm -q podman or apt list podman):
podman-4.1.1-3.fc36.x86_64
Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/main/troubleshooting.md)
Yes
Additional environment details (AWS, VirtualBox, physical, etc.):
The answer to your question as to how rootless Podman can alter iptables rules when your user cannot is simple: Podman cannot do anything your user cannot do, so it can't make changes to the host's firewall rules. We can make limited changes inside the rootless network namespace, but any changes to the host's firewall config are not possible.
If you need to make changes to the host's configuration, you will need a root container. I cannot recall having seen a working Wireguard (or other VPN) setup on rootless Podman, for reference (but I personally have a root container with an openvpn connection).
@mheon thanks for clarifying... I am in the process of moving 20 containers from Docker and have managed to find workarounds for most things, but I am failing with this one. I had seen reports on a couple of forums that people had got this container working rootless, but I have yet to find any real details on how that might be possible.
I think my options are:
- Figure out how to give my user access to iptables (doesn't seem ideal)
- Run this one container as root (will make it hard for me to manage from Home Assistant running rootless)
- Run Wireguard on the host as root directly (also harder to manage from Home Assistant)
- Run it on a dedicated Pi with Docker (can manage Docker remotely via Home Assistant)
If anyone ever gets this working then please drop a comment below. Closing.
You can run wireguard and iptables in a rootless container if the kernel modules are already loaded; compare https://github.com/containers/podman/blob/main/contrib/modules-load.d/podman-iptables.conf
If you use iptables-nft in the container, you do not even need the module loaded.
Also, you need to give your container extra permissions with cap_add, but it looks like you already take care of that.
But as @mheon said, you can only change the firewall rules in the container network namespace; it will not work if you use --network host, for example.
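For example, a minimal host-side sketch of that (assuming, as in the linked drop-in, that ip_tables and ip6_tables are the modules iptables-legacy needs):

# Load the iptables kernel modules on the host so a rootless container can use them.
sudo modprobe -a ip_tables ip6_tables

To make this persist across reboots, the linked podman-iptables.conf can be copied into /etc/modules-load.d/ on the host.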
@Luap99 you have given me some hope... but I have yet to find a path forwards.
On the host I ran sudo modprobe ip-tables, and then checking with lsmod | grep ip shows that ip_tables, ip6_tables and nf_tables are all present:
iptable_nat 16384 0
iptable_filter 16384 0
ip6_udp_tunnel 16384 1 wireguard
nft_fib_ipv4 16384 1 nft_fib_inet
nft_fib_ipv6 16384 1 nft_fib_inet
nft_fib 16384 3 nft_fib_ipv6,nft_fib_ipv4,nft_fib_inet
nf_reject_ipv4 16384 1 nft_reject_inet
nf_reject_ipv6 20480 1 nft_reject_inet
nf_nat 57344 4 xt_nat,nft_chain_nat,iptable_nat,xt_MASQUERADE
nf_defrag_ipv6 24576 1 nf_conntrack
nf_defrag_ipv4 16384 1 nf_conntrack
ip_set 61440 0
nf_tables 270336 695 nft_ct,nft_compat,nft_reject_inet,nft_fib_ipv6,nft_objref,nft_fib_ipv4,nft_chain_nat,nft_reject,nft_fib,nft_fib_inet
nfnetlink 20480 4 nft_compat,nf_tables,ip_set
ipmi_devintf 20480 0
ipmi_msghandler 122880 1 ipmi_devintf
ip6_tables 36864 0
ip_tables 36864 2 iptable_filter,iptable_nat
Running iptables --list, iptables-nft --list, or ip6tables --list all gets permission denied from my user on the host.
Switching into the container with the NET_ADMIN, NET_RAW and SYS_MODULE caps, I get the following for lsmod:
iptable_nat 16384 1
iptable_filter 16384 1
ip6_udp_tunnel 16384 1 wireguard
nft_fib_ipv4 16384 1 nft_fib_inet
nft_fib_ipv6 16384 1 nft_fib_inet
nft_fib 16384 3 nft_fib_ipv6,nft_fib_ipv4,nft_fib_inet
nf_reject_ipv4 16384 1 nft_reject_inet
nf_reject_ipv6 20480 1 nft_reject_inet
nf_nat 57344 4 xt_nat,nft_chain_nat,iptable_nat,xt_MASQUERADE
nf_defrag_ipv6 24576 1 nf_conntrack
nf_defrag_ipv4 16384 1 nf_conntrack
ip_set 61440 0
nf_tables 270336 734 nft_ct,nft_compat,nft_reject_inet,nft_fib_ipv6,nft_objref,nft_fib_ipv4,nft_chain_nat,nft_reject,nft_fib,nft_fib_inet
nfnetlink 20480 4 nft_compat,nf_tables,ip_set
ipmi_devintf 20480 0
ipmi_msghandler 122880 1 ipmi_devintf
ip6_tables 36864 0
ip_tables 36864 2 iptable_filter,iptable_nat
Which looks the same. iptables --list and iptables-nft --list both work, with the output being:
root@wireguard:/# iptables --list
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
ACCEPT all -- anywhere anywhere
ACCEPT all -- anywhere anywhere
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
root@wireguard:/# iptables-nft --list
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
# Warning: iptables-legacy tables present, use iptables-legacy to see them
That warning for iptables-nft is interesting to me.
The container boot is still hanging, though, at this log line:
iptables -A FORWARD -i wg0 -j ACCEPT; iptables -A FORWARD -o wg0 -j ACCEPT; iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
Running this command manually within the container, I get no feedback. I also tried swapping iptables for iptables-nft. I also noticed that the iptables rule from Wireguard references eth0, but the network interface in the container is actually eth0@if136. Right now I feel like I need more verbose output or logs to find out why it stalls.
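One possible way to dig deeper (illustrative only; assumes strace can be installed in the image and that the container is named wireguard):

# Re-run the hanging rule by hand inside the container and trace it to see which syscall blocks.
podman exec -it wireguard strace -f iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
# On the host, check for SELinux denials logged around the same time.
sudo ausearch -m avc -ts recent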
Thanks for the pointers.
In #7816 (comment) we were pointed at https://github.com/jcarrano/wg-podman
Would this help to resolve this issue, if it still applies?
@almereyda I think the use case is a little different; this looks to be for container connectivity, and I was hoping just to run wireguard as a server for other devices to connect to (which requires iptables changes on the host, which I couldn't get to work). There may be some useful ideas here though.
Does podman on Ubuntu 22.04 vs RHEL 9 behave differently? I was also trying to run wireguard with NET_ADMIN & NET_RAW; it works great when the host is Ubuntu 22.04, but it fails on RHEL 9 with the permission issue. Btw, the arch is arm64. Why does it behave differently here?
EDIT: NVM, I had to include the iptable_filter and iptable_nat modules explicitly!
@hasan4791 can you share your run command or compose file so I can see how you got it working? Thanks
@scottsweb This is my start script:
#!/usr/bin/env bash
set -e
WORKING_DIR="$(pwd)"
if [[ ! -d "${WORKING_DIR}"/config ]]; then
    mkdir -p "${WORKING_DIR}"/config
fi
podman run -d \
--name=wireguard \
--cap-add=NET_ADMIN \
-e PUID=1000 \
-e PGID=1000 \
-e TZ=Asia/Kolkata \
-e SERVERURL=server.url `#optional` \
-e SERVERPORT=51820 `#optional` \
-e PEERS=1 `#optional` \
-e PEERDNS=auto `#optional` \
-e INTERNAL_SUBNET=172.32.1.0 `#optional` \
-e ALLOWEDIPS=0.0.0.0/0 `#optional` \
-e LOG_CONFS=true `#optional` \
-p 51820:51820/udp \
-v "${WORKING_DIR}"/config:/config:Z \
--sysctl="net.ipv4.conf.all.src_valid_mark=1" \
--restart always \
lscr.io/linuxserver/wireguard:latest
And the security info from podman info --debug is:
security:
apparmorEnabled: false
capabilities: CAP_NET_RAW,CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
rootless: true
seccompEnabled: true
seccompProfilePath: /usr/share/containers/seccomp.json
selinuxEnabled: true
Apart from this, there are two things that need to be done when running in rootless mode:
- Open the port in the host firewall (see the firewalld sketch below)
- Update the masquerade command inside the container to use the tap interface:
iptables -t nat -A POSTROUTING -o tap+ -j MASQUERADE
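For the first point, on a firewalld-based host that could look like this (a sketch; the port matches the examples above):

# Open the WireGuard UDP port in the host firewall.
sudo firewall-cmd --permanent --add-port=51820/udp
sudo firewall-cmd --reload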
For me, with these changes, I'm still not able to get the VPN working even though it says the handshake is successful. In the end, I started using rootful mode, where everything works without the above changes.
Figured out the problem: it was the MTU size. I had to update the MTU for slirp4netns in containers.conf, which resolved my issue. Now everything is working fine in rootless mode.
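For anyone hitting the same thing, a sketch of what that could look like in the per-user containers.conf (the MTU value here is only an example; network_cmd_options is handed straight to slirp4netns):

# Append (or merge into an existing [engine] section of) ~/.config/containers/containers.conf.
mkdir -p ~/.config/containers
cat >> ~/.config/containers/containers.conf <<'EOF'
[engine]
network_cmd_options = ["mtu=1400"]
EOF
# Containers created after this will use the new slirp4netns MTU.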
Thanks for the tips @hasan4791 - I will do some testing.
Could someone throw some light on this weird behavior? In rootless mode, with the podman-restart service enabled for the non-root user, after a node reboot I'm getting this iptables error:
iptables v1.8.7 (nf_tables): Chain 'MASQUERADE' does not exist
Mysteriously, if I create & run the same container as a rootful container and then create & run it again as a rootless container, it all works fine. The underlying host is Oracle Linux 9.1, arm64, and SELinux is enabled.
Update: Don't call me mad. This issue gets resolved if I run a root container that also uses iptables. For example: if I have openvpn-as running as a root container and wireguard in a rootless container, everything works on reboot. Is it a bug or what?
@rhatdan Checked the audit logs and I am getting these on reboot, but as I said, if I go to root and come back, there are no denials. What's happening in the system?
----
time->Thu Jan 19 11:14:15 2023
type=PROCTITLE msg=audit(1674126855.044:173): proctitle=69707461626C6573002D74006E6174002D4100504F5354524F5554494E47002D6F0074617030002D6A004D415351554552414445
type=SYSCALL msg=audit(1674126855.044:173): arch=c00000b7 syscall=206 success=yes exit=52 a0=4 a1=ffffddf34330 a2=34 a3=0 items=0 ppid=3347 pid=3350 auid=1000 uid=1000 gid=1000 euid=1000 suid=1000 fsuid=1000 egid=1000 sgid=1000 fsgid=1000 tty=(none) ses=3 comm="iptables" exe="/usr/sbin/xtables-nft-multi" subj=system_u:system_r:container_t:s0:c235,c731 key=(null)
type=AVC msg=audit(1674126855.044:173): avc: denied { module_request } for pid=3350 comm="iptables" kmod="ipt_MASQUERADE" scontext=system_u:system_r:container_t:s0:c235,c731 tcontext=system_u:system_r:kernel_t:s0 tclass=system permissive=0
Update: If I'm not wrong, these are the iptables extension modules, which are loaded by iptables itself. I assume on first boot no such modules were loaded, but when running root containers they get loaded, and after that they are available to rootless containers as-is. Is there any way to make them available to rootless containers on boot without enabling "domain_kernel_load_modules"? Please correct me if I'm wrong.
Before: nf_nat 61440 2 nft_chain_nat,iptable_nat
After: nf_nat 61440 3 nft_chain_nat,iptable_nat,xt_MASQUERADE
Some modules cannot be loaded by a rootless user; I know iptables-legacy needs this: https://github.com/containers/podman/blob/main/contrib/modules-load.d/podman-iptables.conf
Maybe the nft compat layer also cannot load some modules as rootless.
@Luap99 Hey Paul, thanks for looking. Btw, I had already loaded the iptables module on boot. It's just that the extension modules which iptables loads at runtime are being denied. I thought of handling it by having a custom policy (a sketch of that is below). Btw, wouldn't it be great if we had a tunable parameter like "container_kernel_load_modules" to achieve this?
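If anyone wants to try the custom-policy route, a sketch of that workflow with audit2allow (the module name is arbitrary; note this permits container_t to request kernel module loads, which may be broader than you want):

# Build a local SELinux policy module from the recorded AVC denials and install it.
sudo ausearch -m avc -ts boot | audit2allow -M container-modreq
sudo semodule -i container-modreq.pp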
Final update: if anyone is trying rootless Wireguard on Fedora-based distros, the following modules need to be enabled (a persistent way to do that is sketched after the list):
ip_tables
iptable_filter
iptable_nat
wireguard
xt_MASQUERADE
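One way to load these on every boot is a modules-load.d drop-in on the host (a sketch; the file name is arbitrary):

# Load the modules now and on every subsequent boot.
sudo tee /etc/modules-load.d/wireguard-rootless.conf >/dev/null <<'EOF'
ip_tables
iptable_filter
iptable_nat
wireguard
xt_MASQUERADE
EOF
sudo systemctl restart systemd-modules-load.service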
Thanks to everyone who helped here, and sorry for hijacking the thread @scottsweb. Would like to keep the discussion in one place for future reference.
@hasan4791 no problem at all. Glad you made some progress.
Just to check, you enabled these as kernel modules, and there were no changes needed to the container?
Right. Also, we need to run as the root user in the rootless container, btw. If you would like to know more, feel free to check out the automation that I'm currently using to deploy on my servers.