slingamn/namespaced-openvpn

Namespace uses DNS from /etc/resolv.conf, not from /etc/netns/protected/resolv.conf

soredake opened this issue · 18 comments

@slingamn @chros73
I start it with: sudo $HOME/git/namespaced-openvpn/namespaced-openvpn --config $HOME/Documents/vpn/ccrypto-fr-udp.ovpn

My openvpn .conf file:

verb 4
client
tls-client
script-security 2
remote-cert-tls server
dev tun
nobind
persist-key
persist-tun
comp-lzo yes

remote gw.fr.204vpn.net 1196 udp

auth-user-pass

redirect-gateway def1
tun-ipv6
route-ipv6 2000::/3

up /etc/openvpn/update-resolv-conf
down /etc/openvpn/update-resolv-conf

Thanks for the report!

The default value of --dns is push. However, even if the server does not push any DNS options, my understanding of the current behavior is that an empty /etc/netns/protected/resolv.conf file should be written, leaving DNS inside the namespace in a broken state.

Some questions:

  1. How are you spawning the application that runs in the namespace / how did you verify that the application is using the wrong nameserver?
  2. Does it work if you pass an IP address for the DNS server, e.g., --dns 8.8.8.8?
  3. Are there any warnings (e.g., "No nameservers were set") in the output?
  4. Is /etc/netns/protected/resolv.conf being written correctly with the pushed options?
  5. Is the bind mount working correctly? You can inspect the inodes to verify. On my system, I see:
root@good-fortune:~# ls -li /etc/resolv.conf
12847371 -rw-r--r-- 1 root root 41 Jan 14 12:45 /etc/resolv.conf
root@good-fortune:~# ls -li /etc/netns/protected/resolv.conf 
12847157 -rw-r--r-- 1 root root 52 Jan 14 12:46 /etc/netns/protected/resolv.conf
root@good-fortune:~# ip netns exec protected ls -li /etc/resolv.conf
12847157 -rw-r--r-- 1 root root 52 Jan 14 12:46 /etc/resolv.conf
root@good-fortune:~# ip netns exec protected ls -li /etc/netns/protected/resolv.conf
12847157 -rw-r--r-- 1 root root 52 Jan 14 12:46 /etc/netns/protected/resolv.conf
root@good-fortune:~# ip netns exec protected grep resolv.conf /proc/self/mounts
/dev/mapper/good--fortune--vg-root /etc/resolv.conf ext4 rw,relatime,errors=remount-ro,data=ordered 0 0

so it can be verified that outside of an ip netns exec, the file at /etc/resolv.conf has the inode 12847371, but inside it, it has the inode 12847157 (which is the inode of /etc/netns/protected/resolv.conf).
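The inode comparison above can be scripted. This is a minimal sketch, assuming GNU stat; file_id is a hypothetical helper name, and the demonstration runs on scratch files so that it needs no root access:

```shell
#!/bin/sh
# file_id is a hypothetical helper, not part of namespaced-openvpn.
# Print "device:inode" for a path, following symlinks (GNU coreutils stat).
file_id() {
    stat -L -c '%d:%i' -- "$1"
}

# Demonstrate on scratch files: a hard link shares the identity of its target,
# while an unrelated file does not.
scratch=$(mktemp -d)
echo 'nameserver 192.0.2.1' > "$scratch/original"
ln "$scratch/original" "$scratch/hardlink"
echo 'nameserver 192.0.2.2' > "$scratch/other"

if [ "$(file_id "$scratch/original")" = "$(file_id "$scratch/hardlink")" ]; then
    echo "original and hardlink are the same file"
fi
if [ "$(file_id "$scratch/original")" != "$(file_id "$scratch/other")" ]; then
    echo "original and other are different files"
fi
```

To check the real bind mount, the output of sudo ip netns exec protected stat -L -c '%d:%i' /etc/resolv.conf should match file_id /etc/netns/protected/resolv.conf when the mount is in place.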

  1. I start it with sudo ip netns exec protected sudo -u user $HOME/bin/firefox. I checked with https://www.perfect-privacy.com/dns-leaktest/ and https://www.dnsleaktest.com/; both show my real DNS server, not the one from the namespaced /etc/resolv.conf.
  2. No, it doesn't work; the namespace still uses the real /etc/resolv.conf (not the namespace's /etc/resolv.conf).
  3. No, no warnings.

4, 5.

❯ cat /etc/resolv.conf
# Generated by resolvconf
nameserver 127.0.0.1

❯ sudo ip netns exec protected cat /etc/resolv.conf
nameserver 10.99.0.20

❯ sudo ip netns exec protected cat /etc/netns/protected/resolv.conf
nameserver 10.99.0.20

❯ sudo ip netns exec protected grep resolv.conf /proc/self/mounts  
/dev/sda2 /etc/resolv.conf xfs rw,noatime,attr2,inode64,noquota 0 0

I don't know why it's using the wrong /etc/resolv.conf.

I'm using NetworkManager, if that matters.

Hmm. The loopback adapter is local to each network namespace, so whatever responder you have running on 127.0.0.1:53 in the root namespace should be inaccessible in the protected namespace. On my system:

shivaram@good-fortune:~$ dig @127.0.0.1 www.google.com

; <<>> DiG 9.10.3-P4-Ubuntu <<>> @127.0.0.1 www.google.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 61095
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;www.google.com.			IN	A

;; ANSWER SECTION:
www.google.com.		248	IN	A	172.217.8.196

;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Mon Jan 15 20:07:47 EST 2018
;; MSG SIZE  rcvd: 59

shivaram@good-fortune:~$ sudo ip netns exec protected dig @127.0.0.1 www.google.com

; <<>> DiG 9.10.3-P4-Ubuntu <<>> @127.0.0.1 www.google.com
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached

Some debugging suggestions:

  1. Try running sudo ip netns exec protected dig www.google.com and look at the SERVER line in the output. If it's 10.99.0.20, then that means the bind mount is working correctly for normal applications. You might want to look at your network settings in Firefox, to see if it's using "system proxy settings" or some such. When set to "no proxy", it should behave the same way as dig.
  2. Try navigating to the URL file:///etc/resolv.conf in Firefox.

You might want to look at your network settings in Firefox

I tried all the settings; it still uses /etc/resolv.conf from the real root. Google Chrome uses the real /etc/resolv.conf too.

❯ sudo ip netns exec protected dig www.google.com
; <<>> DiG 9.11.1-P3 <<>> www.google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 61169
;; flags: qr rd ra; QUERY: 1, ANSWER: 6, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;www.google.com.			IN	A

;; ANSWER SECTION:
www.google.com.		252	IN	A	209.85.202.105
www.google.com.		252	IN	A	209.85.202.106
www.google.com.		252	IN	A	209.85.202.99
www.google.com.		252	IN	A	209.85.202.103
www.google.com.		252	IN	A	209.85.202.104
www.google.com.		252	IN	A	209.85.202.147

;; Query time: 45 msec
;; SERVER: 10.99.0.20#53(10.99.0.20)
;; WHEN: Tue Jan 16 12:01:59 EET 2018
;; MSG SIZE  rcvd: 139


Try tcpdump as a way of inspecting raw DNS traffic. Run these commands in parallel: sudo tcpdump -i any -n udp port 53 (which inspects outside the namespace) and sudo ip netns exec protected tcpdump -i any -n udp port 53 (which inspects inside it).

When I visit https://www.perfect-privacy.com/dns-leaktest/ with Firefox running in the namespace, I see no output from the first command, but output like this from the second (partially redacted):

shivaram@good-fortune:~$ sudo ip netns exec protected tcpdump -i any -n udp port 53
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
10:53:21.810484 IP 10.97.10.6.51476 > 209.222.18.222.53: 6881+ A? detectportal.firefox.com. (42)
10:53:21.810555 IP 10.97.10.6.51476 > 209.222.18.222.53: 20993+ AAAA? detectportal.firefox.com. (42)
10:53:21.842582 IP 209.222.18.222.53 > 10.97.10.6.51476: 20993 0/0/0 (42)
10:53:21.911733 IP 209.222.18.222.53 > 10.97.10.6.51476: 6881 4/0/0 CNAME detectportal.firefox.com.edgesuite.net., CNAME a1089.d.akamai.net., A 23.67.250.187, A 23.67.250.152 (155)
10:53:22.968281 IP 10.97.10.6.41117 > 209.222.18.222.53: 29195+ A? www.perfect-privacy.com. (41)
10:53:22.968293 IP 10.97.10.6.41117 > 209.222.18.222.53: 18471+ AAAA? www.perfect-privacy.com. (41)
10:53:22.981840 IP 209.222.18.222.53 > 10.97.10.6.41117: 18471 0/0/0 (41)
10:53:23.001850 IP 209.222.18.222.53 > 10.97.10.6.47446: 31393 0/0/0 (33)
10:53:23.090574 IP 209.222.18.222.53 > 10.97.10.6.41117: 29195 6/4/17 A 37.59.164.111, A 5.79.98.56, A 95.211.146.77, A 37.48.94.55, A 185.17.184.3, A 95.211.199.144 (480)
10:53:23.186690 IP 10.97.10.6.42125 > 209.222.18.222.53: 41070+ A? ocsp.digicert.com. (35)
10:53:23.186696 IP 10.97.10.6.42125 > 209.222.18.222.53: 57979+ AAAA? ocsp.digicert.com. (35)
10:53:23.203865 IP 209.222.18.222.53 > 10.97.10.6.42125: 57979 0/0/0 (35)
10:53:27.868008 IP 10.97.10.6.49660 > 209.222.18.222.53: 56862+ A? 469y0i7rei7s_0.dns-leak.com. (45)
10:53:27.868020 IP 10.97.10.6.49660 > 209.222.18.222.53: 10288+ AAAA? 469y0i7rei7s_0.dns-leak.com. (45)
10:53:27.868969 IP 10.97.10.6.39955 > 209.222.18.222.53: 64737+ A? 469y0i7rei7s_0.dns-check.info. (47)
10:53:27.868978 IP 10.97.10.6.39955 > 209.222.18.222.53: 56044+ AAAA? 469y0i7rei7s_0.dns-check.info. (47)
10:53:27.869612 IP 10.97.10.6.50092 > 209.222.18.222.53: 16631+ A? 469y0i7rei7s_1.dns-leak.com. (45)
10:53:27.869661 IP 10.97.10.6.50092 > 209.222.18.222.53: 38481+ AAAA? 469y0i7rei7s_1.dns-leak.com. (45)
10:53:27.870627 IP 10.97.10.6.50410 > 209.222.18.222.53: 5877+ A? 469y0i7rei7s_1.dns-check.info. (47)
10:53:27.870703 IP 10.97.10.6.50410 > 209.222.18.222.53: 28176+ AAAA? 469y0i7rei7s_1.dns-check.info. (47)

209.222.18.222 is my VPN provider's DNS server. When using github.com outside the namespace, I see DNS being relayed from my local resolver on 127.0.0.1 to my LAN's resolver on 192.168.1.1:

10:54:13.384559 IP 127.0.0.1.47836 > 127.0.0.1.53: 39581+ A? user-images.githubusercontent.com. (51)
10:54:13.384640 IP 192.168.1.100.8401 > 192.168.1.1.53: 996+ A? user-images.githubusercontent.com. (51)
10:54:13.384657 IP 127.0.0.1.47836 > 127.0.0.1.53: 50855+ AAAA? user-images.githubusercontent.com. (51)
10:54:13.384690 IP 192.168.1.100.16174 > 192.168.1.1.53: 12019+ AAAA? user-images.githubusercontent.com. (51)
10:54:13.394136 IP 192.168.1.1.53 > 192.168.1.100.8401: 996 2/0/0 CNAME github.map.fastly.net., A 151.101.116.133 (10
10:54:13.394200 IP 127.0.0.1.53 > 127.0.0.1.47836: 39581 2/0/0 CNAME github.map.fastly.net., A 151.101.116.133 (102)
10:54:13.400191 IP 192.168.1.1.53 > 192.168.1.100.16174: 12019 1/1/0 CNAME github.map.fastly.net. (144)
10:54:13.400513 IP 127.0.0.1.53 > 127.0.0.1.47836: 50855 1/1/0 CNAME github.map.fastly.net. (144)

As mentioned before, there should be no process inside the namespace listening on 127.0.0.1:53 --- if there is, you can identify it with ss and then figure out why Firefox is trying to talk to it.

I see output only from the first command. Running ss shows no connections to 127.0.0.1 in the namespace.

Huh. Are you sure that non-DNS traffic is being routed over the VPN? For example, what do you see in sudo ip netns exec protected tcpdump -n -i any tcp port 443 if you try to visit https://www.google.com/ in the namespace-confined Firefox?

You can try running Firefox under strace (I had decent results with sudo ip netns exec protected sudo -u $USER strace -s 2048 -e network -ff -o ffstrace firefox, which traces network-related calls from each subprocess and writes the traces to files named, e.g., ffstrace.27091). Here's a captured DNS request to www.weather.com on my system:

socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 83
connect(83, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("209.222.18.222")}, 16) = 0
sendmmsg(83, [{msg_hdr={msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\366:\1\0\0\1\0\0\0\0\0\0\3www\7weather\3com\0\0\1\0\1", iov_len=33}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, msg_len=33}, {msg_hdr={msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="~Z\1\0\0\1\0\0\0\0\0\0\3www\7weather\3com\0\0\34\0\1", iov_len=33}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_DONTWAIT|MSG_EOR|MSG_CONFIRM|MSG_ERRQUEUE|MSG_NOSIGNAL|MSG_BATCH|MSG_FASTOPEN|0x89500000}, msg_len=33}], 2, MSG_NOSIGNAL) = 2
recvfrom(83, "\366:\201\200\0\1\0\3\0\0\0\0\3www\7weather\3com\0\0\1\0\1\300\f\0\5\0\1\0\0\0<\0!\7pmd-www\7weather\3com\7edgekey\3net\0\300-\0\5\0\1\0\0T`\0\30\6e12930\3ksd\nakamaiedge\300I\300Z\0\1\0\1\0\0\0\24\0\4\254\345\347\255", 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("209.222.18.222")}, [28->16]) = 130
recvfrom(83, "~Z\201\200\0\1\0\0\0\0\0\0\3www\7weather\3com\0\0\34\0\1", 65536, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("209.222.18.222")}, [28->16]) = 33

❯ sudo ip netns exec protected tcpdump -n -i any tcp port 443
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
19:28:48.534807 IP 10.99.4.21.60202 > 2.23.108.233.443: Flags [.], ack 3637538135, win 192, options [nop,nop,TS val 3026454800 ecr 1120865168], length 0
19:28:48.607476 IP 2.23.108.233.443 > 10.99.4.21.60202: Flags [.], ack 1, win 972, options [nop,nop,TS val 1120875406 ecr 3026391680], length 0
19:28:49.558983 IP 10.99.4.21.50362 > 35.161.44.155.443: Flags [.], ack 1396630045, win 182, options [nop,nop,TS val 447153251 ecr 589893529], length 0
19:28:50.449037 IP 35.161.44.155.443 > 10.99.4.21.50362: Flags [.], ack 1, win 111, options [nop,nop,TS val 589896104 ecr 447112096], length 0
19:28:50.582794 IP 10.99.4.21.40984 > 52.35.101.126.443: Flags [.], ack 3958496892, win 174, options [nop,nop,TS val 3729718497 ecr 182009020], length 0
19:28:50.582926 IP 10.99.4.21.40992 > 52.35.101.126.443: Flags [.], ack 2660271389, win 187, options [nop,nop,TS val 3729718497 ecr 182009019], length 0
19:28:50.582929 IP 10.99.4.21.40990 > 52.35.101.126.443: Flags [.], ack 2790774206, win 164, options [nop,nop,TS val 3729718497 ecr 182009019], length 0
19:28:50.582931 IP 10.99.4.21.40982 > 52.35.101.126.443: Flags [.], ack 3563013200, win 164, options [nop,nop,TS val 3729718497 ecr 182009020], length 0
19:28:50.794307 IP 52.35.101.126.443 > 10.99.4.21.40992: Flags [.], ack 1, win 122, options [nop,nop,TS val 182011579 ecr 3729677769], length 0
19:28:50.794638 IP 52.35.101.126.443 > 10.99.4.21.40990: Flags [.], ack 1, win 117, options [nop,nop,TS val 182011579 ecr 3729677555], length 0
19:28:50.800620 IP 52.35.101.126.443 > 10.99.4.21.40982: Flags [.], ack 1, win 117, options [nop,nop,TS val 182011580 ecr 3729677565], length 0
19:28:50.808757 IP 52.35.101.126.443 > 10.99.4.21.40984: Flags [.], ack 1, win 122, options [nop,nop,TS val 182011580 ecr 3729677756], length 0
19:28:51.606812 IP 10.99.4.21.40988 > 52.35.101.126.443: Flags [.], ack 3535884381, win 157, options [nop,nop,TS val 3729719521 ecr 182009275], length 0
19:28:51.820198 IP 52.35.101.126.443 > 10.99.4.21.40988: Flags [.], ack 1, win 111, options [nop,nop,TS val 182011835 ecr 3729678429], length 0
19:28:52.118795 IP 10.99.4.21.40986 > 52.35.101.126.443: Flags [.], ack 2979407370, win 199, options [nop,nop,TS val 3729720033 ecr 182009403], length 0
19:28:52.328555 IP 52.35.101.126.443 > 10.99.4.21.40986: Flags [.], ack 1, win 128, options [nop,nop,TS val 182011963 ecr 3729678972], length 0
19:28:54.166808 IP 10.99.4.21.44836 > 54.69.224.218.443: Flags [.], ack 4172702308, win 158, options [nop,nop,TS val 405542634 ecr 31215234], length 0
19:28:54.167031 IP 10.99.4.21.47354 > 54.148.143.136.443: Flags [.], ack 4090227174, win 167, options [nop,nop,TS val 465302242 ecr 135765037], length 0
19:28:54.387144 IP 54.148.143.136.443 > 10.99.4.21.47354: Flags [.], ack 1, win 117, options [nop,nop,TS val 135767692 ecr 465270788], length 0
19:28:54.389291 IP 54.69.224.218.443 > 10.99.4.21.44836: Flags [.], ack 1, win 114, options [nop,nop,TS val 31217794 ecr 405501540], length 0

I can't find output similar to yours; all the log files are here:
ffstrace.zip

Actually, I think I found the cause in your strace output:

connect(60, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = 0
sendto(60, "\2\0\0\0\16\0\0\0\34\0\0\0ocsp.int-x3.letsencrypt.org\0", 40, MSG_NOSIGNAL, NULL, 0) = 40

It looks like you have nscd enabled, so Firefox is trying to use it (over a UNIX domain socket, which is not affected by the change of network namespace) for DNS resolution. Since the actual nscd process is presumably running in your root namespace, it's resolving the hostnames over your physical interface, using the nameservers in /etc/resolv.conf.

If this is in fact the explanation, it's a pretty big gotcha and something I should add a warning about in the project readme.
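A quick way to check for this situation is to look for the nscd socket and to search the strace output files for connections to it. The following is a minimal sketch; is_unix_socket and traces_using_nscd are hypothetical helper names, and the socket path is the one seen in the strace output above, which may differ on other systems:

```shell
#!/bin/sh
# Hypothetical helpers for spotting the nscd lookup path described above.

# Default path is the one seen in the strace output; adjust if your system differs.
NSCD_SOCKET=${NSCD_SOCKET:-/var/run/nscd/socket}

# Succeed if the given path exists and is a UNIX domain socket.
is_unix_socket() {
    [ -S "$1" ]
}

# List the strace output files (e.g. ffstrace.*) that mention the nscd socket path.
traces_using_nscd() {
    grep -l -F "sun_path=\"$NSCD_SOCKET\"" -- "$@"
}

if is_unix_socket "$NSCD_SOCKET"; then
    echo "nscd socket found at $NSCD_SOCKET; lookups may bypass the namespace"
else
    echo "no nscd socket at $NSCD_SOCKET"
fi
```

For example, traces_using_nscd ffstrace.* prints the names of the per-process trace files in which a process connected to nscd.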

Yes, thanks: without nscd running, DNS resolution goes through the VPN's DNS. Maybe overriding this socket in the namespace would be a good idea?

Yeah --- I need to think through the scenarios in more detail. Here's what I know so far:

  1. systemd-resolved exports a dbus-based API for name resolution; some applications may be using this instead of IP-based DNS
  2. dbus access to systemd-resolved may or may not be blocked by network namespace isolation
  3. It looks like most NSS implementations don't support dbus; with a typical nsswitch.conf file, getaddrinfo(3) will end up doing DNS over IP, possibly after an attempted lookup with mdns (but is mdns restricted to names ending in .local? and is it implemented in a way that will be confined by the network namespace?)

Overriding the nscd socket works for me:

sudo ip netns exec protected sh -c "mount --bind /dev/null /var/run/nscd/socket && exec sudo -u $USER mycmd"

Thanks, that's a pretty good mitigation. Unfortunately, I learned that bind mounts are a fairly brittle mechanism for solving this problem:

https://unix.stackexchange.com/questions/418304/why-do-linux-bind-mounts-disappear-if-the-mount-points-inode-changes

I've been meaning to update namespaced-openvpn, and its documentation, to reflect this issue. The specific relevance to nscd is that if nscd is restarted in a way that destroys and recreates the socket, the protected namespace will be exposed to it again. There is also a problem where NetworkManager and resolvconf will overwrite /etc/resolv.conf in response to DHCP renewals; the protected namespace will not (and cannot) send DNS requests in plaintext, but it can be tricked into using a malicious nameserver that is publicly routable (i.e., routable from the VPN's egress node).

However, it is possible for namespaced-openvpn to write an /etc/netns/${namespace}/nsswitch.conf that disallows NSS-based DNS resolution via systemd-resolved over dbus (modulo this same caveat about bind mounts).
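As an illustration of what such a file might contain, here is a minimal sketch that reduces the hosts line of an nsswitch.conf to files and dns. The helper name restrict_hosts_line and the exact replacement line are assumptions for illustration, not necessarily what namespaced-openvpn writes:

```shell
#!/bin/sh
# restrict_hosts_line is a hypothetical helper, not the project's actual code.
# Print the given nsswitch.conf with its "hosts:" line reduced to files and dns,
# which drops modules such as "resolve" (the NSS client for systemd-resolved).
restrict_hosts_line() {
    sed -E 's/^[[:space:]]*hosts:.*$/hosts: files dns/' "$1"
}

# Demonstrate on a scratch file rather than the real /etc/nsswitch.conf.
sample=$(mktemp)
printf '%s\n' \
    'passwd: files systemd' \
    'hosts: files mymachines resolve [!UNAVAIL=return] dns myhostname' > "$sample"
restrict_hosts_line "$sample"
```

Run as root, restrict_hosts_line /etc/nsswitch.conf > /etc/netns/protected/nsswitch.conf would place the result where ip netns exec bind-mounts per-namespace configuration files over their counterparts in /etc.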

This is fully documented as of d906a96, so closing.

Thanks!

aidyw commented

Hi All,
A thorny and troublesome problem. I have had similar experiences with resolv.conf. I have tried many different approaches, but disabling several important services running on the root host seems like overkill.
It may not work for every scenario, but I overcame my own issues by falling back on packet-level filtering with iptables. This may not solve the problem at its source if dbus is involved, but for any DNS queries that ultimately go out over IP, some simple SNAT and DNAT rules within the netns will force the DNS traffic to the correct server.
It's a fix in many cases, in particular where the entries in /etc/resolv.conf have been cached by the application. With the best will in the world, I cannot reliably get netns to read the /etc/netns/[name]/resolv.conf file, almost certainly (as has been mentioned) because DHCP updates trigger a change that breaks the bind mount. Either way, and regardless of where a netns-constrained app decides to send a query, if the iptables rules modify the packet as it leaves the netns, the DNS query will always hit the correct server.
E.g., in the nat table:

-A OUTPUT -p udp --dport 53 -j DNAT --to-destination 172.16.0.1
-A OUTPUT -p tcp --dport 53 -j DNAT --to-destination 172.16.0.1

-A POSTROUTING -p udp --dport 53 -j SNAT --to-source 172.16.0.22
-A POSTROUTING -p tcp --dport 53 -j SNAT --to-source 172.16.0.22

What a PITA!
Aidan
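The rules above can be generated for any resolver and source address. This is a minimal sketch; dns_nat_commands is a hypothetical helper name, and the addresses are the example values from the comment. It only prints the commands, so nothing is applied until the output is run as root inside the namespace:

```shell
#!/bin/sh
# dns_nat_commands is a hypothetical helper; the addresses used below are the
# example values from the comment above.
# Print iptables commands that redirect all port-53 traffic to one resolver
# (DNAT in OUTPUT) and rewrite its source address (SNAT in POSTROUTING).
dns_nat_commands() {
    dns_server=$1
    source_addr=$2
    for proto in udp tcp; do
        echo "iptables -t nat -A OUTPUT -p $proto --dport 53 -j DNAT --to-destination $dns_server"
        echo "iptables -t nat -A POSTROUTING -p $proto --dport 53 -j SNAT --to-source $source_addr"
    done
}

# Print the commands for review; nothing is applied here.
dns_nat_commands 172.16.0.1 172.16.0.22
```

After reviewing the output, it can be applied with dns_nat_commands 172.16.0.1 172.16.0.22 | sudo ip netns exec protected sh.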

I figured out a way to put a mount namespace over /etc/resolv.conf while still allowing all the normal updates to /etc/resolv.conf: make /etc/resolv.conf a symlink into a directory that is otherwise empty and won't have its inode altered. ../run/systemd/resolve/stub-resolv.conf works fine. If you want to be paranoid, set it to something like /resolv-holder/resolv.conf and chattr +i /resolv-holder. Then create a mount namespace and bind-mount over the directory the symlink points into, e.g., through systemd BindPaths or by other means. After that you can modify resolv.conf independently inside and outside the namespace.
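The reason a directory makes a more stable bind-mount target than the file itself can be seen without root in a scratch directory. This is a minimal sketch, assuming GNU stat; the resolv-holder name follows the comment above, and the bind-mount step itself is left out because it requires root:

```shell
#!/bin/sh
# Scratch-directory demonstration; nothing under /etc is touched.
demo=$(mktemp -d)
mkdir "$demo/resolv-holder"
echo 'nameserver 192.0.2.1' > "$demo/resolv-holder/resolv.conf"
# resolv.conf itself is only a symlink into the holder directory.
ln -s resolv-holder/resolv.conf "$demo/resolv.conf"

dir_before=$(stat -c %i "$demo/resolv-holder")
file_before=$(stat -c %i "$demo/resolv-holder/resolv.conf")

# Simulate an updater that replaces the file by writing a new one and renaming it.
echo 'nameserver 192.0.2.2' > "$demo/resolv-holder/resolv.conf.new"
mv "$demo/resolv-holder/resolv.conf.new" "$demo/resolv-holder/resolv.conf"

dir_after=$(stat -c %i "$demo/resolv-holder")
file_after=$(stat -c %i "$demo/resolv-holder/resolv.conf")

echo "directory inode: $dir_before -> $dir_after"
echo "file inode:      $file_before -> $file_after"
echo "via symlink:     $(cat "$demo/resolv.conf")"
```

The file's inode changes when it is replaced, which is what breaks a bind mount placed on the file, while the directory's inode stays the same and the symlink keeps resolving to the current contents.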