bsdpot/pot

Routing on public-bridge with tailscale

urosgruber opened this issue · 8 comments

I'm not sure if that is a bug or just missing some manual configuration. But let me explain the setup first.

I have set up two pot nodes, each running a number of pots on a public-bridge network. There are no specific pf rules except the NAT provided through pot. One node uses 10.192.0.1/16 and the other one 10.193.0.1/16. Pots on the same node can talk to each other, and everything is reachable from the host, as expected.

I've then set up tailscale on both nodes with route advertising. I can access pots in 10.192.0.1/16 from host 10.193.0.1 and vice versa, since the routing table is updated through tailscale. The problem I'm having is that the same traffic does not work from within a pot. I've tested with tcpdump on the destination pot and I can see the traffic arrive and the reply go out, but then the packet is lost somewhere on the way back to the source pot and I can't find what could be missing.

Route table on 10.192.0.1 host

10.192.0.0/16      link#6             U       bridge0
10.192.0.1         link#2             UHS         lo0
10.193.0.0/16      link#3             US     tailscal

Route table on 10.193.0.1 host

10.193.0.0/16      link#6             U       bridge0
10.193.0.1         link#2             UHS         lo0
10.192.0.0/16      link#3             US     tailscal

pf anchor shows this

nat on vtnet0 inet from 10.192.0.0/16 to any -> (vtnet0:0)

and

nat on vtnet0 inet from 10.193.0.0/16 to any -> (vtnet0:0)

The confusing part to me is the following: the output of tcpdump on the destination node while running ping to 10.192.0.3 from the source node.

10:47:45.654394 IP IP_FROM_TAILSCALE > 10.192.0.3: ICMP echo request, id 55577, seq 6, length 64
10:47:45.668983 IP 10.192.0.3 > IP_FROM_TAILSCALE: ICMP echo reply, id 55577, seq 6, length 64

But when testing from within a pot:

10:48:07.713430 IP 10.193.0.3 > 10.192.0.3: ICMP echo request, id 60185, seq 1, length 64
10:48:08.763099 IP 10.193.0.3 > 10.192.0.3: ICMP echo request, id 60185, seq 2, length 64
10:48:09.778257 IP 10.193.0.3 > 10.192.0.3: ICMP echo request, id 60185, seq 3, length 64

Any help would be more than appreciated.

Hi @urosgruber,

It would help to see all configuration files involved (including routing tables inside the network and arp tables). You also want to make sure to disable IP redirects on all nodes/pots involved, to reduce the possibility of creating asymmetry by accident (sysctl net.inet.ip.redirect=0; sysctl net.inet6.ip6.redirect=0 - also inside pots).
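To make those settings persist across reboots, something along these lines in /etc/sysctl.conf on the hosts (and inside the pots, where applicable) should do - just a sketch of the standard FreeBSD mechanism, nothing pot-specific:

# /etc/sysctl.conf - disable ICMP redirects to avoid accidental routing asymmetry
net.inet.ip.redirect=0
net.inet6.ip6.redirect=0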

Assuming you're using VNET pots (where each jail has its own routing table), you have a general problem, which is that everything is directly connected.

I would recommend changing your setup to make use of a transfer network and routing over it by IP (I don't know much about tailscale and whether that is possible).

So basically, I would use a setup like this:

Node A                           Node B
172.31.254.1/30 --------------- 172.31.254.2/30

And then create routes to each others pot networks:

Node A
route add -net 10.193.0.0/16 172.31.254.2

Node B
route add -net 10.192.0.0/16 172.31.254.1

This way, everything around vnet routing tables, arp tables etc. stays very easy to understand and control, and traffic coming from a pot should flow correctly using its default route.
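If you want these routes to survive a reboot, the usual rc.conf static route mechanism would look roughly like this (a sketch for Node A with the transfer addresses above; mirror it on Node B):

# /etc/rc.conf on Node A (illustrative)
static_routes="pots_b"
route_pots_b="-net 10.193.0.0/16 172.31.254.2"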

(all of this without ever having used tailscale and without knowing the content of your config files).

I'll remove the bug tag, as this isn't pot specific, but more of a general FreeBSD/networking question.

Hi @urosgruber,

Any update/input from you on this topic? (what I wrote might have been off, since I know little about tailscale)

Cheers

Sorry, somehow I missed this comment. Let me try to expand on the info and the routing itself. What Tailscale does is use a WireGuard VPN as the base layer, and then a few nice things can be done through their tooling. They have a client you can use to connect nodes that are not on the same network. As far as I understand, the tool creates a dedicated interface called tailscale0 (shown truncated as tailscal in the routing table output) and routes traffic through it.

For example

10.192.0.0/16      link#3             US     tailscal
100.x.x.100    link#3             UHS    tailscal
100.x.x.1       link#3             UHS    tailscal

Here you can see that traffic to the other nodes is routed through their interface, as well as the network that one of the nodes is exposing. The latter is configurable. On the other node I see something similar, but with the network that I exposed on this initial machine.
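For reference, the route advertising I mentioned is set up with something like this (illustrative invocation, see the Tailscale docs for the exact flags):

# on the node exposing the pot network (illustrative)
tailscale up --advertise-routes=10.192.0.0/16
# on the peer, accept the advertised subnet routes
tailscale up --accept-routes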

As I explained before, I can access all physical nodes and services that run directly on the host. The issue appears when I use a pot that uses vnet and has the bridge interface active. Traffic looks like it's going out without a problem, but then the reply is lost. It feels more like the source is wrong and some translation is f***ed up. If I use just an alias then it works fine.

I tried to debug with tcpdump but haven't found where the problem is.
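For what it's worth, this is roughly what I've been running to see where the packets disappear (illustrative commands, interface names as they appear on my hosts):

# on the destination node: watch the bridge and the tailscale interface
tcpdump -ni bridge0 icmp
tcpdump -ni tailscale0 icmp
# inside the destination pot, on its epair side
tcpdump -ni epair1b icmp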

Node A

Host route (10.192.0.1/16 pot network)

route show 10.192.0.4
   route to: 10.192.0.4
destination: 10.192.0.0
       mask: 255.255.0.0
        fib: 0
  interface: bridge0
      flags: <UP,DONE,PINNED>
 recvpipe  sendpipe  ssthresh  rtt,msec    mtu        weight    expire
       0         0         0         0      1500         1         0
 
route show 10.193.0.3
   route to: 10.193.0.3
destination: 10.193.0.0
       mask: 255.255.0.0
        fib: 0
  interface: tailscale0
      flags: <UP,DONE,STATIC>
 recvpipe  sendpipe  ssthresh  rtt,msec    mtu        weight    expire
       0         0         0         0      1280         1         0      

Node B

Host route (10.193.0.1/16 pot network)

route show 10.192.0.4
   route to: 10.192.0.4
destination: 10.192.0.0
       mask: 255.255.0.0
        fib: 0
  interface: tailscale0
      flags: <UP,DONE,STATIC>
 recvpipe  sendpipe  ssthresh  rtt,msec    mtu        weight    expire
       0         0         0         0      1280         1         0
       
route show 10.193.0.3
   route to: 10.193.0.3
destination: 10.193.0.0
       mask: 255.255.0.0
        fib: 0
  interface: bridge0
      flags: <UP,DONE,PINNED>
 recvpipe  sendpipe  ssthresh  rtt,msec    mtu        weight    expire
       0         0         0         0      1500         1         0

That is how it looks on the hosts. If I'm within a pot on each node, everything goes via the default route, which looks like this:

route show 10.193.0.4
   route to: 10.193.0.4
destination: default
       mask: default
    gateway: 10.192.0.1
        fib: 0
  interface: epair1b
      flags: <UP,GATEWAY,DONE,STATIC>
 recvpipe  sendpipe  ssthresh  rtt,msec    mtu        weight    expire
       0         0         0         0      1500         1         0

So from a routing point of view I can't see anything wrong, but it still gets lost at some point, since no traffic goes through.

@urosgruber Do you have a minimal setup to reproduce or alternatively a host to ssh into?

I can allow ssh access with a public key.

@urosgruber sent you an email to your address from ports

@urosgruber Ok, so it looks like this is a shortcoming in tailscale. In order to create site-to-site VPNs, you need to disable source NAT (looking at the pf states on the hosts was quite funky - you can also test this without jails by assigning a source address from the subnet when pinging from the host itself, as in ping -S 10.192.0.1 10.193.0.1). Unfortunately, the required flag (--snat-subnet-routes=false) is not supported on FreeBSD.
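You can watch the source NAT happening by looking at the pf state table on the host while such a ping runs, for example (illustrative):

# show pf states involving the pot networks (illustrative)
pfctl -ss | grep -E '10\.(192|193)\.'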

See https://tailscale.com/kb/1214/site-to-site:

--snat-subnet-routes=false: Disables source NAT. In normal operations, a subnet device will see the traffic originating from the subnet router. This simplifies routing, but does not allow traversing multiple networks. By disabling source NAT, the end machine sees the LAN IP address of the originating machine as the source.

And: tailscale/tailscale#5573

In that github issue I found a hint on setting TS_DEBUG_NETSTACK_SUBNETS=0 in tailscaled's environment to disable subnet routing.

So by disabling the netstack mode via export TS_DEBUG_NETSTACK_SUBNETS=0 on FreeBSD (and thus pfSense), the administrator is now on the hook for configuring pf to properly handle subnet routing (which might involve NAT), exit node routing (which again might also involve NAT), and transcribing the filtering semantics described in the Tailscale ACLs into pf rules for that host.

I tried this by setting tailscaled_env="TS_DEBUG_NETSTACK_SUBNETS=0" in /etc/rc.conf on both sides and restarting tailscaled. I also tweaked pf.conf a bit to skip lo0 and added a pass rule, so that connections actually create state.
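For completeness, the pf.conf changes were along these lines (a minimal sketch, not the exact rules; keep pot's generated nat anchor as it is and tighten the pass rule down for production):

# /etc/pf.conf (sketch)
set skip on lo0
# allow traffic so connections actually create state; lock down as needed
pass all keep state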

With these changes, it seems to work now (jails can ping each other and I could also do a simple connection via netcat between jails on both hosts). You might need to tweak your pf.conf so it's locked down properly.

Not really a pot question though ;)

p.s. You can remove ssh access again

@grembo Thanks for solving this. I suspected NAT and saw this part was missing from the tailscaled FreeBSD implementation, but I never got around to checking it. I'll look at the states that get created and then apply the necessary rules to lock it down. I owe you a 🍻 for this.