angt/glorytun

Cannot Add 2nd Path

legolas108 opened this issue · 23 comments

Thanks for making glorytun available, looks very promising! Still have a problem with adding 2nd path.

Server:

server:~# glorytun bind 0.0.0.0 keyfile /etc/glorytun.key &
server:~# ip addr add 10.80.1.1 peer 10.80.1.2/24 dev tun0
server:~# ip link set dev tun0 up

Client:

client:~# glorytun bind 0.0.0.0 to <SERVER_IP> keyfile /etc/glorytun.key &
client:~# ip addr add 10.80.1.2 peer 10.80.1.1/24 dev tun0
client:~# ip link set dev tun0 up
client:~# glorytun path up 192.168.12.10 rate tx 2mbit rx 8mbit

This works fine, can ping the other side from both sides, though latency is pretty high (> 120ms) and very unsteady. (Latency when connecting directly is some 70ms and more steady.)

When adding 2nd path on client

client:~# glorytun path up 192.168.12.11 rate tx 2mbit rx 8mbit

both pings don't work any more and glorytun path outputs:

client:~# glorytun path
path UP
  status:  OK
  bind:    192.168.12.10 port 5000
  public:  <PUBLIC_IP1> port 4711
  peer:    <SERVER_IP> port 5000
  mtu:     1387 bytes
  rtt:     3978.264 ms
  rttvar:  3174.221 ms
  tx:
    rate:  250000 bytes/sec
    loss:  92 percent
    total: 300 packets
  rx:
    rate:  1000000 bytes/sec
    loss:  0 percent
    total: 93 packets
path UP
  status:  DEGRADED
  bind:    192.168.12.11 port 5000
  public:  - port 0
  peer:    <SERVER_IP> port 5000
  mtu:     1302 bytes
  rtt:     0.000 ms
  rttvar:  0.000 ms
  tx:
    rate:  250000 bytes/sec
    loss:  0 percent
    total: 0 packets
  rx:
    rate:  1000000 bytes/sec
    loss:  0 percent
    total: 0 packets
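
The rate lines in this output appear to be the limits given to glorytun path up, restated in bytes per second: tx 2mbit shows up as 250000 and rx 8mbit as 1000000. A minimal sketch of that conversion, assuming decimal megabits; the function name mbit_to_bytes_per_sec is made up for illustration and is not part of glorytun:

```shell
#!/bin/sh
# Convert an integer rate in Mbit/s to bytes/sec (assumes 1 Mbit = 1000000 bits).
mbit_to_bytes_per_sec() {
  echo $(( $1 * 1000000 / 8 ))
}

mbit_to_bytes_per_sec 2   # prints 250000  (tx 2mbit)
mbit_to_bytes_per_sec 8   # prints 1000000 (rx 8mbit)
```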

Both paths have dedicated routing tables:

client:~# ip rule list
0:      from all lookup local 
10:     from 192.168.12.10 lookup wl0 
10:     from all to 192.168.12.10 lookup wl0 
11:     from 192.168.12.11 lookup wl1 
11:     from all to 192.168.12.11 lookup wl1 
32766:  from all lookup main 
32767:  from all lookup default 

client:~# ip route show table wl0
default via 192.168.12.1 dev enusb00 
192.168.12.0/24 dev enusb00 scope link 

client:~# ip route show table wl1
default via 192.168.12.1 dev enusb01 
192.168.12.0/24 dev enusb01 scope link 
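
The thread does not show how these rules and tables were created. One possible way to arrive at the listing above is sketched below as a dry run that only prints the ip commands. It assumes the table names wl0 and wl1 are already declared in /etc/iproute2/rt_tables; the helper name print_uplink_rules is hypothetical.

```shell
#!/bin/sh
# Print (do not execute) the iproute2 commands for one source-routed uplink.
# Arguments: table, rule priority, local address, device, gateway, subnet.
print_uplink_rules() {
  table=$1; prio=$2; addr=$3; dev=$4; gw=$5; net=$6
  echo "ip route add $net dev $dev scope link table $table"
  echo "ip route add default via $gw dev $dev table $table"
  echo "ip rule add priority $prio from $addr lookup $table"
  echo "ip rule add priority $prio to $addr lookup $table"
}

print_uplink_rules wl0 10 192.168.12.10 enusb00 192.168.12.1 192.168.12.0/24
print_uplink_rules wl1 11 192.168.12.11 enusb01 192.168.12.1 192.168.12.0/24
```

Piping the printed lines to sh as root would apply them; printing first makes it easy to compare against ip rule list before changing anything.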

and making any of the paths' IP addresses the default gateway for the client makes Internet access available properly.

Using FireHOL for firewalling. Opened port 5000 for both, TCP and UDP, on the server, and traffic from/to the tun device is accepted unrestricted. On the client traffic for the tun devices is permitted freely to/from the LAN device.

Any idea what might make glorytun fail here?

Thanks so much for helping!

Continued to experiment. The issue seems to have to do with the fact that the connections are both 4G, same provider, same device type. Setting up glorytun on our two ADSL connections works just fine, even when dynamically adding/removing a line (path) - real great!

angt commented

Good :) Do you use it alone or with OverTheBox/OpenMPTCProuter?

Using it alone, and need to have it run over the 4G connections also as we want to replace the ADSL with the 4G connections. Can the 4G connections be made to work with OverTheBox? Or is there a way to get the 4G connections to work with glorytun? Like the simplicity of glorytun very much!

angt commented

If you compiled glorytun from the master branch, I highly recommend retrying with the latest release, as master is my dev branch :)

Thanks much for following up! Did switch to the current 0.2.2 version and there's improvement! It's now building up both paths and everything looks perfect when examining the glorytun path output.

But the joy is only short lived. The download speed is not aggregated for the two paths, while upload speed is partly. And after a few speed tests one of the paths is "choking" and the whole connection slows down to almost nothing.

Wondering about the rate limits given with the glorytun path up command. These wireless connections have quite a variation, e.g. between 6 and 20 Mbps for download. What would I pass to this command, the min or the max? Tried both, and the choking occurs with both settings. Speed is capped at the given rate limit. So, is that the meaning, or what is it being used for internally?

angt commented

Yes, on unstable LTE we need dynamic rate limiting (linked to #51).
With the current release, you may want to set a lower rate to keep good quality.
Also, do not hesitate to share the current stats of your setup; this will help to check future improvements.

OK, thanks, looking forward to DRL :-)

Current setup hasn't changed except for the speeds in the glorytun path up 192.168.12.1X rate tx 2mbit rx 8mbit statements.

Sample glorytun path output for ... tx 8mbit rx 12mbit right after initializing the tunnel:

client:~# glorytun path
path UP
  status:  OK
  bind:    192.168.12.10 port 5000
  public:  <PUBLIC_IP1> port 17903
  peer:    <SERVER_IP> port 5000
  mtu:     1386 bytes
  rtt:     246.360 ms
  rttvar:  139.920 ms
  tx:
    rate:  1000000 bytes/sec
    loss:  0 percent
    total: 9 packets
  rx:
    rate:  1500000 bytes/sec
    loss:  0 percent
    total: 3 packets
path UP
  status:  OK
  bind:    192.168.12.11 port 5000
  public:  <PUBLIC_IP2> port 11525
  peer:    <SERVER_IP> port 5000
  mtu:     1386 bytes
  rtt:     245.961 ms
  rttvar:  115.522 ms
  tx:
    rate:  1000000 bytes/sec
    loss:  0 percent
    total: 13 packets
  rx:
    rate:  1500000 bytes/sec
    loss:  0 percent
    total: 3 packets

A little later, in the "choking" state, glorytun path outputs:

client:~# glorytun path
path UP
  status:  OK
  bind:    192.168.12.10 port 5000
  public:  <PUBLIC_IP1> port 17903
  peer:    <SERVER_IP> port 5000
  mtu:     1400 bytes
  rtt:     97.514 ms
  rttvar:  4.911 ms
  tx:
    rate:  1000000 bytes/sec
    loss:  0 percent
    total: 36089 packets
  rx:
    rate:  1500000 bytes/sec
    loss:  0 percent
    total: 44874 packets
path UP
  status:  OK
  bind:    192.168.12.11 port 5000
  public:  <PUBLIC_IP2> port 11525
  peer:    <SERVER_IP> port 5000
  mtu:     1350 bytes
  rtt:     1705.178 ms
  rttvar:  308.659 ms
  tx:
    rate:  1000000 bytes/sec
    loss:  100 percent
    total: 31083 packets
  rx:
    rate:  1500000 bytes/sec
    loss:  0 percent
    total: 27209 packets

Note for the 2nd path the reduced mtu, high rtt* values, and 100% loss for tx.
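
To spot a choking path without reading the whole listing, the output can be reduced to one line per path. This is a rough sketch that assumes the text layout shown in this thread (glorytun 0.2.2); the function name summarize_paths is made up, and the sample input is trimmed from the output above.

```shell
#!/bin/sh
# Reduce "glorytun path" style output to: <bind address> rtt=<ms> tx_loss=<percent>
summarize_paths() {
  awk '
    $1 == "bind:" { bind = $2 }   # local address of the path
    $1 == "rtt:"  { rtt = $2 }    # "rttvar:" does not match this test
    $1 == "tx:"   { dir = "tx" }
    $1 == "rx:"   { dir = "rx" }
    $1 == "loss:" && dir == "tx" {
      printf "%s rtt=%sms tx_loss=%s%%\n", bind, rtt, $2
    }
  '
}

# Live usage (needs glorytun installed): glorytun path | summarize_paths
summarize_paths <<'EOF'
path UP
  bind:    192.168.12.10 port 5000
  rtt:     97.514 ms
  tx:
    loss:  0 percent
  rx:
    loss:  0 percent
path UP
  bind:    192.168.12.11 port 5000
  rtt:     1705.178 ms
  tx:
    loss:  100 percent
  rx:
    loss:  0 percent
EOF
# prints:
# 192.168.12.10 rtt=97.514ms tx_loss=0%
# 192.168.12.11 rtt=1705.178ms tx_loss=100%
```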

angt commented

Looks like there is something wrong with your second path: all packets are lost.
If you stop the first path with glorytun path down 192.168.12.10, does the second one work?

Tried with what I believe should be really very low values, i.e. ... tx 2mbit rx 4mbit and aggregation works pretty well both directions, but only for a short time, then degradation sets in again:

client:~# glorytun path
path UP
  status:  OK
  bind:    192.168.12.10 port 5000
  public:  <PUBLIC_IP1> port 17272
  peer:    <SERVER_IP> port 5000
  mtu:     1400 bytes
  rtt:     100.062 ms
  rttvar:  7.812 ms
  tx:
    rate:  250000 bytes/sec
    loss:  2 percent
    total: 25857 packets
  rx:
    rate:  500000 bytes/sec
    loss:  0 percent
    total: 28516 packets
path UP
  status:  OK
  bind:    192.168.12.11 port 5000
  public:  <PUBLIC_IP2> port 16429
  peer:    <SERVER_IP> port 5000
  mtu:     1397 bytes
  rtt:     1719.238 ms
  rttvar:  1988.194 ms
  tx:
    rate:  250000 bytes/sec
    loss:  100 percent
    total: 25397 packets
  rx:
    rate:  500000 bytes/sec
    loss:  0 percent
    total: 26060 packets

Rebuilt the tunnel on both sides with only the 2nd path with ... tx 4mbit rx 8mbit. Very little loss and also somewhat reasonable rtt* values:

client:~# glorytun path
path UP
  status:  OK
  bind:    192.168.12.11 port 5000
  public:  <PUBLIC_IP2> port 23386
  peer:    <SERVER_IP> port 5000
  mtu:     1400 bytes
  rtt:     90.316 ms
  rttvar:  5.152 ms
  tx:
    rate:  500000 bytes/sec
    loss:  4 percent
    total: 24776 packets
  rx:
    rate:  1000000 bytes/sec
    loss:  0 percent
    total: 37568 packets

Several speed tests provide fairly consistent values both directions and close to the configured max's.

OK, leaving it running for a while and doing "real-world" browsing makes the tunnel fail again:

client:~# glorytun path
path UP
  status:  OK
  bind:    192.168.12.11 port 5000
  public:  <PUBLIC_IP2> port 23386
  peer:    <SERVER_IP> port 5000
  mtu:     1400 bytes
  rtt:     1610.916 ms
  rttvar:  331.574 ms
  tx:
    rate:  500000 bytes/sec
    loss:  64 percent
    total: 54794 packets
  rx:
    rate:  1000000 bytes/sec
    loss:  0 percent
    total: 81556 packets

Will try the first path under same conditions and post the results shortly.

First path is doing better, but like with the second path, certain https sites won't load any more with connection timing out. Loading with ADSL line works fine.

client:~# glorytun path
path UP
  status:  OK
  bind:    192.168.12.10 port 5000
  public:  <PUBLIC_IP1> port 17251
  peer:    <SERVER_IP> port 5000
  mtu:     1400 bytes
  rtt:     102.662 ms
  rttvar:  17.012 ms
  tx:
    rate:  500000 bytes/sec
    loss:  5 percent
    total: 18985 packets
  rx:
    rate:  1000000 bytes/sec
    loss:  0 percent
    total: 26287 packets

Following up on the https sites that won't load, this also happens when using our ADSL lines. E.g. TotalWireless site simply times out without loading.

There's no error indicated in glorytun path:

client:~# glorytun path
path UP
  status:  OK
  bind:    192.168.11.2 port 5000
  public:  <PUBLIC_IP1> port 5000
  peer:    <SERVER_IP> port 5000
  mtu:     1464 bytes
  rtt:     64.988 ms
  rttvar:  0.287 ms
  tx:
    rate:  62500 bytes/sec
    loss:  0 percent
    total: 181569 packets
  rx:
    rate:  437500 bytes/sec
    loss:  0 percent
    total: 250479 packets
path UP
  status:  OK
  bind:    192.168.10.2 port 5000
  public:  <PUBLIC_IP2> port 5000
  peer:    <SERVER_IP> port 5000
  mtu:     1464 bytes
  rtt:     71.094 ms
  rttvar:  0.521 ms
  tx:
    rate:  62500 bytes/sec
    loss:  0 percent
    total: 178656 packets
  rx:
    rate:  437500 bytes/sec
    loss:  0 percent
    total: 244449 packets

Many other sites load fine. And loading TotalWireless site from a single line (without glorytun) also loads just fine.

angt commented

Does this work from your server?

Sorry, not sure I understand fully what you mean. All changes/experiments have been done on the local Internet gateway server. The other end, the VPS (cloud) server config, was not changed from the original config. And all browsing and speed tests were done through this local gateway server. Tried different browsers also.

angt commented

The MTU is not the same on your 4G and your ADSL. Maybe you forgot to configure MSS clamping?

MSS clamping is done by FireHOL (tcpmss auto "ppp+ enusb+") the same for both, ADSL (ppp+) and 4G (enusb+) devices. When using my current ADSL bonding solution, TotalWireless loads fine also. (But that solution doesn't work with the 4G connections either, many lost packets and slower speed than an individual line, no idea why :-(.)
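
For reference, the MSS that clamping should arrive at follows from the MTU: for IPv4 TCP without options it is the MTU minus 40 bytes (20 for the IP header, 20 for the TCP header). The sketch below applies that to the two mtu values reported earlier in this thread, assuming the value shown by glorytun path is the size that packets inside the tunnel have to fit; mss_for_mtu is a made-up helper name.

```shell
#!/bin/sh
# IPv4 TCP MSS without options = MTU - 20 (IP header) - 20 (TCP header).
mss_for_mtu() {
  echo $(( $1 - 40 ))
}

mss_for_mtu 1400   # prints 1360 (4G paths)
mss_for_mtu 1464   # prints 1424 (ADSL paths)
```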

When you try to access TotalWireless through a glorytun tunnel, does it work fine?

angt commented

Yes, I confirm it works.

angt commented

I have also tested with glorytun over 4G and it works too.

Thanks so much for doing the testing. Is the speed of your 4G as variable as ours? Have seen speeds here between 3/1 and 22/6 Mbps.

Looks like there's something wrong in my setup here. Will go through it once more, also replace the firewall by minimal manual iptables rules.

Would you share how you configured your 4G connections or is it basically as I did?

angt commented

I did a very basic setup like yours.
My 4G: 3% drop, 50 ms rtt and 10 ms rttvar.
Rate variability is not important when you test your 4G alone.

OK, so I did a complete, new, independent setup. Installed an Ubuntu 18.04 server from scratch as local gateway machine and only added glorytun 0.2.2, configured the most basic iptables rules and routing entries. Installed glorytun 0.2.2 on another VPS in the cloud that is otherwise only moderately used as a backup (standby) web server. And - drum roll, please - glorytun now works with 4G! TotalWireless.com loads nicely also. After quite a few speed tests and larger downloads and starting/stopping/adding a path/removing a path, glorytun feels very stable and robust and definitely usable for our purpose. The rest of the configuration on the other server pair seems to have been simply too complex or wrong.

Only thing - not to get too euphoric - is the aggregation of higher speeds. When configuring with low speeds that a line easily can handle, like 8/2 Mbps, aggregation seems to work fine, getting 15.4/3.8 Mbps out of it. When running individual lines, results often show 12/4 Mbps, but when configuring glorytun with these speeds, the result is often only 10/3 Mbps, never more than the configured 12/4 Mbps. Best speed test result so far was 15/4 Mbps when configuring for 8/3 Mbps. So it looks like glorytun doesn't aggregate what's available when the configured rates go beyond what the lines can deliver consistently. That will probably improve once DRL is implemented.
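
To put the 8/2 Mbps result in numbers: two paths configured at 8/2 Mbps give a ceiling of 16/4 Mbps, so 15.4/3.8 Mbps is roughly 96% and 95% of the configured total. A small sketch of that arithmetic; the function name aggregation_pct is made up for illustration:

```shell
#!/bin/sh
# Measured throughput as a percentage of (per-path rate * number of paths).
# Arguments: measured Mbps, configured per-path Mbps, number of paths.
aggregation_pct() {
  awk -v m="$1" -v r="$2" -v n="$3" 'BEGIN { printf "%.0f\n", 100 * m / (r * n) }'
}

aggregation_pct 15.4 8 2   # download: prints 96
aggregation_pct 3.8 2 2    # upload:   prints 95
```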

Sample glorytun path output:

client:~# glorytun path
path UP
  status:  OK
  bind:    192.168.12.10 port 5000
  public:  <PUBLIC_IP1> port 19025
  peer:    <SERVER_IP> port 5000
  mtu:     1400 bytes
  rtt:     110.234 ms
  rttvar:  30.111 ms
  tx:
    rate:  500000 bytes/sec
    loss:  0 percent
    total: 95249 packets
  rx:
    rate:  1000000 bytes/sec
    loss:  0 percent
    total: 109778 packets
path UP
  status:  OK
  bind:    192.168.12.11 port 5000
  public:  <PUBLIC_IP2> port 17736
  peer:    <SERVER_IP> port 5000
  mtu:     1400 bytes
  rtt:     119.175 ms
  rttvar:  40.595 ms
  tx:
    rate:  500000 bytes/sec
    loss:  3 percent
    total: 95099 packets
  rx:
    rate:  1000000 bytes/sec
    loss:  0 percent
    total: 108306 packets

Thanks again for all your help and a great tool!

angt commented

Thank you for the feedback 👍
DRL should help with that and make your configuration easier, since you will only need to set the max rate.
But to get the best rate, you'll have to wait for latency compensation (feature #43).