jamesmcm/vopono

Temporary network connectivity problems cause permanent disconnects for OpenVPN

gharris1727 opened this issue · 5 comments

Hi! Thanks for maintaining vopono, it's incredibly simple to get working and super useful.

I've been having an issue where after a temporary connectivity loss to an OpenVPN server resolves, applications remain disconnected.

What appears to happen is that OpenVPN hits the connect-retry-max hardcoded to 1 by vopono and then gives up retrying. The openvpn process then exits, and is left "defunct" while vopono waits for the exec'd process to finish. The exec'd process is tolerant of connectivity loss and stays running, but will never be able to reconnect because the openvpn process isn't alive to carry traffic. I have to resolve this state manually by noticing the disconnect and stopping the exec'd process before using vopono to start it again.

I tried to work around the issue by changing the .ovpn to add a connect-retry-max value, but it appears that the command-line argument takes precedence. Maybe there's a way around it; I found some documentation online that made the precedence sound complicated but I wasn't able to get it to work.

I confirmed that removing the hardcoded command-line argument fixed my problem by building vopono from source, so that is the workaround I'll be using for the time being. Thank you for the install.sh, which worked perfectly on the first try!

I'm not sure how best to fix this, but I'll float a few ideas:

  1. Add a command-line flag like --enable-infinite-retries to vopono which removes the --connect-retry-max command-line argument from the openvpn invocation
  2. Only add the --connect-retry-max if there is no explicit connect-retry-max in the .ovpn.
  3. Wait for the openvpn process to exit and restart it with the same settings
  4. Wait for the openvpn process to exit and then signal the exec'd process to shut down cleanly
  5. Call openvpn twice: once to test the connection (with --connect-retry-max) and a second time for the actual run (without --connect-retry-max).

I'd be willing to contribute one of these fixes if you give me some indication of which is best for the tool overall. Thanks!

Hmm, I think the best option would be to set the default to 3 or so (or the value in the .ovpn config), so it doesn't get stuck forever in other circumstances, and then add an argument so it can be set by the user.

And also check when the process has died to restart as you mention (and report a warning). I'm not sure about killing the user's process as it'd be a pain if you have something that needs to save progress, etc. - as long as the killswitch works it won't have network connectivity when it drops anyway.

This means we need another thread to check whether openvpn is still running, but this is not a bad thing since we will need this anyway to support the port forwarding in ProtonVPN (for natpmpc) - https://protonvpn.com/support/port-forwarding-manual-setup/

One hassle might be if it connects to a different remote the second time, so the killswitch firewall rules have to be updated.

> Hmm, I think the best option would be to set the default to 3 or so (or the value in the .ovpn config), so it doesn't get stuck forever in other circumstances, and then add an argument so it can be set by the user.

To make sure I understand what you mean, let me summarize the behavior:

  • If the user specifies a number of retries, pass that to openvpn directly.
  • Else-if the ovpn specifies finite or unlimited retries, leave it as-configured.
  • Else, if the ovpn doesn't specify a limit (and would default to unlimited), set it to 3 retries.
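
To make the precedence above concrete, here is a minimal sketch in Rust. The names `retry_max_arg`, `ovpn_sets_retry_max`, and `DEFAULT_CONNECT_RETRY_MAX` are hypothetical and are not taken from vopono's code. It also assumes a plain line scan of the .ovpn text is enough to detect the directive, which ignores finer points of OpenVPN's config parsing.

```rust
/// Fallback used when neither the user nor the .ovpn file sets a limit
/// (the value 3 comes from the discussion above; the name is hypothetical).
const DEFAULT_CONNECT_RETRY_MAX: u32 = 3;

/// Returns true if the .ovpn text contains a `connect-retry-max` directive.
/// Simplification: only the first word of each non-comment line is checked.
fn ovpn_sets_retry_max(config: &str) -> bool {
    config.lines().any(|line| {
        let line = line.trim();
        // OpenVPN config comments start with '#' or ';'.
        if line.starts_with('#') || line.starts_with(';') {
            return false;
        }
        line.split_whitespace().next() == Some("connect-retry-max")
    })
}

/// Decides what to pass as `--connect-retry-max` on the openvpn command line.
/// `None` means "do not pass the flag", so the .ovpn value stays in effect.
fn retry_max_arg(user_value: Option<u32>, config: &str) -> Option<u32> {
    match user_value {
        // 1. An explicit user setting always wins.
        Some(n) => Some(n),
        // 2. The .ovpn already configures retries: leave it as configured.
        None if ovpn_sets_retry_max(config) => None,
        // 3. Nothing configured anywhere: apply the finite default.
        None => Some(DEFAULT_CONNECT_RETRY_MAX),
    }
}
```

Returning `None` for the second case avoids having to interpret the value in the .ovpn at all: since the command-line argument takes precedence, simply not passing it is enough to leave the file's setting in force.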

And as for the ovpn process death:

  • In the vopono process which starts openvpn, start up a thread that tails the openvpn log
  • When the process (re-)establishes a connection, refresh the killswitch routes/do protonvpn port forwarding/etc (?)
  • When the process exits but the user process has not exited, restart openvpn with the same arguments.
  • When --keep-alive is specified, the vopono process that started openvpn should remain alive to keep the management thread alive.

Yeah, it looks good, it might be better to check the PID status rather than tailing the log, but it depends a bit on how OpenVPN terminates exactly.
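
As a rough illustration of the PID-status approach, the sketch below polls the child with `std::process::Child::try_wait` and respawns it when it exits. Everything here is an assumption for illustration: `supervise`, `spawn_openvpn`, the `stop` flag, and the `max_restarts` cap are hypothetical names, not vopono's API, and a real version would also need to refresh the killswitch rules after each restart.

```rust
use std::io;
use std::process::{Child, Command};
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Hypothetical spawner: starts openvpn with the same config every time.
fn spawn_openvpn(config_path: &str) -> io::Result<Child> {
    Command::new("openvpn").arg("--config").arg(config_path).spawn()
}

/// Polls the child and respawns it whenever it exits, until `stop` is set
/// or `max_restarts` restarts have been performed.
/// Returns the number of restarts that happened.
fn supervise<F>(
    mut spawn: F,
    stop: Arc<AtomicBool>,
    poll_interval: Duration,
    max_restarts: u32,
) -> io::Result<u32>
where
    F: FnMut() -> io::Result<Child>,
{
    let mut child = spawn()?;
    let mut restarts = 0;
    loop {
        if stop.load(Ordering::SeqCst) {
            // The user process has finished: stop the VPN process and reap it.
            let _ = child.kill();
            let _ = child.wait();
            return Ok(restarts);
        }
        match child.try_wait()? {
            // try_wait also reaps the exited child, so it is not left defunct.
            Some(status) => {
                if restarts >= max_restarts {
                    return Ok(restarts);
                }
                eprintln!("warning: VPN process exited ({}); restarting", status);
                child = spawn()?;
                restarts += 1;
            }
            // Still running: check again after a short sleep.
            None => thread::sleep(poll_interval),
        }
    }
}
```

This would run on its own thread, for example `thread::spawn(move || supervise(|| spawn_openvpn("client.ovpn"), stop, Duration::from_secs(1), 10))`, with the main thread setting `stop` once the exec'd process has exited. The restart cap is there so that a permanently broken config does not loop forever.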