Small delay on authentication.
g00nix opened this issue · 7 comments
There is a small delay of a few seconds when users get authenticated with this .so. Here is an example:
Fri Jun 7 14:51:22 2019 us=127255 MULTI: multi_create_instance called
Fri Jun 7 14:51:22 2019 us=127302 Re-using SSL/TLS context
Fri Jun 7 14:51:22 2019 us=127356 Control Channel MTU parms [ L:1623 D:1210 EF:40 EB:0 ET:0 EL:3 ]
Fri Jun 7 14:51:22 2019 us=127369 Data Channel MTU parms [ L:1623 D:1450 EF:123 EB:406 ET:0 EL:3 ]
Fri Jun 7 14:51:22 2019 us=127403 Local Options String (VER=V4): 'V4,dev-type tun,link-mtu 1559,tun-mtu 1500,proto TCPv4_SERVER,cipher AES-256-CBC,auth SHA1,keysize 256,key-method 2,tls-server'
Fri Jun 7 14:51:22 2019 us=127409 Expected Remote Options String (VER=V4): 'V4,dev-type tun,link-mtu 1559,tun-mtu 1500,proto TCPv4_CLIENT,cipher AES-256-CBC,auth SHA1,keysize 256,key-method 2,tls-client'
Fri Jun 7 14:51:22 2019 us=127450 TCP connection established with [AF_INET]192.168.121.1:47448
Fri Jun 7 14:51:22 2019 us=127456 TCPv4_SERVER link local: (not bound)
Fri Jun 7 14:51:22 2019 us=127463 TCPv4_SERVER link remote: [AF_INET]192.168.121.1:47448
Fri Jun 7 14:51:23 2019 us=127998 192.168.121.1:47448 TLS: Initial packet from [AF_INET]192.168.121.1:47448, sid=64d63019 fa9c6f15
Fri Jun 7 14:51:23 2019 us=154159 192.168.121.1:47448 peer info: IV_VER=2.4.7
Fri Jun 7 14:51:23 2019 us=154195 192.168.121.1:47448 peer info: IV_PLAT=linux
Fri Jun 7 14:51:23 2019 us=154208 192.168.121.1:47448 peer info: IV_PROTO=2
Fri Jun 7 14:51:23 2019 us=154220 192.168.121.1:47448 peer info: IV_NCP=2
Fri Jun 7 14:51:23 2019 us=154230 192.168.121.1:47448 peer info: IV_LZ4=1
Fri Jun 7 14:51:23 2019 us=154241 192.168.121.1:47448 peer info: IV_LZ4v2=1
Fri Jun 7 14:51:23 2019 us=154251 192.168.121.1:47448 peer info: IV_LZO=1
Fri Jun 7 14:51:23 2019 us=154261 192.168.121.1:47448 peer info: IV_COMP_STUB=1
Fri Jun 7 14:51:23 2019 us=154272 192.168.121.1:47448 peer info: IV_COMP_STUBv2=1
Fri Jun 7 14:51:23 2019 us=154283 192.168.121.1:47448 peer info: IV_TCPNL=1
Fri Jun 7 14:51:23 2019 us=155778 192.168.121.1:47448 PLUGIN_CALL: POST /root/openvpn-plugin-auth-script.so/PLUGIN_AUTH_USER_PASS_VERIFY status=2
Fri Jun 7 14:51:23 2019 us=155880 192.168.121.1:47448 TLS: Username/Password authentication deferred for username 'FadZR0cO1YnqMeXOXH7k1KbghDmbO6ttBISLuCgh3MHQZ87WE3mYXROZx9jk3' [CN SET]
Fri Jun 7 14:51:23 2019 us=156630 192.168.121.1:47448 Control Channel: TLSv1.3, cipher TLSv1.3 TLS_AES_256_GCM_SHA384
Fri Jun 7 14:51:23 2019 us=156721 192.168.121.1:47448 [FadZR0cO1YnqMeXOXH7k1KbghDmbO6ttBISLuCgh3MHQZ87WE3mYXROZx9jk3] Peer Connection Initiated with [AF_INET]192.168.121.1:47448
Fri Jun 7 14:51:24 2019 us=283465 192.168.121.1:47448 PUSH: Received control message: 'PUSH_REQUEST'
Fri Jun 7 14:51:24 2019 us=283627 FadZR0cO1YnqMeXOXH7k1KbghDmbO6ttBISLuCgh3MHQZ87WE3mYXROZx9jk3/192.168.121.1:47448 MULTI_sva: pool returned IPv4=10.202.0.10, IPv6=(Not enabled)
Fri Jun 7 14:51:24 2019 us=311863 FadZR0cO1YnqMeXOXH7k1KbghDmbO6ttBISLuCgh3MHQZ87WE3mYXROZx9jk3/192.168.121.1:47448 OPTIONS IMPORT: reading client specific options from: /dev/shm//openvpn_cc_3ebc5a271b15743f1ae661976e3ed9a2.tmp
Fri Jun 7 14:51:24 2019 us=312057 FadZR0cO1YnqMeXOXH7k1KbghDmbO6ttBISLuCgh3MHQZ87WE3mYXROZx9jk3/192.168.121.1:47448 MULTI: Learn: 10.202.0.10 -> FadZR0cO1YnqMeXOXH7k1KbghDmbO6ttBISLuCgh3MHQZ87WE3mYXROZx9jk3/192.168.121.1:47448
Fri Jun 7 14:51:24 2019 us=312101 FadZR0cO1YnqMeXOXH7k1KbghDmbO6ttBISLuCgh3MHQZ87WE3mYXROZx9jk3/192.168.121.1:47448 MULTI: primary virtual IP for FadZR0cO1YnqMeXOXH7k1KbghDmbO6ttBISLuCgh3MHQZ87WE3mYXROZx9jk3/192.168.121.1:47448: 10.202.0.10
Fri Jun 7 14:51:29 2019 us=444439 FadZR0cO1YnqMeXOXH7k1KbghDmbO6ttBISLuCgh3MHQZ87WE3mYXROZx9jk3/192.168.121.1:47448 PUSH: Received control message: 'PUSH_REQUEST'
Fri Jun 7 14:51:29 2019 us=444563 FadZR0cO1YnqMeXOXH7k1KbghDmbO6ttBISLuCgh3MHQZ87WE3mYXROZx9jk3/192.168.121.1:47448 SENT CONTROL [FadZR0cO1YnqMeXOXH7k1KbghDmbO6ttBISLuCgh3MHQZ87WE3mYXROZx9jk3]: 'PUSH_REPLY,redirect-gateway def1,dhcp-option DNS 9.9.9.9,route 10.202.0.1,topology net30,ifconfig 10.202.0.10 10.202.0.9,peer-id 0,cipher AES-256-GCM' (status=1)
Fri Jun 7 14:51:29 2019 us=444628 FadZR0cO1YnqMeXOXH7k1KbghDmbO6ttBISLuCgh3MHQZ87WE3mYXROZx9jk3/192.168.121.1:47448 Data Channel: using negotiated cipher 'AES-256-GCM'
Fri Jun 7 14:51:29 2019 us=444711 FadZR0cO1YnqMeXOXH7k1KbghDmbO6ttBISLuCgh3MHQZ87WE3mYXROZx9jk3/192.168.121.1:47448 Data Channel MTU parms [ L:1551 D:1450 EF:51 EB:406 ET:0 EL:3 ]
Fri Jun 7 14:51:29 2019 us=445077 FadZR0cO1YnqMeXOXH7k1KbghDmbO6ttBISLuCgh3MHQZ87WE3mYXROZx9jk3/192.168.121.1:47448 Outgoing Data Channel: Cipher 'AES-256-GCM' initialized with 256 bit key
Fri Jun 7 14:51:29 2019 us=445140 FadZR0cO1YnqMeXOXH7k1KbghDmbO6ttBISLuCgh3MHQZ87WE3mYXROZx9jk3/192.168.121.1:47448 Incoming Data Channel: Cipher 'AES-256-GCM' initialized with 256 bit key
This behavior was strange, since the bash script returned a successful result at Fri Jun 7 14:51:23 UTC 2019, but access was only granted at 14:51:29.
My memory of this functionality is a bit hazy, but I believe this behavior is actually caused by the way OpenVPN manages the interface this plugin implements. Once the auth process is initiated, OpenVPN essentially continues about its business for several seconds before polling the result file to see whether access was granted or rejected. I want to say that polling window is 5 seconds to minimize the impact on active network flows, however, it might be configurable.
The log entries you're referencing at 14:51:23 in that log detail the OpenVPN process triggering the plugin call.
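For reference, the status value in that PLUGIN_CALL log line maps to the plugin return codes declared in openvpn-plugin.h (quoted here from memory, so double-check the header):

/* Return codes a plugin call can hand back to OpenVPN.  The "status=2" in the
 * PLUGIN_CALL line above is OPENVPN_PLUGIN_FUNC_DEFERRED, i.e. the plugin
 * promised to deliver the verdict later via the auth control file. */
#define OPENVPN_PLUGIN_FUNC_SUCCESS  0
#define OPENVPN_PLUGIN_FUNC_ERROR    1
#define OPENVPN_PLUGIN_FUNC_DEFERRED 2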
The log entries you're referencing at 14:51:23 in that log detail the OpenVPN process triggering the plugin call.
The script was also spitting out a timestamp, and it returned within the same second, without delay.
I want to say that polling window is 5 seconds to minimize the impact on active network flows, however, it might be configurable.
If that is the case, then this is more an OpenVPN issue than anything related to the plugin. Do you have any idea where in the OpenVPN code I should start looking?
Do you have any idea where in the OpenVPN code I should start looking?
Hi @g00nix,
Were you able to figure that out? Any hints, @pdemonaco?
@ss-17 Nope, we just figured we could run a high number of users with serial authentication, so we gave up on this plugin. Just make sure your authentication script is fast.
As I mentioned earlier, this is likely intentional behavior from the perspective of the OpenVPN process, since it is single-threaded. The purpose of the asynchronous authentication hook is to prevent a slow authentication action for a single new connection from causing traffic for other active VPN clients to pause. Instead of actually waiting for a response, OpenVPN kicks off the authentication attempt and then returns to its normal business of processing traffic. After some wait interval (possibly 5 seconds) it rechecks the authentication provider to see whether a result has been found. This process repeats for as long as it takes the provider to return.
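To make that concrete, here's a rough sketch of what the deferred side of such a plugin looks like, assuming the usual convention where OpenVPN passes the auth_control_file path in the plugin environment and the background check writes "1" (allow) or "0" (deny) into it. This is illustrative only, not the actual source of openvpn-plugin-auth-script.so; the env_get() helper and the placeholder verdict are made up for the example.

#include <openvpn-plugin.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Find "name=value" in the environment array OpenVPN hands to the plugin. */
static const char *
env_get(const char *name, const char *envp[])
{
    const size_t n = strlen(name);
    for (int i = 0; envp && envp[i]; i++)
    {
        if (strncmp(envp[i], name, n) == 0 && envp[i][n] == '=')
        {
            return envp[i] + n + 1;
        }
    }
    return NULL;
}

OPENVPN_EXPORT openvpn_plugin_handle_t
openvpn_plugin_open_v1(unsigned int *type_mask, const char *argv[], const char *envp[])
{
    (void) argv;
    (void) envp;
    /* Only register for username/password verification events. */
    *type_mask = OPENVPN_PLUGIN_MASK(OPENVPN_PLUGIN_AUTH_USER_PASS_VERIFY);
    return (openvpn_plugin_handle_t) 1;    /* no per-instance state in this sketch */
}

OPENVPN_EXPORT int
openvpn_plugin_func_v1(openvpn_plugin_handle_t handle, const int type,
                       const char *argv[], const char *envp[])
{
    (void) handle;
    (void) argv;
    const char *control = env_get("auth_control_file", envp);
    if (type != OPENVPN_PLUGIN_AUTH_USER_PASS_VERIFY || control == NULL)
    {
        return OPENVPN_PLUGIN_FUNC_ERROR;
    }

    if (fork() == 0)
    {
        /* Child: run the slow check (MFA push, LDAP bind, ...), then write
         * the verdict into the control file OpenVPN is polling. */
        int allowed = 1;                   /* placeholder for the real check */
        FILE *f = fopen(control, "w");
        if (f)
        {
            fputs(allowed ? "1" : "0", f);
            fclose(f);
        }
        _exit(0);
    }

    /* Parent: return immediately so OpenVPN keeps forwarding traffic. */
    return OPENVPN_PLUGIN_FUNC_DEFERRED;
}

OPENVPN_EXPORT void
openvpn_plugin_close_v1(openvpn_plugin_handle_t handle)
{
    (void) handle;
}

The key point is that the parent returns before the check finishes; OpenVPN is then free to keep servicing other clients and only looks at the control file again on its next pass.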
In my use case, the authentication provider was tied to a phone-app based multi-factor solution. As a result, each authentication event included the following time components:
- OpenVPN -> Auth Server ~ 5 ms
- Auth Server -> Cloud ~ 20 ms
- Cloud -> Mobile Phone ~ 50 - 300 ms
- User delay ( users don't immediately answer / notice the push) ~ 5000 - 45000 ms
- Mobile Phone -> Cloud ~ 50 - 300 ms
- Cloud -> Auth Server ~ 20 ms
- Auth Server -> OpenVPN ~ 5 ms
This yields something on the order of a 5-second delay in the best case, and in the worst case it can easily extend beyond a minute. It's important to note that, without the asynchronous hook, no traffic is processed by OpenVPN during this time, because it blocks waiting for the response from the authentication provider on each authentication request.
When we first implemented without asynchronous authentication, the concurrent connection of 10 to 15 users would totally disrupt all VPN traffic for about 15 minutes by creating an authentication storm. I suspect this was a combination of different authentication events timing out during the delay of the first connection and users retrying connections and queuing up a series of authentication attempts.
If your authentication provider returns immediately, there is no reason to use code which implements this asynchronous hook. If you do have lengthy delays, I'm struggling to understand why this brief client-side delay is a concern.
Thanks for your reply, @g00nix. It's not the plugin's shortcoming, as the delay comes from how OpenVPN works to achieve this more or less asynchronous authentication.
I'm struggling to understand why this brief client-side delay is a concern.
@pdemonaco, it's not a cause for concern, just curiosity about whether this delay is reducible in code. I might be wrong, but after some digging through the relevant files to understand the mechanism, my conclusion is that you cannot reduce this delay by changing anything in the code on the server side. It works like this:
- Client -> Sends auth details.
- Server -> Initiates the deferred auth mechanism (creates the auth_control_file) and moves on.
- Client -> Sends PUSH_REQUEST to check the status of the auth.
- Server -> Checks the auth_control_file; if it has been updated by the background process, sends the auth status (success or failure) to the client. Otherwise it doesn't reply at all and moves on (just like step 2).
- If the client gets the auth status, all good. Otherwise the client waits for PUSH_REQUEST_INTERVAL (5 seconds by default, defined in src/openvpn/common.h) and sends another PUSH_REQUEST (back to step 3).
So, as you can see, the delay in re-checking the auth status is determined by the OpenVPN client, not the server, which only answers when the client asks. If the background process updates the auth_control_file before the client sends its first PUSH_REQUEST, authentication is almost instantaneous. But if that doesn't happen (which is normal and happens randomly), the client will only ask to recheck the auth status after a period of 5 seconds, hence the delay. The server isn't the guilty party here. The relevant code is in forward.c:
/*
 * Periodically resend PUSH_REQUEST until PUSH message received
 */
void
check_push_request_dowork(struct context *c)
{
    send_push_request(c);
    /* if no response to first push_request, retry at PUSH_REQUEST_INTERVAL second intervals */
    event_timeout_modify_wakeup(&c->c2.push_request_interval, PUSH_REQUEST_INTERVAL);
}
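For reference, that PUSH_REQUEST_INTERVAL is a compile-time constant; in the 2.4 tree it lives in src/openvpn/common.h:

/* src/openvpn/common.h (OpenVPN 2.4) -- the wake-up interval reused by
 * check_push_request_dowork() above; the client waits this long before
 * resending PUSH_REQUEST. */
#define PUSH_REQUEST_INTERVAL 5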
One impossible and impractical solution would be to ask every client to rebuild OpenVPN after reducing PUSH_REQUEST_INTERVAL from 5 to 1 or 2 in common.h. The other workaround is to compile the server with --enable-async-push. But then you have to use the same fixed "cipher ALGO" on both the server and the client (ALGO is case sensitive here: "cipher AES-128-CBC" on the server won't match "cipher aes-128-cbc" on the client) and disable cipher negotiation by putting "ncp-disable" in the server config. I checked this configuration and it does remove the delay, at the cost of sticking to a fixed encryption cipher for every client. The inability to use NCP with async-push was fixed in the latest version of OpenVPN (2.4.9), but I have yet to test this.
https://community.openvpn.net/openvpn/ticket/1259
https://github.com/OpenVPN/openvpn/blob/release/2.4/Changes.rst
Hope this helps.
When we first implemented without asynchronous authentication, the concurrent connection of 10 to 15 users would totally disrupt all VPN traffic for about 15 minutes by creating an authentication storm.
That is not the case anymore; authentication doesn't disrupt traffic with the latest version of OpenVPN. The only issue today is that authenticating 5 users with a delay of over 3 seconds each will take over 15 seconds.