MarcJHuber/event-driven-servers

Sporadic TLS negotiation failures with Mavis LDAP backend

jk2lx opened this issue · 6 comments

Scenario:
jenkins2.example.net wants to log into 192.168.60.102 with Tacacs nca-ro-user user which sometimes fails, sometimes succeeds.
192.168.60.102 is configured to use Tacacs server tacplusng1 (192.168.200.1) for authentication, the Tacacs server is using the LDAP server ldap1 with the Mavis plugin for authentication.

Looking at the Tacacs server (tacplusng1) logs, "TLS negotiation failed" seems to be the culprit.
On the LDAP server (ldap1), "TLS negotiation failure" is shown as well.

The problem is though that this is not a consistent issue, the same config usually works but a small percentage auth Tacacs->LDAP authentication requests fails. On one day it is about 357164 requests which get an ACK and 847 which get an ERR because of TLS negotiation failures.
What could be the issue here?

Mavis LDAP plugin cfg:
mavis module = external {
setenv LDAP_SERVER_TYPE = "generic"
setenv LDAP_HOSTS = "ldap://ldap1.example.net:389 ldap://ldap2.example.net:389"
setenv LDAP_BASE = "dc=example,dc=net"
setenv LDAP_FILTER = "(uid=%s)"
setenv LDAP_USER = "cn=admin,dc=example,dc=net"
setenv LDAP_PASSWD = PASSWD
setenv FLAG_USE_MEMBEROF = 1
setenv USE_STARTTLS = 1
exec = /usr/local/lib/mavis/mavis_tacplus-ng_ldap.pl
}

Tacacs log regarding failed authentication:
Mar 1 07:03:05 tacplusng1 tacplus[1014]: author|192.168.60.102|nca-ro-user|||admin-ro|permit|shell|
Mar 1 07:03:05 tacplusng1 tacplus[1013]: author|192.168.60.102|nca-ro-user|ssh|jenkins2.example.net|admin-ro|permit||
Mar 1 07:03:05 tacplusng1 tacplus[1014]: 192.168.60.102 looking for user nca-ro-user in MAVIS backend
Mar 1 07:03:06 tacplusng1 tacplus[1014]: 192.168.60.102 result for user nca-ro-user is ERR
Mar 1 07:03:06 tacplusng1 tacplus[1014]: 192.168.60.102 shell login for 'nca-ro-user' (realm: port1611) from jenkins2.example.net on ssh failed
[TLS negotiation failed.] (profile=admin-ro)
Mar 1 07:03:06 tacplusng1 tacplus[1014]: authen|192.168.60.102|nca-ro-user|ssh|jenkins2.example.net|shell login failed [TLS negotiation failed.
]

LDAP server logs:
Mar 1 07:03:06 ldap1 slapd[1005339]: conn=10696992 fd=14 ACCEPT from IP=192.168.200.1:38004 (IP=0.0.0.0:389)
Mar 1 07:03:06 ldap1 slapd[1005339]: conn=10696992 op=0 EXT oid=1.3.6.1.4.1.1466.20037
Mar 1 07:03:06 ldap1 slapd[1005339]: conn=10696992 op=0 STARTTLS
Mar 1 07:03:06 ldap1 slapd[1005339]: conn=10696992 op=0 RESULT oid= err=0 qtime=0.000014 etime=0.000179 text=
Mar 1 07:03:06 ldap1 slapd[1005339]: conn=10696992 fd=14 closed (TLS negotiation failure)

Hi,

I don't see what could cause that issue, the Perl script will call Net::LDAP->new() if $ldap is undefined, and on success it will run the start_tls() method.

In general, LDAP connections are long-standing, and (re-)binds are only required if the connection to the LDAP server is lost. In a perfect scenario, your connection could live forever. Alas, this apparently doesn't work.

I'd try moving to LDAPS, it could prove more stable than STARTTLS.

Cheers,

Marc

In the old Mavis module there was a FLAG_CACHE_CONNECTION which would keep up the LDAP connection:

if (!defined($flag_cacheconn) && defined($ldap)) {

But in the new Mavis module which we are using, the LDAP connection seems to be torn down every time:

Could this be the reason for the LDAP connection being re-established every time or am I interpreting the code wrongly?

Hi,

might be. At least when looking at the password change code I wonder why I started a unbind-disconnect-new-start_tls sequence where re-binding should be sufficient. On the other hand, password changes don't happen too often, so this shouldn't matter.

I guess adding some error output here (regarding lost/new LDAP connections) would be helpful here, but I'm too short on time for that at the moment. Next weekend, maybe ...

Cheers,

Marc

Hi,

hm, I think you're right, there are needless disconnects (following the bye: label). You could just try to drop the 5 lines below that, that may improve things. I need to have yet another look at that code.

Alas, this doesn't look like it could explain the TLS failure you're seeing.

Thanks,

Marc

After changing the LDAP connection to LDAPS, the issue disappeared.
Nonetheless it would be interesting to know why STARTTLS caused TLS negotiation issues and having the option to retain LDAP connections instead of tearing them down might be beneficial performance-wise under certain scenarios.

Hi,

thanks, yes, I'll update the code during the next couple of days.

Cheers,

Marc