BeyondTrust/pbis-open

group membership won't resolve completely

rbirkenhake opened this issue · 10 comments

Hello!

We've updated our environment from Likewise 6 to PBIS 8.5.0, and now a lot of users can't log in on the updated systems (SLES 11 and SLES 12 are affected). The strange behavior is that on the same system some users can log in and others can't, so PBIS itself is working.

Logon is restricted via AllowGroups in sshd_config, and PBIS can't resolve the users' group membership completely. I've checked this via "list-groups-for-user".
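To illustrate why an incomplete group list locks users out, here is a minimal Python sketch of the AllowGroups check. It is a simplification, not sshd's actual implementation: it assumes whitespace-separated patterns, uses `fnmatchcase` as a stand-in for sshd's own pattern matching, and ignores Match blocks and DenyGroups. The group names and the config line are hypothetical, not taken from this environment.

```python
from fnmatch import fnmatchcase


def allow_groups_patterns(sshd_config_text):
    """Collect every pattern listed on AllowGroups lines (keyword is case-insensitive)."""
    patterns = []
    for line in sshd_config_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comment lines
        parts = stripped.split()
        if parts[0].lower() == "allowgroups":
            patterns.extend(parts[1:])
    return patterns


def login_permitted(resolved_groups, patterns):
    """Return True if no AllowGroups restriction exists or any resolved group matches."""
    if not patterns:
        return True
    return any(fnmatchcase(group, pattern)
               for group in resolved_groups
               for pattern in patterns)


# Hypothetical config and group names, for illustration only.
SAMPLE_CONFIG = """\
# restrict ssh logins
AllowGroups linux_admins app_*
"""
COMPLETE_GROUPS = ["domain_users", "linux_admins"]
PARTIAL_GROUPS = ["domain_users"]  # the group that grants access was not resolved

if __name__ == "__main__":
    patterns = allow_groups_patterns(SAMPLE_CONFIG)
    print(login_permitted(COMPLETE_GROUPS, patterns))
    print(login_permitted(PARTIAL_GROUPS, patterns))
```

If the one group named in AllowGroups is among those PBIS fails to resolve, the check fails even though authentication itself succeeds.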

I ended up taking tcpdumps to check the communication with our Microsoft AD servers.
In these dumps I see the LDAP communication start out well while resolving the user's group membership. But at some point the client stops resolving group membership, although there are still more groups to query.

My feeling is that there must be a limit on time or count when resolving a user's group membership. How can I increase this limit? I found some hard-coded variables like "dwMaxEnumCount" or "MAX_NUM_GROUPS" in the code. Is there a chance that increasing these values helps? I'll take any clue on this issue!

Thanks!
Robin

Can you get me some rough numbers?

  1. Total user count
  2. Total group count
  3. Maximum number of groups a single user is a member of

Some other troubleshooting steps:

  1. lsa authenticate-user
  2. check lsass logs
    a. /opt/pbis/bin/lwsm set-log-target lsass - file /tmp/bvt-pbis-lsass.log
    b. /opt/pbis/bin/lwsm set-log-level lsass - verbose

Lastly, it's best to download and try the latest version, 8.5.2.262, from here: https://github.com/BeyondTrust/pbis-open/releases/tag/8.5.2

I've checked our issue with version 8.5.2.265 - same problem.

Here are some numbers:

  1. ~1300 total user count
  2. A lot - hard to count. I think more than 1000
  3. 100 - 275 groups, depending on position. But the problem also occurs for users with only 75 groups, while other users with more than 200 groups can log in.

lsa authenticate-user is successful

lsass log for a user whose groups are not complete:

20161212144252:VERBOSE:lsass: Permission granted for (uid = 0, gid = 0, pid = 5946) to open LsaIpcServer
20161212144252:VERBOSE:lsass-ipc: (session:30818254939dc389-ae2536e25b5c4da8) Accepted
20161212144252:VERBOSE:lsass: Cache entry for user's group membership for sid S-1-5-21-3599383493-998250424-2266437282-1272 is incomplete
20161212144252:VERBOSE:lsass: Did not find object by SID 'S-1-5-21-3599383493-998250424-2266437282-13481'
20161212144252:VERBOSE:lsass: Did not find object by SID 'S-1-5-21-3599383493-998250424-2266437282-16195'
... some more sids ...
20161212144252:VERBOSE:lsass: Did not find object by SID 'S-1-5-21-3599383493-998250424-2266437282-13389'
20161212144252:VERBOSE:lsass-ipc: (session:30818254939dc389-ae2536e25b5c4da8) Dropping: LWMSG_STATUS_PEER_CLOSE

I've checked the mentioned SIDs via find-by-sid. Neither PBIS nor the old Likewise systems can resolve them. But all of these SIDs exist in AD and are "universal distribution" groups for mailing.
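For anyone who wants to collect the unresolved SIDs from a verbose lsass log instead of copying them by hand, here is a minimal Python sketch. It assumes only the "Did not find object by SID" message format shown in the log excerpt above; the function name is made up for illustration and is not part of PBIS.

```python
import re

# Matches the verbose lsass message quoted in the log excerpt above.
UNRESOLVED_SID = re.compile(r"Did not find object by SID '(S-[0-9-]+)'")


def unresolved_sids(log_text):
    """Return the unique SIDs reported as not found, in order of first appearance."""
    seen = []
    for match in UNRESOLVED_SID.finditer(log_text):
        sid = match.group(1)
        if sid not in seen:
            seen.append(sid)
    return seen


# Lines copied from the log excerpt in this thread.
SAMPLE_LOG = """\
20161212144252:VERBOSE:lsass: Cache entry for user's group membership for sid S-1-5-21-3599383493-998250424-2266437282-1272 is incomplete
20161212144252:VERBOSE:lsass: Did not find object by SID 'S-1-5-21-3599383493-998250424-2266437282-13481'
20161212144252:VERBOSE:lsass: Did not find object by SID 'S-1-5-21-3599383493-998250424-2266437282-16195'
20161212144252:VERBOSE:lsass: Did not find object by SID 'S-1-5-21-3599383493-998250424-2266437282-13389'
"""

if __name__ == "__main__":
    for sid in unresolved_sids(SAMPLE_LOG):
        print(sid)
```

The resulting list can then be checked one by one with find-by-sid or against AD.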

I'm having this same issue. When the cache is working, my user has 113 groups. When it gets messed up, I'm only seeing 19 groups from /opt/pbis/bin/list-groups-for-user myuser

I can mess it up by running: sudo /opt/pbis/bin/ad-cache --delete-user --name
list-groups then shows only the small subset of expected groups

Then, upon running: /opt/pbis/bin/lsa authenticate-user --user myuser

I see this in my log:
20161214045456:VERBOSE:lsass: The user group membership information for user MYDOMAIN\myuser does not match what is in the cache, because the cache contains 19 memberships, but the pac contains 113 memberships. The group membership now needs to be compared against LDAP.

lsass then does a refresh and is working again. So the issue seems to occur on a cold cache, as if it's not loading the groups on first reference after the ad-cache delete.
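A quick way to spot this cold-cache condition in a verbose lsass log is to look for the mismatch message and compare the two counts. The sketch below is Python and assumes only the message wording quoted above; the function name is hypothetical.

```python
import re

# Matches the cache/PAC mismatch message quoted above.
MISMATCH = re.compile(
    r"user group membership information for user (?P<user>\S+) does not match "
    r"what is in the cache, because the cache contains (?P<cached>\d+) memberships, "
    r"but the pac contains (?P<pac>\d+) memberships"
)


def membership_mismatches(log_text):
    """Return (user, cached_count, pac_count) for every mismatch message found."""
    return [
        (m.group("user"), int(m.group("cached")), int(m.group("pac")))
        for m in MISMATCH.finditer(log_text)
    ]


# Log line copied from this thread (raw string because of the backslash).
SAMPLE_LOG = (
    r"20161214045456:VERBOSE:lsass: The user group membership information for user "
    r"MYDOMAIN\myuser does not match what is in the cache, because the cache contains "
    r"19 memberships, but the pac contains 113 memberships. The group membership now "
    r"needs to be compared against LDAP."
)

if __name__ == "__main__":
    for user, cached, pac in membership_mismatches(SAMPLE_LOG):
        print(f"{user}: cache has {cached}, PAC has {pac}, missing {pac - cached}")
```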

@repudi8or: Thanks for sharing. But I don't have this kind of error in my log.

Usually, before I check groups with /opt/pbis/bin/list-groups-for-user, I delete the user's cache entry with /opt/pbis/bin/ad-cache --delete-user --name, but the resolved groups are still the same and still not complete.

The strange thing is that our domain contains two forests, and every time PBIS uses GCs from these forests (checked via /opt/pbis/bin/get-status), the resolved groups are different and also not complete. But in this case our important groups get resolved and users can log in.

So is there a setting to control which DC and GC PBIS should use? I found only the setting to blacklist DCs.

We have plans for a white list which will come at a later date

Are sites used in your forest?
Do you have a GC on all your DCs?

You could try the config option NssGroupMembersQueryCacheOnly and set it to false, but this will increase load on the DC.

Yes, we have sites in our forests, the systems are in the correct site, and every DC is also a GC.

Option NssGroupMembersQueryCacheOnly is already set to false but it doesn't help.

A question about the AD cache, which is located in /var/lib/pbis/db/: there is only one file, for the joined domain. We have trusts to other domains, but there is no cache file for them. Is this a problem?

I can't get a working environment with PBIS 8.5.2. From my point of view it would be great to have an option to whitelist DCs or to deactivate the use of GCs (the behavior of Likewise 6, where GCs don't get used).

So we changed back to Likewise 6.0.8305. This version also has problems and sometimes crashes with segmentation faults, but it can resolve group membership completely.

We uploaded a beta for 8.5.3 today with a fix for group membership issues https://github.com/BeyondTrust/pbis-open/releases. We'd be interested to know whether it also resolves this issue.

I saw your commit e99411e a couple of days ago and compiled my own version.

It is working much better! Thanks for fixing!!!

Thanks for the confirmation. Fix released in 8.5.3