wireapp/wire-server

Database inconsistencies for scim / saml users between spar, brig?

fisx opened this issue · 1 comments

fisx commented

In rare cases after production availability issues, you may get 409 conflict responses when creating new users. Searching for those users with curl on brig or on ES yields no results.

The 409 conflict can have two possible causes:

  1. SAML NameID (externalId in scim, usually email address); symptom: the error message contains the phrase "externalId is already taken".
  2. Wire handle (userName in scim); symptom: error message contains the user handle in the reason phrase.

The first one is more likely. To confirm, talk to spar's cassandra:

$ /opt/cassandra/bin/cqlsh $(hostname -i) | tee table.dump
> select * from spar.user where issuer='<entity id of the IdP>';

Now press <enter> a few times until you've seen the entire output, then:

$ grep "$externalId" table.dump

If this doesn't yield anything, you have not ruled out case 1.

If it does yield one line, you have a user id. If that user doesn't exist in brig, you have confirmed 1.
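The lookup above can also be sketched non-interactively, which avoids pressing <enter> through the pager. This is only a sketch: the function names are made up for illustration, and it assumes that your cqlsh accepts `-e` with a leading `PAGING OFF;` and that the issuer contains no single quotes.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of wire-server.

# Dump all spar.user rows for one IdP into a file, without interactive paging.
# Assumes the issuer contains no single quotes.
dump_spar_users() {
  local issuer="$1" dump_file="$2"
  /opt/cassandra/bin/cqlsh \
    -e "PAGING OFF; SELECT * FROM spar.user WHERE issuer='${issuer}';" \
    "$(hostname -i)" > "$dump_file"
}

# Print the dump lines that contain the externalId.
# -F treats the externalId as a fixed string, so dots in an email address
# are not interpreted as regex wildcards.
find_external_id() {
  local external_id="$1" dump_file="$2"
  grep -F -- "$external_id" "$dump_file"
}
```

Usage would look like `dump_spar_users '<entity id of the IdP>' table.dump` followed by `find_external_id "$externalId" table.dump`. A non-zero exit status from the second call means grep found no matching line.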

Work-around:

delete from spar.user where issuer='<what you entered above>' and sso_id='<what you found above>';
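Since the delete is irreversible, it may be worth showing the row and asking for confirmation first. A minimal sketch, with hypothetical function names, assuming that neither value contains a single quote (a single quote would have to be doubled inside a CQL string literal):

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of wire-server.

# Build the WHERE clause shared by the SELECT and the DELETE.
# Assumes neither value contains a single quote.
where_clause() {
  printf "issuer='%s' AND sso_id='%s'" "$1" "$2"
}

# Show the suspect row, ask for confirmation, then delete it.
remove_phantom() {
  local where answer
  where="$(where_clause "$1" "$2")"
  /opt/cassandra/bin/cqlsh -e "SELECT * FROM spar.user WHERE ${where};" "$(hostname -i)"
  read -r -p "Delete this row? [y/N] " answer
  [ "$answer" = "y" ] || return 1
  /opt/cassandra/bin/cqlsh -e "DELETE FROM spar.user WHERE ${where};" "$(hostname -i)"
}
```

Call it as `remove_phantom '<what you entered above>' '<what you found above>'`. Anything other than `y` at the prompt leaves the table untouched.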

Fix: coming up!

fisx commented

New data point on the context in which 409 conflict was received:

  • send a few scim user search and post requests
  • see brig exhaust its cpu and become unresponsive
  • phantom user appears (409 conflict on creation, but the user cannot be found)

I don't know how to explain this. It seems these events should result in a user record on brig but not on spar, rather than the other way around.