Database inconsistencies for scim / saml users between spar, brig?
fisx opened this issue · 1 comment
In rare cases, after production availability issues, you may get 409 Conflict responses when creating new users. Searching for those users with curl against brig or Elasticsearch (ES) yields no results.
A 409 Conflict can have two possible causes:
- SAML NameID (externalId in scim, usually an email address); symptom: the error message contains the phrase "externalId is already taken".
- Wire handle (userName in scim); symptom: the error message contains the user handle in the reason phrase.
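The two symptoms above can be told apart mechanically. The following is a minimal sketch, assuming you have captured the response body of the failed scim create request (e.g. with curl) and know the userName you tried to create; the function name `classify_scim_409` is hypothetical, and the only string it relies on is the "externalId is already taken" phrase quoted above.

```shell
# Hypothetical helper: decide which of the two 409 causes a response body points to.
# $1: response body of the failed scim create request
# $2: the userName (Wire handle) that was sent in that request
classify_scim_409() {
  body=$1
  handle=$2
  # Cause 1: SAML NameID / scim externalId already taken in spar.
  case $body in
    *"externalId is already taken"*)
      echo "externalId"; return ;;
  esac
  # Cause 2: the handle appears in the reason phrase. Skip this check for an
  # empty handle, which would otherwise match every body.
  if [ -n "$handle" ]; then
    case $body in
      *"$handle"*) echo "handle"; return ;;  # quoted: matched literally, not as a glob
    esac
  fi
  echo "unknown"
}
```

For example, `classify_scim_409 "$response_body" "$user_name"` prints `externalId`, `handle`, or `unknown`; only the first case calls for the Cassandra check that follows.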
The first cause is more likely. To confirm it, query spar's Cassandra:
$ /opt/cassandra/bin/cqlsh $(hostname -i) | tee table.dump
> select * from spar.user where issuer='<entity id of the IdP>';
Now press <enter> a few times until you've seen the entire output (cqlsh pages long results; running `PAGING OFF;` before the query avoids this), then:
$ grep "$externalId" table.dump
If this doesn't yield anything, you have ruled out cause 1.
If it yields one line, that line contains a user id. If that user doesn't exist in brig, you have confirmed cause 1.
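The grep-and-read step can be scripted so that only the user id is printed. This is a sketch under stated assumptions: the dump is cqlsh's default tabular output (a header line naming the columns, separated by `|`), and the user id lives in a column named `uid` in `spar.user`; the function name `find_spar_uid` is hypothetical. The column is located by name from the header rather than by position.

```shell
# Hypothetical helper: print the uid of every row in a cqlsh dump (e.g. table.dump)
# whose text contains the given externalId. Assumes cqlsh's tabular output format.
# $1: path to the dump file
# $2: the externalId (SAML NameID) to look for
find_spar_uid() {
  dump_file=$1
  external_id=$2
  awk -F'|' -v needle="$external_id" '
    # Strip leading and trailing whitespace from a field.
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    # Skip everything up to and including the header line that names a uid column.
    uid_col == 0 {
      for (i = 1; i <= NF; i++) if (trim($i) == "uid") uid_col = i
      next
    }
    # Plain substring match (no regex), like grep -F.
    index($0, needle) > 0 && NF >= uid_col { print trim($uid_col) }
  ' "$dump_file"
}
```

Usage: `find_spar_uid table.dump "$externalId"`. No output rules out cause 1; a single uid is the id to look up in brig.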
Work-around:
delete from spar.user where issuer='<what you entered above>' and sso_id='<the sso_id from the line you found above>';
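When pasting the issuer and sso_id into the delete statement by hand, a single quote in either value would break the CQL string literal. A minimal sketch of building the statement safely, relying on the CQL rule that a single quote inside a string literal is escaped by doubling it; the function names `cql_quote` and `spar_delete_stmt` are hypothetical.

```shell
# Hypothetical helper: wrap a value in a CQL string literal,
# escaping embedded single quotes by doubling them.
cql_quote() {
  printf "'%s'" "$(printf '%s' "$1" | sed "s/'/''/g")"
}

# Hypothetical helper: print the work-around delete statement.
# $1: issuer (IdP entity id), $2: sso_id value from the dump
spar_delete_stmt() {
  printf 'delete from spar.user where issuer=%s and sso_id=%s;\n' \
    "$(cql_quote "$1")" "$(cql_quote "$2")"
}
```

Print the statement and review it before running it, e.g. via `/opt/cassandra/bin/cqlsh -e "$(spar_delete_stmt "$issuer" "$sso_id")" $(hostname -i)`, where `$issuer` and `$sso_id` are the values from the steps above.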
Fix: coming up!
New data point on the context in which the 409 Conflict was received:
- send a few scim user search and create (POST) requests
- see brig exhaust its CPU and become unresponsive
- a phantom user appears (the externalId is taken in spar, but no matching user exists in brig)
I don't know how to explain this: it seems these events should leave a user record in brig but not in spar, not the other way around.