RIPAGlobal/scimitar

filters with uri-prefixed attribute names don't work

gsar opened this issue · 7 comments

gsar commented

The ABNF in the SCIM RFC requires filters to allow attributes to be fully-qualified (i.e. spelled with their URI prefixes), but that doesn't seem to work in filter expressions like it does within patch ops. Only the unqualified form seems to work in filters, but SCIM implementations like Microsoft Entra appear to only support filters that have the fully qualified name. Potentially an easy fix.

curl -H 'Authorization: Bearer deadbeef' -X GET 'https://example.com/scim/Users?filter=urn:ietf:params:scim:schemas:extension:enterprise:2.0:User:employeeId+eq+%22gsar%22' | jq .
{
  "schemas": [
    "urn:ietf:params:scim:api:messages:2.0:Error"
  ],
  "detail": "The specified filter syntax was invalid, or the specified attribute and filter comparison combination is not supported:\nCan't lex input here ':ietf:params:scim:schemas:extension:enterprise:2.0:User:employeeId eq \"gsar\"'",
  "status": "400",
  "scimType": "invalidFilter"
}
pond commented

Looking into it...

It's wild how every person describing an issue with Entra has totally different payloads coming from it. There seem to be no two the same. How on earth Microsoft have managed to apparently independently reimplement SCIM over and over in different ways, and what it is that makes them "choose" which one of those to invoke, is anyone's guess. It's not like Microsoft don't have prior examples of doing weird stuff but this is definitely towards the top of the "most strange" list!

pond commented

The square bracket filter form of this is a total pig. Unless you know what schema IDs to expect, you can't figure out what's what. Consider:

'urn:ietf:params:scim:schemas:extension:enterprise:2.0:User:employeeId eq "gsar"
'urn:ietf:params:scim:schemas:extension:enterprise:2.0:User[employeeId eq "gsar"]

Now, a bit of background info first - Scimitar has a widespread internal limitation that 'flattens' (i.e. removes) schema IDs everywhere, which is fine - so long as no two extensions have the same attribute name, of course, since by removing the schema ID during e.g. PATCH path traversal we lose what amounted to the namespace that the schema ID provided. For the standardised SCIM extensions that's certainly OK as far as I'm aware. Fixing this limitation would be a tonne of work. to overhaul large amounts of how SCIM attribute maps are defined and how things like the PATCH parser works (but would be a good idea to do, one day!). If that happened, I'd consider it enough to bump the gem's major version number and expect client code to require significant updates to match Scimitar's new arising API requirements.

In v1/v2, though, the way we can handle that query filter above is to just remove the schema IDs from it. But how do we know which part is schema ID, and which part is attribute name? If we didn't know what schema IDs are "valid" in the query, all we can do is parse out the URN and assume - in the above case as an example - that if we hit a "[", then the thing after is an attribute and everything before is the schema ID; or if we hit a space before seeing a [, then the last colon in the URN separates the schema ID from attribute.

Given that, we can remove what we now recognise as the schema ID, yielding:

'employeeId eq "gsar"
'[employeeId eq "gsar"]

...and post-process that last example to replace the square brackets with parentheses (so that we don't need to worry about any subsequent operators that might be present in the filter - notably, things like or).

So far so good... But SCIM being SCIM, it's super complex with 50 ways to do everything instead of keeping it a lean spec with just one way! This means we can also split an attribute path with a "[". We see that done routinely for things like this:

emails.type eq "work"
emails[type eq "work"]

For the schema ID equivalent of that, let's say we wanted to match a manager's displayName. There are at least three ways to do that.

'urn:ietf:params:scim:schemas:extension:enterprise:2.0:User:manager.displayName eq "gsar"
'urn:ietf:params:scim:schemas:extension:enterprise:2.0:User:manager[displayName eq "gsar"]
'urn:ietf:params:scim:schemas:extension:enterprise:2.0:User[manager.displayName eq "gsar]"

Without knowing how much of the colon-separated part is schema ID and which bit starts the attribute path, then the first and third examples above work with the logic described for the employeeId case. But of course, that middle one is broken. Our simple parser could not possibly know that the schema ID was not supposed to be urn:ietf:params:scim:schemas:extension:enterprise:2.0:User:manager, and would thus normalise the form to an incorrect displayName eq "gsar" instead of the correct manager.displayName eq "gsar".

Currently the only thing I can thing of - bearing in mind that without breaking API changes, Scimitar's GET request filter parser doesn't know what SCIM resource class is being queried - is to enumerate known schema extensions, grab the IDs, and use those. If a query came in with an unrecognised schema ID, it'd be left in place and the filter wouldn't find anything (or could even raise an error) but we can assume that if it was trying to query attributes in schemas identified by IDs which the Scimitar application knows nothing about, it was never going to successfully find anything anyway.

pond commented

...and I've added #130 to track that, along with including a README.md  note about the limitation in the docs that talk about extension schemas, within the PR I'm building now.

pond commented

Right, with a bit of luck, #131 should do the trick, though I don't have a real-world Entra instance to test it with.

pond commented

Fix released in https://rubygems.org/gems/scimitar/versions/2.8.0. I'll backport to V1 within a week or two.

gsar commented

@pond a couple of notes / observations:

  • microsoft appears to have a bunch of fix featureflags they will enable selectively for customers who run into their SCIM compliance issues. it beats me why they don't release these things for general availability to keep everyone on the same page (perhaps it has something to do with the fixes "breaking" people who may be relying on the prior behaviors). just noting that this could be contributing to the variety of behaviors people report.
  • for handling the bracketed filters (schema[arbitrary filter] form), it can be simpler to normalize that to the schemaless non-bracketed form. so schema[employeeId eq "stuff"] just becomes schema:employeeId eq "stuff". i haven't run into this usage by microsoft, so it would be fine to not bother with supporting this at this stage.

thanks for working on these fixes, i'll try them out soon and report back in a week or two.

pond commented

Now available for Rails 6 via v1.10.0 too.