Allow for easily matching rules using path prefixes

Question

Allow for easily matching rules using path prefixes

davidspek opened this issue 2 years ago · 8 comments

Preflight checklist

I could not find a solution in the existing issues, docs, nor discussions.
I agree to follow this project's Code of Conduct.
I have read and am following this repository's Contribution Guidelines.
This issue affects my Ory Network project.
I have joined the Ory Community Slack.
I am signed up to the Ory Security Patch Newsletter.

Context and scope

Currently rules are matched to an incoming request based on regex or glob pattern matching. Since only a single rule is allowed to match a request, the regex or glob patterns must be very precise and become increasingly complex as different paths need to be matched. While the used regex library supports negative lookahead, creating rules where requests are matched based on path prefixes is still difficult since you'd need to provide negative matches for all other rule regexes with the same base path. As glob doesn't support negative lookahead doing path prefix based routing is not possible. Even when using regex patterns with negative lookahead, the issue that arises is that the end user ends up needing to manage a system with the state of all rules to then be able to create rules with the appropriate regexes including negative lookahead patterns for all other rules. Since the most common type of routing people likely would want to implement is prefixed based routing, the fact that doing so with oathkeeper is difficult and error prone is in my views its biggest drawback.

Also see #441.

Goals and non-goals

Goals:

Allow for easily matching routing rules using path prefixes.

Non-goals:

Allow for using regex in the scheme section of the URL, since that would make it much more difficult to enter rules into the Trie.
Allow for regex or glob patterns in places other in the host section of the URL, since making a Trie compatible with pattern matching is likely very difficult if not impossible.
Allow for regex or glob patterns in places other than the last section of the path, since making a Trie compatible with pattern matching is likely very difficult if not impossible.

The design

In order to allow for easily matching route prefixes we can make use of a Trie Data Structure. These are commonly used for things like search engines where you'd want to quickly find entries that contain a word without looping through all entries within the database. We will use a Trie to find the oathkeeper rule with the longest matching prefix. Since the time complexity is O(n) the matching of rules should be very fast, and likely much faster than what is currently being done since currently oathkeeper loops through each rule to compile a list of matches.

Here is an example of Trie that will be used:

Entering rules into the Trie.

To simplify and speed up matching the first node in our Trie will be the protocol used. This can be either HTTP or GRPC. The next level node in the Trie are the methods (GET, POST, PUT, PATCH, DELETE). This is followed by a node containing the URL scheme (for example: https, as parsed by url.Parse) and then a node containing the host (as parsed by `url.Parse).

Now for the paths. First, we will cleanup the path and keep anything up to the first regex or glob pattern. Since we are only interested in full paths that match, we will not be entering each character of the path into the Trie. Rather, we will split the path of a rule by the / character and enter each complete path into the Trie as a node. On the last path we will add the rule object to the Trie node so we can retrieve it. Technically a Trie node can contain multiple rules if patterns are used somewhere in the paths.

Matching incoming requests

When a request comes in and prefix matching is enabled we will first search the Trie for the rule(s) that match the protocol, method, sheme, hostname and path. Next, we will evaluate the rule(s) for regex or glob pattern matching the same way that is done currently. If after the pattern matching there are still multiple rules that are matched an error will occur the same way this happens currently.

Examples

Using the above example Trie diagram the following requests would en up at the following nodes in the Trie.

Protocol	Method	Scheme	Hostname	Path	Node Number
ProtocolHTTP	GET	https	example.com	/some/path	9
ProtocolHTTP	GET	https	example.com	/some/other/path	12
ProtocolHTTP	GET	https	example.com	/something/else	8
ProtocolHTTP	GET	https	example.com	/some/thing	5
ProtocolHTTP	GET	https	example.com	/unknown/path	1
ProtocolHTTP	GET	https	example.com	/something/else/and/very/long	8
ProtocolHTTP	POST	https	example.com	/	2
ProtocolHTTP	POST	https	example.com	/some/long/path	7
ProtocolHTTP	POST	https	example.com	/some/path	11

APIs

No response

Data storage

No response

Code and pseudo-code

An implementation can be found in #1073.

Degree of constraint

No response

Alternatives considered

No response

Answer 1 · 2023-04-11T13:15:06.000Z

/cc @aeneasr since you asked for a more thorough explanation and/or design document for my PR.

Answer 2 · 2023-06-21T15:02:38.000Z

Friendly ping @aeneasr .

Answer 3 · 2023-06-21T15:23:40.000Z

Friendly ping @zepatrik maybe you can also chime in here.

Answer 4 · 2023-07-17T13:03:58.000Z

This looks pretty nice! Would it make sense to re-use the matching logic of httprouter? I think they solved this already, basically, and it's really fast!

Answer 5 · 2023-08-08T21:04:13.000Z

I haven’t looked into httprouter so I’d need to have a look at it.

Answer 6 · 2023-08-08T21:24:01.000Z

It looks like httprouter is doing the same thing, except it seems to be a bit more advanced in terms of matching. What is the same though is that wildcard matching (or in our case regex and glob) can only be done at the end of a URL. Another similarity is that they also seem to be adding the method into the Trie.

One big difference though is that they don’t insert the host name into the Trie. Rather, you’d need to create a new router (and therefore a new Trie) for each domain.

I’d need to look into the code a bit closer to see if there’s pieces we could use as a module or if they can otherwise be adapted into this codebase. Since it doesn’t look like that project is super active and that the Trie structure we’d use is like a bit different I’m leaning towards the latter.

Answer 7 · 2023-08-08T21:37:23.000Z

After looking at the code it seems very similar to what I’ve already implemented, and I don’t think we’d be able to reuse it as is. However, it probably does have some useful bit that can be used to improve the code I’ve already written.

So @aeneasr, my question now is how can we best try to move this forward?

Answer 8 · 2024-08-08T00:24:20.000Z

Hello contributors!

I am marking this issue as stale as it has not received any engagement from the community or maintainers for a year. That does not imply that the issue has no merit! If you feel strongly about this issue

open a PR referencing and resolving the issue;
leave a comment on it and discuss ideas on how you could contribute towards resolving it;
leave a comment and describe in detail why this issue is critical for your use case;
open a new issue with updated details and a plan for resolving the issue.

Throughout its lifetime, Ory has received over 10.000 issues and PRs. To sustain that growth, we need to prioritize and focus on issues that are important to the community. A good indication of importance, and thus priority, is activity on a topic.

Unfortunately, burnout has become a topic of concern amongst open-source projects.

It can lead to severe personal and health issues as well as opening catastrophic attack vectors.

The motivation for this automation is to help prioritize issues in the backlog and not ignore, reject, or belittle anyone.

If this issue was marked as stale erroneously you can exempt it by adding the backlog label, assigning someone, or setting a milestone for it.

Thank you for your understanding and to anyone who participated in the conversation! And as written above, please do participate in the conversation if this topic is important to you!

Thank you 🙏✌️