PowerShell/Polaris

RFC 001: Path Matching/Resolution Strategy

Tiberriver256 opened this issue · 18 comments

The original implementation of Polaris required the path requested by the client to equal the path declared with the New-PolarisRoute function.

The current implementation takes a matching approach. If the path declared by New-PolarisRoute matches the beginning of the requested path the ScriptBlock will execute.

The current, more flexible path matching strategy allows for scenarios like the built-in file browser, where the remainder of the URL can be assumed to be a relative path to a local file. It can be argued that it is too flexible, however, as URLs that are technically "wrong" will match and the ScriptBlock may execute, as mentioned in #116.

We would like to ask for input from the community on a proposed solution currently in Pull Request #119. The proposal is to match the path strategy of Express.js.

This means we would support regular expressions for paths and named parameters within the path:

Example paths:

  1. /hello - Would get converted to the following regular expression: ^/hello$. This would guarantee hello would be the only match and return to the original implementation.
  2. /hello.* - Would get converted to the following regular expression: ^/hello.*$. This would match the current implementation and allow any request that begins with hello to trigger the desired ScriptBlock.
  3. /hello/:firstname/:lastname - Would get converted to the following regular expression: ^/hello/(?<firstname>.*)/(?<lastname>.*). The $Matches from the regular expression would then be converted to a PSCustomObject and be made available within the ScriptBlock as something like $Request.Parameters
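The conversion described in the list above could be sketched roughly as follows. `ConvertTo-RouteRegex` and `Get-RouteParameters` are hypothetical helper names, not functions taken from Polaris or from PR #119, and the sketch anchors every pattern with `$`, which the third example above does not show.

```powershell
# Hypothetical helper: turn an Express-style path into an anchored regex string.
function ConvertTo-RouteRegex {
    param([string]$Path)
    # Each :name token becomes a named capture group, e.g. :firstname -> (?<firstname>.*)
    $pattern = [regex]::Replace($Path, ':(\w+)', '(?<$1>.*)')
    "^$pattern`$"
}

# Hypothetical helper: match a request path and collect the named captures,
# roughly what $Request.Parameters might hold inside the ScriptBlock.
function Get-RouteParameters {
    param([string]$RequestPath, [string]$RoutePath)
    $regex = [regex](ConvertTo-RouteRegex -Path $RoutePath)
    $match = $regex.Match($RequestPath)
    if (-not $match.Success) { return $null }

    $parameters = [ordered]@{}
    foreach ($name in $regex.GetGroupNames()) {
        # Skip the numbered groups ("0" is the whole match); keep only named ones.
        if ($name -notmatch '^\d+$') {
            $parameters[$name] = $match.Groups[$name].Value
        }
    }
    New-Object -TypeName PSObject -Property $parameters
}

$result = Get-RouteParameters -RequestPath '/hello/Jane/Doe' -RoutePath '/hello/:firstname/:lastname'
$result
```

A request that does not match the route, such as `/goodbye/Jane/Doe`, returns `$null` in this sketch.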

As this is a breaking change we would love feedback and additional suggestions on the implementation. Thanks @TimCurwick for the feedback on the PR to bring this out for a more open discussion.

*EDIT:
This would introduce the following breaking change. The below request URL would match in the current implementation and would no longer match in the new implementation.

Route path: /flights
Request URL: http://localhost:3000/flights/myspecialflight
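The difference can be illustrated with a few lines of PowerShell. This is only a sketch: `StartsWith` stands in for the current "matches the beginning" behaviour and an anchored `-match` stands in for the proposed behaviour; neither line is taken from the Polaris source.

```powershell
$routePath   = '/flights'
$requestPath = '/flights/myspecialflight'

# Stand-in for the current behaviour: the route only has to match the start of the request path.
$prefixMatch = $requestPath.StartsWith($routePath)

# Stand-in for the proposed behaviour: the route becomes an anchored regex
# and has to match the whole request path.
$anchoredMatch = $requestPath -match "^$routePath`$"

"Prefix match:   $prefixMatch"
"Anchored match: $anchoredMatch"
```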

Full set of examples taken from the express.js documentation and converted to my idea of what it would look like for Polaris:

This route path will match requests to the root route, /.

New-PolarisRoute -Path "/" -ScriptBlock {
   $Response.Send('root')
}

This route path will match requests to /about.

New-PolarisRoute -Path "/about" -ScriptBlock {
   $Response.Send('about')
}

This route path will match requests to /random.text.

New-PolarisRoute -Path "/random.text" -ScriptBlock {
   $Response.Send('random.text')
}

Here are some examples of route paths based on string patterns.

This route path will match acd and abcd.

New-PolarisRoute -Path "/ab?cd" -ScriptBlock {
   $Response.Send('ab?cd')
}

This route path will match abcd, abbcd, abbbcd, and so on.

New-PolarisRoute -Path "/ab+cd" -ScriptBlock {
   $Response.Send('/ab+cd')
}

This route path will match abcd, abxcd, abRANDOMcd, ab123cd, and so on.

New-PolarisRoute -Path "/ab*cd" -ScriptBlock {
   $Response.Send('/ab*cd')
}

This route path will match /abe and /abcde.

New-PolarisRoute -Path "/ab(cd)?e" -ScriptBlock {
   $Response.Send('/ab(cd)?e')
}
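One detail may be worth checking when evaluating these string patterns. The sketch below assumes the path string is simply anchored and handed to the .NET regex engine unchanged; the thread does not spell out how PR #119 translates string patterns, so this is an assumption. Under that assumption `?`, `+` and `()` behave as described above, but a bare `*` repeats the preceding character rather than acting as a wildcard, so the wildcard case would need `.*` (as in the `/hello.*` example earlier) or a translation step.

```powershell
# Assumption: each path is anchored with ^ and $ and used as a .NET regex as-is.
$cases = @(
    @{ Pattern = '^/ab?cd$';    Path = '/acd'   }
    @{ Pattern = '^/ab+cd$';    Path = '/abbcd' }
    @{ Pattern = '^/ab(cd)?e$'; Path = '/abe'   }
    @{ Pattern = '^/ab*cd$';    Path = '/abxcd' }  # regex * repeats "b" only
    @{ Pattern = '^/ab.*cd$';   Path = '/abxcd' }  # .* is the regex spelling of a wildcard
)

$results = foreach ($case in $cases) {
    [pscustomobject]@{
        Pattern = $case.Pattern
        Path    = $case.Path
        IsMatch = $case.Path -match $case.Pattern
    }
}
$results
```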

Examples of route paths based on regular expressions:

This route path will match anything with an “a” in it.

New-PolarisRoute -Path ([RegEx]::New("a")) -ScriptBlock {
   $Response.Send('a')
}

This route path will match butterfly and dragonfly, but not butterflyman, dragonflyman, and so on.

New-PolarisRoute -Path ([RegEx]::New(".*fly$")) -ScriptBlock {
   $Response.Send('.*fly$')
}
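The matching behaviour of these two regular expressions can be checked directly against .NET `Regex` objects, independent of Polaris. The request paths below are made up for illustration.

```powershell
$anyA   = [regex]::new('a')
$flyEnd = [regex]::new('.*fly$')

$aboutMatches        = $anyA.IsMatch('/about')           # path contains an "a"
$butterflyMatches    = $flyEnd.IsMatch('/butterfly')     # path ends in "fly"
$dragonflymanMatches = $flyEnd.IsMatch('/dragonflyman')  # path does not end in "fly"

$aboutMatches, $butterflyMatches, $dragonflymanMatches
```

Note that a `Regex` object is case sensitive unless it is constructed with `RegexOptions.IgnoreCase`.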

Route parameters

Route parameters are named URL segments that are used to capture the values specified at their position in the URL. The captured values are populated in the $Request.Parameters object (req.params in Express), with the name of the route parameter specified in the path as their respective keys.

Route path: /users/:userId/books/:bookId
Request URL: http://localhost:3000/users/34/books/8989
$Request.Parameters: { "userId": "34", "bookId": "8989" }

To define routes with route parameters, simply specify the route parameters in the path of the route as shown below.

New-PolarisRoute -Path "/users/:userId/books/:bookId" -ScriptBlock {
   $Response.Send($Request.Parameters)
}

The name of route parameters must be made up of “word characters” ([A-Za-z0-9_]).

Since the hyphen (-) and the dot (.) are interpreted literally, they can be used along with route parameters for useful purposes.

Route path: /flights/:from-:to
Request URL: http://localhost:3000/flights/LAX-SFO
$Request.Parameters: { "from": "LAX", "to": "SFO" }
Route path: /plantae/:genus.:species
Request URL: http://localhost:3000/plantae/Prunus.persica
$Request.Parameters: { "genus": "Prunus", "species": "persica" }

To have more control over the exact string that can be matched by a route parameter, you can append a regular expression in parentheses (()):

Route path: /user/:userId(\d+)
Request URL: http://localhost:3000/user/42
$Request.Parameters: {"userId": "42"}
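A parameter with an appended regular expression could be translated along these lines. `ConvertTo-ParameterizedRegex` is a hypothetical name, the default capture `.*` follows the conversion shown at the top of this RFC, and the sketch does not handle nested parentheses inside the custom expression.

```powershell
# Hypothetical helper: supports :name and :name(customRegex) tokens.
function ConvertTo-ParameterizedRegex {
    param([string]$Path)

    # :name optionally followed by (regex); group 1 = name, group 2 = custom regex
    $token = ':(\w+)(?:\(([^)]*)\))?'

    $evaluator = [System.Text.RegularExpressions.MatchEvaluator] {
        param($match)
        $name  = $match.Groups[1].Value
        # Fall back to .* when no custom expression was appended.
        $inner = if ($match.Groups[2].Success) { $match.Groups[2].Value } else { '.*' }
        "(?<$name>$inner)"
    }

    '^' + [regex]::Replace($Path, $token, $evaluator) + '$'
}

$userRegex   = ConvertTo-ParameterizedRegex -Path '/user/:userId(\d+)'
$flightRegex = ConvertTo-ParameterizedRegex -Path '/flights/:from-:to'

if ('/user/42' -match $userRegex) { "userId = $($Matches['userId'])" }
if ('/flights/LAX-SFO' -match $flightRegex) { "from = $($Matches['from']), to = $($Matches['to'])" }
```

With the `\d+` restriction in place, a request for `/user/abc` no longer matches the route.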

This looks like a pretty cool change, I've just started looking at Polaris as a possible alternative to Universal Dashboard for hosting some REST APIs (well as a middle layer to talk to another API that doesn't support CORS).

I'm only doing GETs so I can see some use for this for me, I'm interested to see what other people are doing with Polaris that this might break for them.

@Tiberriver256 you know you have my FULL support. This change looks soooo awesome. I can't wait 😄

@ChrisLGardner I love UD but I'll say Polaris does beat it on its size. Polaris uses HttpListener (for now at least...) which is built in where UD bundles up ASP.NET (for 5.1) and ASP.NET Core (for 6+) which can make it a bit like drinking from a firehose. But UD is soooo powerful.

@ChrisLGardner - HttpListener can do https. I think we'll need a few changes possibly and we'll have to document the setup. Tracking that in #107.

I also don't expect us to break any use cases but I think some paths might have to be updated in code if they were expecting it to continue to be just a match. We will leave this out here for the full two weeks before accepting a PR just because it sounds like we need to be sensitive to any use cases out there.

We are also hoping to swap in Kestrel for HttpListener if we can keep the size low (in other words, without bringing in all of ASP.NET). That's tracked in #12.

Here is the functionality I want/need:

New-PolarisRoute -Path '/Snoobits' -Scriptblock {}

Matches http://localhost/Snoobits
Matches http://localhost/Snoobits/Careers
Matches http://localhost/Snoobits/Careers/Application.html
Does not match http://localhost/Snoobits2

If multiple routes are defined in the path, the route closest to the leaf takes precedence.

New-PolarisRoute -Path '/Snoobits' -Scriptblock {} -Exactmatch

Prevents partial matches.

We can, of course, integrate all of these functionalities, but I don't want to clutter it up with unnecessary features that make it more complicated or challenging to use in real life.

I agree with adding handling for route parameters (though not necessarily the approach), but what is the use case for string and regex matching?

I also need simple matching to be case insensitive, and for more efficient implementation, I would argue for there to be no exceptions.

So in the proposal here you could achieve your goal with this command if you didn't need / want to capture any parameters:

New-PolarisRoute -Path '/Snoobits/.*' -Scriptblock {}

If you wanted to capture the remainder of the path for use in your scriptblock you would write it as

New-PolarisRoute -Path '/Snoobits/:RemainingPath' -Scriptblock {}

Which would produce the following results:

Request URL: http://localhost:3000/Snoobits/Careers/Application.html
$Request.Parameters: { "RemainingPath": "Careers/Application.html" }
Request URL: http://localhost:3000/Snoobits/Careers
$Request.Parameters: { "RemainingPath": "Careers" }

I'm with you on the case sensitive thing. I had not thought of that but PowerShell being case insensitive on a -match is somewhat non-typical. We could default it to use -cmatch and add a parameter for -CaseInsensitiveMatch or something of that sort?
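For reference, the difference between the two operators looks like this. The route and request path are made up; `-CaseInsensitiveMatch` above is only a suggested parameter name and is not used here.

```powershell
$pattern = '^/Snoobits$'

# -match ignores case by default; -cmatch is the case-sensitive variant.
$insensitive = '/snoobits' -match  $pattern
$sensitive   = '/snoobits' -cmatch $pattern

"-match:  $insensitive"
"-cmatch: $sensitive"
```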

To answer your question about the benefits / use case of going to regular expressions and parameters for path matching: the parameters are really just taking advantage of regular expressions, as .NET allows named captures. The RegEx, though, guarantees that the path matching strategy can really be bent to anyone's use case. If you can think of some way you want to match / not-match, you can do it in regex with enough Googling, but it's still friendly enough to just drop in a "/hello-world" path and have a match for "http://localhost:3000/hello-world" and not match "http://localhost:3000/hello-world1235", as you might typically expect.

We don't want to build something that can do anything. There's a different project for that, named "PowerShell". We need something that is optimized to meet the needs of the people that will be using it.

If we end up with pattern matching functionality as a side effect of how we implement design requirements, that's fine, but I don't think it is itself a design requirement.

If you are using matching as you've proposed it, how would you handle precedence?

Scenario:
Route A: /Reports
Route B: /Reports/SQLServer1/Instance1
Route C: /Reports/SQLServer1/Instance2

/Reports goes to A
/Reports/SQLServer1 goes to A
/Reports/SQLServer1/Instance1 goes to B
/Reports/SQLServer1/Instance1/Log1 goes to B

Here’s what I propose.

Syntax

#  Exact match
New-PolarisRoute -Scriptblock {} -Route /Folder1

#  Match descendant objects
New-PolarisRoute -Scriptblock {} -Route /Folder1 -MatchChildItems

#  Regex match
New-PolarisRoute -Scriptblock {} -Match /Folder1/.*fly$

#  PowerShell string pattern match
New-PolarisRoute -Scriptblock {} -Like /Folder1/SQL*

#  Route parameters
New-PolarisRoute -Scriptblock {} -Route /Folder1 -RouteParameter /:firstname/:lastname 

Objects
Replace (and possibly rename) $Polaris.ScriptRoutes with array of PolarisRoute objects.

class PolarisRoute {
    [String]$Method
    [String]$Route
    [String]$RouteParameter
    [Scriptblock]$Scriptblock
    [Boolean]$MatchChildItems
    [Boolean]$IsRegexMatch
    [Boolean]$IsPatternMatch
}

Add $Polaris property [Hashtable]$Routes (or some better name). As routes are added to array $ScriptRoutes, they are also added to a hashtable for fast, case insensitive exact matching.
$Routes.Add( "$Method`:$Route", $_ )
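A hashtable created with the `@{}` literal already compares string keys without regard to case, which is what makes this lookup cheap. A minimal sketch, with a placeholder handler and made-up route values:

```powershell
$Routes = @{}
$Method = 'GET'
$Route  = '/Folder1'

# The backtick keeps PowerShell from reading "$Method:" as a scope-qualified variable.
$Routes.Add( "$Method`:$Route", { 'handler for /Folder1' } )

# Lookup with different casing still finds the entry.
# (A table created with [hashtable]::new() would be case sensitive instead.)
$found = $Routes['get:/folder1']
$null -ne $found
```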

Matching logic

#  Exact match, case insensitive
#  ($Routes is the lookup hashtable; $Route is the requested path)
If ( $Routes["$Method`:$Route"] )
    { $MatchingRoute = $Routes["$Method`:$Route"] }
Else
    {
    #  Child item match, case insensitive
    ForEach ( $RouteCandidate in <# from $Route to "/" by atom#> )
        {
        If ( $Routes["$Method`:$RouteCandidate"].MatchChildItems )
            {
            $MatchingRoute = $Routes["$Method`:$RouteCandidate"]
            break
            }
        }
    }

If ( -not $MatchingRoute )
    {
    #  Regex and pattern matching routes in order added to Polaris object
    ForEach ( $RouteCandidate in $ScriptRoutes ) 
        {
        If ( $RouteCandidate.IsRegexMatch )
            {
            If ( $Route -match $RouteCandidate.Route )
                {
                $MatchingRoute = $RouteCandidate
                break
                }
            }
        If ( $RouteCandidate.IsPatternMatch )
            {
            If ( $Route -like $RouteCandidate.Route )
                {
                $MatchingRoute = $RouteCandidate
                break
                }
            }
        }
    }

If ( $MatchingRoute.RouteParameter )
    {
    <# Process route parameters #>
    } 
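The first two stages of this logic (exact match, then the closest ancestor that allows child items) could look roughly like the following runnable sketch. `Find-Route` and the route objects are hypothetical and only model the `MatchChildItems` flag; regex, pattern and route parameter handling are left out.

```powershell
# Hypothetical sketch of exact match followed by closest-ancestor match.
function Find-Route {
    param([string]$Method, [string]$Path, [hashtable]$Routes)

    # 1. Exact match (case insensitive because $Routes is a @{} hashtable)
    $exact = $Routes["$Method`:$Path"]
    if ($exact) { return $exact }

    # 2. Walk up one path segment at a time, so the route closest to the leaf wins
    $candidate = $Path
    while ($candidate.LastIndexOf('/') -gt 0) {
        $candidate = $candidate.Substring(0, $candidate.LastIndexOf('/'))
        $route = $Routes["$Method`:$candidate"]
        if ($route -and $route.MatchChildItems) { return $route }
    }

    # 3. Finally try the root route
    $root = $Routes["$Method`:/"]
    if ($root -and $root.MatchChildItems) { return $root }
    return $null
}

$routes = @{
    'GET:/Reports'                      = [pscustomobject]@{ Name = 'A'; MatchChildItems = $true }
    'GET:/Reports/SQLServer1/Instance1' = [pscustomobject]@{ Name = 'B'; MatchChildItems = $true }
    'GET:/Reports/SQLServer1/Instance2' = [pscustomobject]@{ Name = 'C'; MatchChildItems = $true }
}

(Find-Route -Method 'GET' -Path '/Reports/SQLServer1' -Routes $routes).Name
(Find-Route -Method 'GET' -Path '/Reports/SQLServer1/Instance1/Log1' -Routes $routes).Name
```

Each lookup is a hashtable access, so the cost grows with the number of path segments in the request rather than with the number of registered routes.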

Thanks for the alternate solution proposal, I will have to mull this over a little bit.

I don't think the goal is to re-create PowerShell in path matching (I'm guessing that's Internet sarcasm though :) ) but I do somewhat appreciate the comparison. My goal would be to create something that is simple and easy to use but also scales well to really complex scenarios which I feel like expressJS accomplished really well with their path matching strategy as there have been a lot of tools successfully built on top of it.

To answer your question on precedence. The example you gave wouldn't really have any worries about precedence because they don't have any patterns or regular expressions.

Scenario:
Route A: /Reports
Route B: /Reports/SQLServer1/Instance1
Route C: /Reports/SQLServer1/Instance2

/Reports goes to A
/Reports/SQLServer1 gets 404 as it doesn't exist
/Reports/SQLServer1/Instance1 goes to B
/Reports/SQLServer1/Instance1/Log1 gets 404 as it doesn't exist

This should be the default behavior for basic strings entered in as paths as it is the default behavior in the frameworks that I have tried and is pretty much the expected behavior. It matches what I tell it to match and nothing more. If I ask it to match "/Reports" and it matches "/Reports/SQLServer1" I think that would be considered unexpected behavior by most people. @tylerl0706 feel free to correct me if I'm wrong or jump in on this too.

In a scenario with multiple regular expressions or string patterns with a wildcard at the end my PR is handling the precedence by running the longest (most specific) first, so if you wanted to use all wildcard string patterns like this:

Scenario:
Route A: /Reports*
Route B: /Reports/SQLServer1/Instance1*
Route C: /Reports/SQLServer1/Instance2/logs/*

This would be your experience:

/Reports goes to A
/Reports/SQLServer1 goes to A
/Reports/SQLServer1/Instance1 goes to B
/Reports/SQLServer1/Instance1/myrandomsite goes to B
/Reports/SQLServer1/Instance2/logs/1 goes to C
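Longest-first precedence can be sketched as follows. The PR's actual code is not shown in this thread, so this is an assumption-laden illustration: it uses the `-like` operator for the `*` wildcard and a hypothetical `Resolve-RoutePath` helper, and it tests a request under Instance2 for Route C.

```powershell
$routePaths = @(
    '/Reports*'
    '/Reports/SQLServer1/Instance1*'
    '/Reports/SQLServer1/Instance2/logs/*'
)

# Try the longest (most specific) route paths first.
$byLength = $routePaths | Sort-Object -Property Length -Descending

# Hypothetical helper: return the first candidate that matches the request path.
function Resolve-RoutePath {
    param([string]$RequestPath, [string[]]$Candidates)
    foreach ($candidate in $Candidates) {
        if ($RequestPath -like $candidate) { return $candidate }
    }
}

foreach ($request in '/Reports/SQLServer1', '/Reports/SQLServer1/Instance1/myrandomsite', '/Reports/SQLServer1/Instance2/logs/1') {
    "$request -> $(Resolve-RoutePath -RequestPath $request -Candidates $byLength)"
}
```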

Question on your proposal there, what is the intent of the -MatchChildItems switch? It's a little tough to tell from the code.

Just want to chime in and say that I think @Tiberriver256 is right. Most webservers (thinking about Express, Flask, Nancy (I think)) would not allow this:

Scenario:
Route A: /Abc

/Abc/Foo to match A

For the other web servers this would throw a 404 back.

I do, however, really like the compromise of using the wildcard which is very familiar to PowerShell users.

@Tiberriver256's idea allows for simple route creation, while also giving complexity to users who need it.

That said, I really value @TimCurwick's feedback and thoughts. He's done a ton of awesome things for Polaris. What are your thoughts given this additional information?

To clarify a little bit, the original proposal does match expressJS string patterns which include the * wildcard familiar to PowerShell. See the section on sample string patterns for the three special characters supported in their string patterns.

I'm going to go ahead and post some thoughts and questions on the syntax as I really don't think we're too far off from each other, if I'm reading it right. I'll list the two syntaxes side-by-side for some easy comparison to do the same thing, it may help if you're not really familiar with the syntax and my documentation was just confusing.

#  Exact match
Proposed Alternate -> New-PolarisRoute -Scriptblock {} -Route /Folder1
ExpressJS Syntax -> New-PolarisRoute -Scriptblock {} -Path /Folder1

#  Match descendant objects
Proposed Alternate -> New-PolarisRoute -Scriptblock {} -Route /Folder1 -MatchChildItems
ExpressJS Syntax -> New-PolarisRoute -Scriptblock {} -Path /Folder1*

#  Regex match
Proposed Alternate -> New-PolarisRoute -Scriptblock {} -Match /Folder1/.*fly$
ExpressJS Syntax -> New-PolarisRoute -Scriptblock {} -Path ([RegEx]::new("/Folder1/.*fly$"))

#  PowerShell string pattern match
Proposed Alternate -> New-PolarisRoute -Scriptblock {} -Like /Folder1/SQL*
ExpressJS Syntax ->  New-PolarisRoute -Scriptblock {} -Path /Folder1/SQL*

#  Route parameters
Proposed Alternate -> New-PolarisRoute -Scriptblock {} -Route /Folder1 -RouteParameter /:firstname/:lastname
ExpressJS Syntax -> New-PolarisRoute -Scriptblock {} -Path /Folder1/:firstname/:lastname

It seems the main difference is the addition of extra parameter names to hopefully clarify the purpose of each area of the path?

I agree, actually. Instead of putting everything in -Path and type checking to determine whether the path is a regular expression or a string, it might be clearer to supply a second parameter set for regular expression matches, with something like a -RegExPath parameter.

Separating the route from route parameters I don't see as being a good move. It won't necessarily be clear on how to write a path like /Folder1/:name/books/:id where there needs to be a mix of parameters and static names. What would be the main value you are going for in splitting it out?

If the -MatchChildItems is the same as adding a wildcard to the end, is there something unique it adds?

Loosely inspired by Flask's url_for, this suggestion takes a totally different direction from the current discussion. This is similar to what is suggested in #46.

My suggestion is to not have much complexity in the path matching, or indeed give the user much control over it, but rather let it be decided by the implementation of the underlying functionality.

Bear with me - this isn't thought through entirely. I figured I'd rather present the suggestion in its current state for potential discussion, since there is also a heavy element of "what should Polaris do and be?", which I don't really feel qualified to have a strong opinion on.

An example of what this could look like:

Function Get-Foo {
	[CmdletBinding()]
	Param(
		[Parameter()][string]$bar,
		[Parameter()][string]$baz
	)

	Write-Output $bar
	Write-Output $baz
}

New-PolarisGetRoute -Path "/getfoo" -Expose "Get-Foo"

Which, under the hood, should be equivalent to creating these routes:

New-PolarisGetRoute -Path "/getfoo" -ScriptBlock {
	$Response.Send((Get-Foo))
}

New-PolarisGetRoute -Path "/getfoo/:bar" -ScriptBlock {
	$Response.Send((Get-Foo $Request.Parameters.bar))
}

New-PolarisGetRoute -Path "/getfoo/:bar/:baz" -ScriptBlock {
	$Response.Send((Get-Foo $Request.Parameters.bar $Request.Parameters.baz))
}

Using named, rather than positional, parameters could be supported by query strings.

/getfoo?bar=a&baz=b

Resulting in:

$Response.Send((Get-Foo -bar "a" -baz "b"))

(No clue what the reaction to "/getfoo?bar=a&baz=b/a/b" should be like).

Path matching would be simplified to basically splitting the request on "/" (no regex needed) for positional parameters, and using query strings (either from query or body) for named parameters.
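The splitting described here might look something like the sketch below. `Split-RequestUrl` is a hypothetical name, the query string is parsed by hand rather than with a library call, and only the URL (not the request body) is considered.

```powershell
# Hypothetical helper: positional values come from path segments after the
# route prefix, named values come from the query string.
function Split-RequestUrl {
    param(
        [string]$Url,       # path plus optional query string
        [string]$RoutePath  # the exposed route prefix, e.g. /getfoo
    )

    # Separate the path from the query string at the first "?"
    $path, $query = $Url -split '\?', 2

    $positional = @()
    if ($path.Length -gt $RoutePath.Length) {
        $rest = $path.Substring($RoutePath.Length).Trim('/')
        # Drop empty segments produced by doubled or trailing slashes
        $positional = @($rest -split '/' | Where-Object { $_ })
    }

    $named = @{}
    if ($query) {
        foreach ($pair in ($query -split '&')) {
            $key, $value = $pair -split '=', 2
            $named[$key] = [uri]::UnescapeDataString("$value")
        }
    }

    [pscustomobject]@{ Positional = $positional; Named = $named }
}

$fromPath  = Split-RequestUrl -Url '/getfoo/a/b' -RoutePath '/getfoo'
$fromQuery = Split-RequestUrl -Url '/getfoo?bar=a&baz=b' -RoutePath '/getfoo'

$fromPath.Positional
$fromQuery.Named
```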

Pros:

  • Structure defined by implementation tickles me in the right places.
  • Path matching is less complex
  • Intuitive (well, to me anyways)
  • If in a risky mood, native PowerShell cmdlets can be exposed too

Cons:

  • Serious thought would have to go into sanitizing requests to avoid spurious stuff like "/getfoo/a/b;Destroy-Server" and "/getfoo/a/b$(Destroy-Server)"
  • Takes power away from the user (but leaves him to worry about the right things?)
  • Something like "/getmywikipage/some/variable/length/of/segments/edit" can not be routed with a catch-all like '-Path /getmywikipage/*/edit'. You'd instead have to resort to URL encoding.
  • Breaking change.
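On the sanitizing concern: one possible mitigation, sketched here as an assumption rather than a worked-out design, is to pass the URL segments to the exposed function through splatting instead of building a command string. The values then arrive as plain string arguments and are never parsed as PowerShell code. `Get-Foo` is the example function from above; the segment values are made up.

```powershell
function Get-Foo {
    [CmdletBinding()]
    param(
        [Parameter()][string]$bar,
        [Parameter()][string]$baz
    )

    Write-Output $bar
    Write-Output $baz
}

# Segments as they might arrive from "/getfoo/a/b;Destroy-Server"
$segments = @('a', 'b;Destroy-Server')

# Splatting binds the segments positionally; the text after ";" stays data.
$output = & 'Get-Foo' @segments
$output
```

A request with more segments than the function has parameters would produce a binding error in this sketch, which would need to be turned into an appropriate HTTP response.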

Sorry for the delay! I like your idea. I will say this though... when it comes to a framework, the best thing to do is to make it easy to get started, but also allow users to go beyond in customization.

I think that your suggestion could be a great addition to Polaris' path matching, rather than replacing what's proposed. If it's not too difficult, and doesn't have too much complexity, we should be able to support as many different ways to define routes as possible.

I think the Express syntax is powerful, concise, and provides a point of reference for anyone coming from the other ecosystem. I like it!

This is now in Polaris 🎉