Pattern matching spans of chars against constant strings
Opened this issue · 7 comments
Proposal
I propose that we allow pattern-matching values of type ReadOnlySpan<char>
and Span<char>
against constant strings.
match "abc123".AsSpan 1 with
| "bc123" -> printfn "Matches."
| "abc123" -> printfn "Doesn't match."
| "a" -> printfn "Doesn't match."
| _ -> printfn "Doesn't match."
let [<Literal>] SomeConstant = "abc123"
let f (span : ReadOnlySpan<char>) =
match span with
| SomeConstant -> printfn "Matches."
| _ -> printfn "Doesn't match."
let span = "abc123xyz".AsSpan ()
let lastDigit = span.LastIndexOfAnyInRange ('0', '9')
match span.Slice lastDigit with
| "" -> printfn "Ends with a digit."
| suffix -> printfn $"Ends with '{suffix.ToString ()}'."
Compare the equivalent C# proposal (implemented in C# 11).
We would do this by first automatically generating calls to System.MemoryExtensions.AsSpan
on a constant string being matched against, and then by passing the resulting span and the input span to the appropriate System.MemoryExtensions.SequenceEqual
overload.
The existing way of approaching this problem in F# is...
-
Use string slicing instead of span slicing. This allocates a new string.
match "abc123"[1..] with // Or "abc123".Substring 1, etc. | "bc123" -> printfn "Matches." | "abc123" -> printfn "Doesn't match." | "a" -> printfn "Doesn't match." | _ -> printfn "Doesn't match."
-
Use a series of
if
-then
-else
expressions instead of pattern-matching.let slice = "abc123".AsSpan 1 if slice.SequenceEqual "bc123" then printfn "Matches." elif slice.SequenceEqual "abc123" then printfn "Doesn't match." elif slice.SequenceEqual "a" then printfn "Doesn't match." else printfn "Doesn't match."
-
Define and use an active pattern like:
[<return: Struct>] let (|SequenceEqual|_|) (string: string) (span: ReadOnlySpan<char>) = if span.SequenceEqual string then ValueSome SequenceEqual else ValueNone match "abc123".AsSpan 1 with | SequenceEqual "bc123" -> "Matches." | SequenceEqual "abc123" -> "Doesn't match." | SequenceEqual "a" -> "Doesn't match." | _ -> "Doesn't match."
I almost didn't include this, because I'd rather not need to think to do this — and (in my experience) most people would not think to. Even if an equivalent active pattern were included in FSharp.Core, it would not be very discoverable.
Notes
-
We would not change the type inference rules for string patterns — the inferred type for the pattern input of a string literal pattern would still be
string
. -
As in the C# implementation, I think we would want a compile-time error for matching against
null
:let [<Literal>] NullLiteral : string = null match "abc123".AsSpan (2, 3) with | (null : string) (* This should be an error at build-time. *) -> … | NullLiteral (* This should be an error at build-time. *) -> … | _ -> …
-
Any named pattern bindings should still have the input (span) type:
match "asdf".AsSpan () with | "asdf" & span -> printfn "`span` is a `ReadOnlySpan<char>` here." | "asdf" as span -> printfn "`span` is a `ReadOnlySpan<char>` here." | span -> printfn "`span` is a `ReadOnlySpan<char>` here."
Pros and cons
The advantages of making this adjustment to F# are
- It brings the ergonomics of working with spans closer to the ergonomics already enjoyed by strings. This would help improve the legibility of performance-sensitive code that uses spans to avoid unnecessary string copying and allocations.
The disadvantages of making this adjustment to F# are
- None, except that we're moving the complexity from the programmer to the language.
Alternatives
-
Type-directed
[…]
syntax (#1086) could theoretically subsume this, although when the underlying thing being sliced is a string, I think allowing matching against string literals still has a readability advantage. C# supports both, for what it's worth:_ = "abc123".AsSpan(1, 3) switch { "bc1" => "Matches.", ['b', 'c', '1'] => "Also matches.", _ => "…" };
Extra information
Estimated cost (XS, S, M, L, XL, XXL):
- S — to the best of my understanding, since such code does not currently compile, and since I don't know of anything that making this addition to the compiler would later preclude.
Related suggestions: (put links to related suggestions here)
-
This could potentially be viewed as an extension of #849 (see discussion of
op_Implicit
in FS-1093). -
It's pretty common today to pattern-match multiple strings at once using tuple syntax, like
match "abc", "123" with | "abc", _ -> … | _, "789" -> …
If we make the change from this proposal, doing the same with spans would not work without additional changes to the compiler (compare #688, #887, #872, etc.):
match "abc".AsSpan 2, "123".AsSpan (0, 2) with | span1, span2 -> printfn "This doesn't work."
match "abc".AsSpan 2, "123".AsSpan (0, 2) with ------^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ stdin(9,7): error FS0412: A type instantiation involves a byref type. This is not permitted by the rules of Common IL.
Affidavit (please submit!)
Please tick these items by placing a cross in the box:
- This is not a question (e.g. like one you might ask on StackOverflow) and I have searched StackOverflow for discussions of this issue
- This is a language change and not purely a tooling change (e.g. compiler bug, editor support, warning/error messages, new warning, non-breaking optimisation) belonging to the compiler and tooling repository
- This is not something which has obviously "already been decided" in previous versions of F#. If you're questioning a fundamental design decision that has obviously already been taken (e.g. "Make F# untyped") then please don't submit it
- I have searched both open and closed suggestions on this site and believe this is not a duplicate
Please tick all that apply:
- This is not a breaking change to the F# language design
- I or my company would be willing to help implement and/or test this
For readers
If you would like to see this issue implemented, please click the 👍 emoji on this issue. These counts are used to generally order the suggestions by engagement.
I am personally not a huge fan of introducing intrinsics to the compiler it should know of to be able to optimize it. Implicit conversion looks better tbh, but will involve additional hidden conversions/allocations. Also, confusing from the usability/type system perspective - "Why do I compare Span to string"?
will involve additional hidden conversions/allocations
Hmm, it would indeed involve hidden invocation of op_Implicit
and creation of ReadOnlySpan<char>
s, but it wouldn't involve any heap allocations.
Also, confusing from the usability/type system perspective - "Why do I compare Span to string"?
While I can sympathize with this perspective somewhat, once you know what spans are and are for, then it's annoying not to be able to mix and match spans created from strings and strings themselves — C# enables this on purpose by using op_Implicit
to autoconvert strings to spans in as many scenarios as possible. I of course wouldn't argue for more implicit conversions in general, but I think the benefit would outweigh the potential confusion in this specific scenario.
Separately, while it's not exactly the same, there is already some precedent in the language for syntactic crossover between string-like things and array-like things:
// Looks like a string...
match "abc"B with
// ...But it's actually an array.
| [|0x61uy; 0x62uy; 0x63uy|] -> printfn "Matches."
| _ -> printfn "Doesn't match."
From a design perspective, it's also nice as a user when the easy thing is also the fast/correct thing.
will involve additional hidden conversions/allocations
Hmm, it would indeed involve hidden invocation of
op_Implicit
and creation ofReadOnlySpan<char>
s, but it wouldn't involve any heap allocations.
Yeah, I meant if we do type directed the other way (span -> string) for some cases, where we can't (?) do AsSpan.
Also, confusing from the usability/type system perspective - "Why do I compare Span to string"?
While I can sympathize with this perspective somewhat, once you know what spans are and are for, then it's annoying not to be able to mix and match spans created from strings and strings themselves — C# enables this on purpose by using
op_Implicit
to autoconvert strings to spans in as many scenarios as possible. I of course wouldn't argue for more implicit conversions in general, but I think the benefit would outweigh the potential confusion in this specific scenario.
Yeah, it can just be somewhat confusing for some people, I suppose.
From a design perspective, it's also nice as a user when the easy thing is also the fast/correct thing.
I suppose it's a tradeoff - do exactly what user wrote (or close to it) versus try and rewrite/optimize things. We already do both, so no really huge directional change.
I suppose it's up to @dsyme now - whether we want to do more with spans (and byreflikes in general), since it's clearly where Roslyn is going (and clr in general).
I don't understand this semantically. "bc123"
appears to be a string
, and so the match appears to be checking for equality, but a string
is not a ReadOnlySpan
so equality doesn't make sense.
"bc123"
appears to be astring
, and so the match appears to be checking for equality, but astring
is not aReadOnlySpan
so equality doesn't make sense.
Yeah, I get that it "is" not a string, and that it might not "feel" F#-y if you focus on the current type implications of string literals rather than viewing them as a shape or pattern of char
s to match against — see my response above (#1351 (comment)):
While I can sympathize with this perspective somewhat, once you know what spans are and are for, then it's annoying not to be able to mix and match spans created from strings and strings themselves — C# enables this on purpose by using
op_Implicit
to autoconvert strings to spans in as many scenarios as possible. I of course wouldn't argue for more implicit conversions in general, but I think the benefit would outweigh the potential confusion in this specific scenario.
The whole point of spans is to serve as windows into contiguous stretches of memory — and a string literal is just special syntax for a stretch of contiguous char
s.
Sure, when a string literal is used as an object, an object is created to point at that stretch of char
s, but — as the treatment of strings and spans in C# string slicing and pattern-matching as well as the design of System.String
itself in the BCL show — strings and spans are meant to be treated as more or less interchangeable, including in pattern-matching.
I do understand that the usage of op_Implicit
, etc., would probably not be how it would have been done if the idea of spans had existed in .NET (and thus F#) from day one, and that it doesn't fit perfectly in the F# tradition, but it seems clear that testing equality between a span of char
s and a string is meant to be a meaningful operation from the .NET perspective.
So in my mind the change here doesn't imply some new type equivalence between spans and strings, but rather specifically enables treating string literals as patterns representing spans of characters against which, well Span
s or ReadOnlySpan
s of characters can be matched.