fsharp/fslang-suggestions

Pattern matching spans of chars against constant strings

Opened this issue · 7 comments

Proposal

I propose that we allow pattern-matching values of type ReadOnlySpan<char> and Span<char> against constant strings.

match "abc123".AsSpan 1 with
| "bc123"  -> printfn "Matches."
| "abc123" -> printfn "Doesn't match."
| "a"      -> printfn "Doesn't match."
| _        -> printfn "Doesn't match."
let [<Literal>] SomeConstant = "abc123"

let f (span : ReadOnlySpan<char>) =
    match span with
    | SomeConstant -> printfn "Matches."
    | _            -> printfn "Doesn't match."
let span = "abc123xyz".AsSpan ()
let lastDigit = span.LastIndexOfAnyInRange ('0', '9')

match span.Slice lastDigit with
| ""     -> printfn "Ends with a digit."
| suffix -> printfn $"Ends with '{suffix.ToString ()}'."

Compare the equivalent C# proposal (implemented in C# 11).

We would do this by first automatically generating calls to System.MemoryExtensions.AsSpan on a constant string being matched against, and then by passing the resulting span and the input span to the appropriate System.MemoryExtensions.SequenceEqual overload.

The existing way of approaching this problem in F# is...

  • Use string slicing instead of span slicing. This allocates a new string.

    match "abc123"[1..] with // Or "abc123".Substring 1, etc.
    | "bc123"  -> printfn "Matches."
    | "abc123" -> printfn "Doesn't match."
    | "a"      -> printfn "Doesn't match."
    | _        -> printfn "Doesn't match."
  • Use a series of if-then-else expressions instead of pattern-matching.

    let slice = "abc123".AsSpan 1
    
    if slice.SequenceEqual "bc123" then
        printfn "Matches."
    elif slice.SequenceEqual "abc123" then
        printfn "Doesn't match."
    elif slice.SequenceEqual "a" then
        printfn "Doesn't match."
    else
        printfn "Doesn't match."
  • Define and use an active pattern like:

    [<return: Struct>]
    let (|SequenceEqual|_|) (string: string) (span: ReadOnlySpan<char>) =
        if span.SequenceEqual string then ValueSome SequenceEqual
        else ValueNone
    
    match "abc123".AsSpan 1 with
    | SequenceEqual "bc123" -> "Matches."
    | SequenceEqual "abc123" -> "Doesn't match."
    | SequenceEqual "a" -> "Doesn't match."
    | _ -> "Doesn't match."

    I almost didn't include this, because I'd rather not need to think to do this — and (in my experience) most people would not think to. Even if an equivalent active pattern were included in FSharp.Core, it would not be very discoverable.

Notes

  • We would not change the type inference rules for string patterns — the inferred type for the pattern input of a string literal pattern would still be string.

  • As in the C# implementation, I think we would want a compile-time error for matching against null:

    let [<Literal>] NullLiteral : string = null
    
    match "abc123".AsSpan (2, 3) with
    | (null : string) (* This should be an error at build-time. *) ->| NullLiteral (* This should be an error at build-time. *) ->| _ ->
  • Any named pattern bindings should still have the input (span) type:

    match "asdf".AsSpan () with
    | "asdf" &  span -> printfn "`span` is a `ReadOnlySpan<char>` here."
    | "asdf" as span -> printfn "`span` is a `ReadOnlySpan<char>` here."
    | span           -> printfn "`span` is a `ReadOnlySpan<char>` here."

Pros and cons

The advantages of making this adjustment to F# are

  • It brings the ergonomics of working with spans closer to the ergonomics already enjoyed by strings. This would help improve the legibility of performance-sensitive code that uses spans to avoid unnecessary string copying and allocations.

The disadvantages of making this adjustment to F# are

  • None, except that we're moving the complexity from the programmer to the language.

Alternatives

  • Type-directed […] syntax (#1086) could theoretically subsume this, although when the underlying thing being sliced is a string, I think allowing matching against string literals still has a readability advantage. C# supports both, for what it's worth:

    _ = "abc123".AsSpan(1, 3) switch
    {
        "bc1" => "Matches.",
        ['b', 'c', '1'] => "Also matches.",
        _ => ""
    };

Extra information

Estimated cost (XS, S, M, L, XL, XXL):

  • S — to the best of my understanding, since such code does not currently compile, and since I don't know of anything that making this addition to the compiler would later preclude.

Related suggestions: (put links to related suggestions here)

  • This could potentially be viewed as an extension of #849 (see discussion of op_Implicit in FS-1093).

  • It's pretty common today to pattern-match multiple strings at once using tuple syntax, like

    match "abc", "123" with
    | "abc", _ ->| _, "789" ->

    If we make the change from this proposal, doing the same with spans would not work without additional changes to the compiler (compare #688, #887, #872, etc.):

    match "abc".AsSpan 2, "123".AsSpan (0, 2) with
    | span1, span2 -> printfn "This doesn't work."
      match "abc".AsSpan 2, "123".AsSpan (0, 2) with
      ------^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    
    stdin(9,7): error FS0412: A type instantiation involves a byref type. This is not permitted by the rules of Common IL.

Affidavit (please submit!)

Please tick these items by placing a cross in the box:

  • This is not a question (e.g. like one you might ask on StackOverflow) and I have searched StackOverflow for discussions of this issue
  • This is a language change and not purely a tooling change (e.g. compiler bug, editor support, warning/error messages, new warning, non-breaking optimisation) belonging to the compiler and tooling repository
  • This is not something which has obviously "already been decided" in previous versions of F#. If you're questioning a fundamental design decision that has obviously already been taken (e.g. "Make F# untyped") then please don't submit it
  • I have searched both open and closed suggestions on this site and believe this is not a duplicate

Please tick all that apply:

  • This is not a breaking change to the F# language design
  • I or my company would be willing to help implement and/or test this

For readers

If you would like to see this issue implemented, please click the 👍 emoji on this issue. These counts are used to generally order the suggestions by engagement.

I am personally not a huge fan of introducing intrinsics to the compiler it should know of to be able to optimize it. Implicit conversion looks better tbh, but will involve additional hidden conversions/allocations. Also, confusing from the usability/type system perspective - "Why do I compare Span to string"?

will involve additional hidden conversions/allocations

Hmm, it would indeed involve hidden invocation of op_Implicit and creation of ReadOnlySpan<char>s, but it wouldn't involve any heap allocations.

Also, confusing from the usability/type system perspective - "Why do I compare Span to string"?

While I can sympathize with this perspective somewhat, once you know what spans are and are for, then it's annoying not to be able to mix and match spans created from strings and strings themselves — C# enables this on purpose by using op_Implicit to autoconvert strings to spans in as many scenarios as possible. I of course wouldn't argue for more implicit conversions in general, but I think the benefit would outweigh the potential confusion in this specific scenario.

Separately, while it's not exactly the same, there is already some precedent in the language for syntactic crossover between string-like things and array-like things:

// Looks like a string...
match "abc"B with
// ...But it's actually an array.
| [|0x61uy; 0x62uy; 0x63uy|] -> printfn "Matches."
| _ -> printfn "Doesn't match."

From a design perspective, it's also nice as a user when the easy thing is also the fast/correct thing.

will involve additional hidden conversions/allocations

Hmm, it would indeed involve hidden invocation of op_Implicit and creation of ReadOnlySpan<char>s, but it wouldn't involve any heap allocations.

Yeah, I meant if we do type directed the other way (span -> string) for some cases, where we can't (?) do AsSpan.

Also, confusing from the usability/type system perspective - "Why do I compare Span to string"?

While I can sympathize with this perspective somewhat, once you know what spans are and are for, then it's annoying not to be able to mix and match spans created from strings and strings themselves — C# enables this on purpose by using op_Implicit to autoconvert strings to spans in as many scenarios as possible. I of course wouldn't argue for more implicit conversions in general, but I think the benefit would outweigh the potential confusion in this specific scenario.

Yeah, it can just be somewhat confusing for some people, I suppose.

From a design perspective, it's also nice as a user when the easy thing is also the fast/correct thing.

I suppose it's a tradeoff - do exactly what user wrote (or close to it) versus try and rewrite/optimize things. We already do both, so no really huge directional change.

I suppose it's up to @dsyme now - whether we want to do more with spans (and byreflikes in general), since it's clearly where Roslyn is going (and clr in general).

I don't understand this semantically. "bc123" appears to be a string, and so the match appears to be checking for equality, but a string is not a ReadOnlySpan so equality doesn't make sense.

"bc123" appears to be a string, and so the match appears to be checking for equality, but a string is not a ReadOnlySpan so equality doesn't make sense.

Yeah, I get that it "is" not a string, and that it might not "feel" F#-y if you focus on the current type implications of string literals rather than viewing them as a shape or pattern of chars to match against — see my response above (#1351 (comment)):

While I can sympathize with this perspective somewhat, once you know what spans are and are for, then it's annoying not to be able to mix and match spans created from strings and strings themselves — C# enables this on purpose by using op_Implicit to autoconvert strings to spans in as many scenarios as possible. I of course wouldn't argue for more implicit conversions in general, but I think the benefit would outweigh the potential confusion in this specific scenario.

The whole point of spans is to serve as windows into contiguous stretches of memory — and a string literal is just special syntax for a stretch of contiguous chars.

Sure, when a string literal is used as an object, an object is created to point at that stretch of chars, but — as the treatment of strings and spans in C# string slicing and pattern-matching as well as the design of System.String itself in the BCL show — strings and spans are meant to be treated as more or less interchangeable, including in pattern-matching.

I do understand that the usage of op_Implicit, etc., would probably not be how it would have been done if the idea of spans had existed in .NET (and thus F#) from day one, and that it doesn't fit perfectly in the F# tradition, but it seems clear that testing equality between a span of chars and a string is meant to be a meaningful operation from the .NET perspective.

So in my mind the change here doesn't imply some new type equivalence between spans and strings, but rather specifically enables treating string literals as patterns representing spans of characters against which, well Spans or ReadOnlySpans of characters can be matched.