pointfreeco/swift-parsing

How to perform case insensitive matching?

finestructure opened this issue · 7 comments

I think I know how to solve this in principle by writing a whole new lower-level parser (which will take me considerable time to wrap my head around 😅), but I'm hoping there's an easier way I don't see. What I'm trying to do is get the last test case to pass:

    func test_repo() throws {
        XCTAssertEqual(try Github.repo.parse("foo.git"), "foo")
        XCTAssertEqual(try Github.repo.parse("foo.swift.git"), "foo.swift")
        XCTAssertEqual(try Github.repo.parse("Foo.git"), "Foo")
        // This needs fixing
        //        XCTAssertEqual(try Github.repo.parse("foo.GIT"), "foo")
    }

with an updated parser:

enum Github {
    static let repo = Parse {
        OneOf {
            Parse {
                PrefixUpTo(".git")
                Skip { ".git" }
            }
            Rest()
        }
    }
}

Of course, I could stack a

            Parse {
                PrefixUpTo(".GIT")
                Skip { ".GIT" }
            }

in there.

That is, until Git comes along 🤪. There must be a better way?

You can configure PrefixUpTo with a closure that defines when two characters count as equivalent. For example, you can use it to compare their lowercased versions.

enum Github {
  static let repo = Parse {
    OneOf {
      Parse {
        PrefixUpTo<Substring>(".git") { $0.lowercased() == $1.lowercased() }
        StartsWith<Substring>(".git") { $0.lowercased() == $1.lowercased() }
      }
      Rest()
    }
  }
}

Because you want to discard what comes next, you should use the same trick for StartsWith. You can't use the string literal shorthand anymore, and you lose the type inference it provided, so you need to specify the generic parameter for your parser. Also, your Skip was redundant, as you were already parsing ".git" into Void: Skip { ".git" } and ".git" (aka StartsWith(".git")) produce the same result, succeeding with Void if the input starts with ".git".
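To see what such an equivalence closure does in isolation, here is a minimal sketch that uses only the Swift standard library rather than swift-parsing. `Sequence.starts(with:by:)` takes the same kind of two-character predicate; the `input` and `suffix` names are just for illustration.

```swift
// Standard library only; no swift-parsing required.
let input = "foo.GIT"

// Drop the three characters of "foo", leaving the Substring ".GIT".
let suffix = input.dropFirst(3)

// Exact comparison: "G" is not equal to "g", so this prefix check fails.
let exactMatch = suffix.starts(with: ".git")

// Custom equivalence: compare the lowercased form of each character pair.
let caseInsensitiveMatch = suffix.starts(with: ".git") {
    $0.lowercased() == $1.lowercased()
}

print(exactMatch)            // false
print(caseInsensitiveMatch)  // true
```

The closure receives one character from each side, so the comparison is strictly element by element.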

Fortunately, the input is likely not in German, so in this case it should be enough to compare character by character using .lowercased().
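The German remark presumably alludes to case mappings that change the number of characters, such as "ß" uppercasing to "SS". A small standard-library sketch of why an element-by-element comparison would break down there (the sample word is only an illustration):

```swift
// Standard library only.
let lower = "straße"

// Full case mapping expands "ß" into two characters.
let upper = lower.uppercased()

print(upper)                     // STRASSE
print(lower.count, upper.count)  // 6 7

// Comparing element by element fails at "ß" versus "S",
// even though the two strings are the same word in different case.
let elementwise = lower.elementsEqual(upper) {
    $0.lowercased() == $1.lowercased()
}
print(elementwise)               // false
```

For an ASCII suffix like ".git" none of this applies, which is why the per-character closure is good enough here.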

Oh interesting, I had no idea Prefix had that closure syntax. Thanks a lot, Thomas! And yes, that Skip was totally redundant :)

Does your snippet compile for you? I'm getting the following error and I can't quite figure out how to provide the Input generic. I'll need to dig out the video, I know I've seen it somewhere...

[Screenshot of the compiler error: CleanShot 2022-06-27 at 19 54 08@2x]

Ah, I see you noticed. It's working now :)

Indeed, I edited while you were responding!

Closing this then - merci beaucoup Thomas!

Almost anything that compares things in the library has a parameter where you can specify what you mean by being equal. The trailing closure syntax is a little unfortunate in this case, because what you're actually writing is PrefixUpTo(".git", by: { $0.lowercased() == $1.lowercased() }), which is clearer.
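Spelled out with the explicit label, the parser from this thread might look like the sketch below. It assumes the `by:` label is also available on StartsWith (as suggested by the comment above) and a swift-parsing version from around the time of this thread; `equalIgnoringCase` is a hypothetical helper, not part of the library.

```swift
import Parsing

// Hypothetical helper (not part of swift-parsing):
// treats two characters as equal if their lowercased forms match.
func equalIgnoringCase(_ lhs: Character, _ rhs: Character) -> Bool {
    lhs.lowercased() == rhs.lowercased()
}

enum Github {
    static let repo = Parse {
        OneOf {
            Parse {
                // Everything up to ".git", ".GIT", ".Git", ...
                PrefixUpTo<Substring>(".git", by: equalIgnoringCase)
                // Consume the suffix itself, using the same equivalence.
                StartsWith<Substring>(".git", by: equalIgnoringCase)
            }
            // No suffix present: take the whole input.
            Rest()
        }
    }
}
```

Naming the predicate once keeps both parsers in sync if the notion of "equal" ever needs to change.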

Gonna convert this to a discussion so others may find it 😄