mvdan/gogrep

make matcher optionally more aggressive

mvdan opened this issue · 11 comments

mvdan commented

Right now, if the pattern contains "ab", the expression "a"+"b" won't be matched. We could use go/constant for this.

Is there a reason why a user would ever not want this? If so, we could make it optional via the syntax.
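As a rough illustration of the constant-evaluation idea, here is a minimal sketch that folds string concatenations with the standard go/constant package and compares the results. The helper names foldString and sameConstant are made up for this example and are not part of gogrep; only string literals, parentheses and + are handled.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/constant"
	"go/parser"
	"go/token"
)

// foldString evaluates an expression made only of string literals,
// parentheses and "+". It reports false for anything else.
// (Hypothetical helper, not gogrep's implementation.)
func foldString(e ast.Expr) (constant.Value, bool) {
	switch e := e.(type) {
	case *ast.BasicLit:
		if e.Kind != token.STRING {
			return nil, false
		}
		return constant.MakeFromLiteral(e.Value, e.Kind, 0), true
	case *ast.ParenExpr:
		return foldString(e.X)
	case *ast.BinaryExpr:
		if e.Op != token.ADD {
			return nil, false
		}
		x, okX := foldString(e.X)
		y, okY := foldString(e.Y)
		if !okX || !okY {
			return nil, false
		}
		return constant.BinaryOp(x, token.ADD, y), true
	}
	return nil, false
}

// sameConstant reports whether both expressions fold to the same string.
func sameConstant(a, b string) bool {
	ea, err := parser.ParseExpr(a)
	if err != nil {
		return false
	}
	eb, err := parser.ParseExpr(b)
	if err != nil {
		return false
	}
	va, okA := foldString(ea)
	vb, okB := foldString(eb)
	if !okA || !okB {
		return false
	}
	return constant.Compare(va, token.EQL, vb)
}

func main() {
	fmt.Println(sameConstant(`"ab"`, `"a" + "b"`)) // true
	fmt.Println(sameConstant(`"ab"`, "`ab`"))      // true: raw vs. interpreted literal
	fmt.Println(sameConstant(`"ab"`, `"a" + "c"`)) // false
}
```

A real matcher would more likely take constant values from go/types, which also covers named constants; this sketch stays purely syntactic to keep it short.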

mvdan commented

Actually, I'm thinking of making this even more generic. That is, adding two modes - a strict one and an aggressive one.

The strict mode would match the original syntax exactly. "foo" wouldn't match `foo`, nor would it match "fo"+"o". Likewise, the pattern for range x { y } wouldn't match the statement for _ = range x { y }. This can be useful when you're only rewriting the syntax of a package, without modifying the program itself, or when you do care about little differences in syntax like these.

And the aggressive mode would do just the opposite - it would ignore syntax differences that don't change the static meaning of the program. This is a rabbit hole, so I suggest keeping it limited to constant evaluation and the use of redundant _ identifiers for now. Another easy one would be for var a int to match var (a int).

Other forms of aggressive folding would be for var a int to also have matches in var a, b int and var (a int; b string).
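To make the redundant _ case concrete, here is a small sketch of one way an aggressive mode could normalize range statements before comparing them. Everything here (parseStmt, dropBlankRangeVars, shape, matches) is a hypothetical helper written for this example, and the shape comparison is deliberately rough: it records node types, identifiers and literals, but not operators.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

// parseStmt parses a single statement by wrapping it in a function body.
func parseStmt(src string) (ast.Stmt, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "", "package p\nfunc f() {\n"+src+"\n}", 0)
	if err != nil {
		return nil, err
	}
	return f.Decls[0].(*ast.FuncDecl).Body.List[0], nil
}

func isBlank(e ast.Expr) bool {
	id, ok := e.(*ast.Ident)
	return ok && id.Name == "_"
}

// dropBlankRangeVars rewrites "for _ = range x" and "for _, _ = range x"
// into the equivalent "for range x".
func dropBlankRangeVars(n ast.Node) {
	ast.Inspect(n, func(n ast.Node) bool {
		rs, ok := n.(*ast.RangeStmt)
		if !ok {
			return true
		}
		if isBlank(rs.Value) {
			rs.Value = nil
		}
		if rs.Value == nil && isBlank(rs.Key) {
			rs.Key = nil
			rs.Tok = token.ILLEGAL // what the parser sets when there is no key
		}
		return true
	})
}

// shape renders a position-independent outline of the tree.
// Operators and other tokens are not encoded, so this is only a sketch.
func shape(n ast.Node) string {
	var b strings.Builder
	ast.Inspect(n, func(n ast.Node) bool {
		switch n := n.(type) {
		case nil:
			b.WriteString(")")
		case *ast.Ident:
			b.WriteString("(" + n.Name)
		case *ast.BasicLit:
			b.WriteString("(" + n.Value)
		default:
			fmt.Fprintf(&b, "(%T", n)
		}
		return true
	})
	return b.String()
}

// matches compares two statements, optionally normalizing both first.
func matches(pattern, target string, aggressive bool) bool {
	p, err := parseStmt(pattern)
	if err != nil {
		return false
	}
	t, err := parseStmt(target)
	if err != nil {
		return false
	}
	if aggressive {
		dropBlankRangeVars(p)
		dropBlankRangeVars(t)
	}
	return shape(p) == shape(t)
}

func main() {
	pattern, target := "for range x { y() }", "for _ = range x { y() }"
	fmt.Println(matches(pattern, target, false)) // false: strict
	fmt.Println(matches(pattern, target, true))  // true: aggressive
}
```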

Since the aggressive mode is a rabbit hole, I think it would be best to make the strict mode the default. While it's true that requiring the strict mode isn't very common, requiring the aggressive mode isn't common either. And the aggressive one is likely to become complex and/or slow.

@rogpeppe thoughts?

Yeah, I've had the same thoughts.
The main question to my mind is whether we'd want strict mode to be a global, or whether it would be more useful to be able to apply it to individual expressions (for example with the $< modifier that's been mentioned previously). I think at this point my inclination is towards providing individual expression freedom, but very willing to be persuaded otherwise.

mvdan commented

I completely forgot about making this a syntax flag, like type restrictions. It could definitely be useful for wildcards.

However, all these modifiers have one problem - they can't be applied to non-wildcards. For example, I can't say "any number of nil", such as in return *nil, err. Nor can I say "this for statement can be matched aggressively", like <for range x { y }.

I can't find any previous mention of $<. What do you mean? Can you give an example?

Oh, that's an excellent point. That means we definitely need a global flag, and that's fine.
The $< was referring to the discussion on type matching. $<(x io.Reader) could be a non-strict version of $(x io.Reader) - i.e. it would match something of type *bufio.Reader as well as io.Reader itself.
So I guess that would make $x equivalent to $<(x interface{}).

mvdan commented

Ah, $< was just for the type restrictions. Yeah, I haven't even begun to think about that syntax.

I'm struggling to come up with a scenario where the pattern has to mix strict and non-strict parts. So this should be enough reason to just go with the global flag - we can always go a different route in the future, if this isn't enough.

However, I'm still bothered by how the wildcard modifiers should really be modifiers on all kinds of nodes. That is, type restrictions and "any of" modifiers should be applicable to non-wildcards too. Although I don't know how that could be possible without complicating the syntax to a point where it would no longer resemble Go.

One possibility for a type restriction might be someExpr.$(type) - reminiscent of Go's .() syntax but for arbitrary nodes. Initially I wondered if it might feel potentially awkwardly ambiguous with someExpr.$(x int) but I think the latter could be considered invalid because a field reference doesn't have a type itself until it's combined with the thing it's referring into. I guess someExpr$.(type) might also work without that issue.

Which I guess could be applied to wildcard nodes too. Our current $(x int) could be short for $x.$(int).

I'm having trouble coming up with a use case for "any of" that doesn't involve wildcards. Maybe you could elaborate?

mvdan commented

Ah, making $ mean more things than just matching nodes is a good idea. I like the type restriction one.

I don't think we should make $(x int) a shortcut to $x.$(int) - it's one character you save. Not convinced it's worth making the syntax more complicated.

I gave an example of "any of" before: return *nil, err to match return err, return nil, err, return nil, nil, err, etc. Although that one could be swapped for "any number of $x of type untyped nil".

I struggle to come up with a use case that I would have used recently. Maybe we should come back to this once we encounter it.

And of course, there's always the aggressive matching, but I can't think of a common use case either.

Maybe we should come back to this once we encounter it.

+1

mvdan commented

I'm thinking of doing this with a per-expression leading token (like a modifier). That way, in the future, we may use it in only one of the expressions instead of all of them. It also means we don't need a flag.

Since we agreed on strict mode as the default, I'll make ~ expr mean aggressive matching of expr. For example, ~ "ab" will match "a" + "b".

mvdan commented

To clarify, when I said expression above, I meant pattern. Not any Go expression. So the modifier can only be at the top level, foo(~bar) won't be allowed.
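A minimal sketch of how such a top-level modifier might be split off a pattern, with a nested ~ rejected. splitModifier is a hypothetical name, not gogrep's actual parser, and the nested check relies on go/scanner reporting token.TILDE, which assumes Go 1.18 or later.

```go
package main

import (
	"errors"
	"fmt"
	"go/scanner"
	"go/token"
	"strings"
)

// splitModifier strips a leading "~" from a pattern and reports whether
// aggressive matching was requested. A "~" token anywhere else is an
// error; a "~" inside a string literal is fine.
func splitModifier(pattern string) (rest string, aggressive bool, err error) {
	rest = strings.TrimSpace(pattern)
	if strings.HasPrefix(rest, "~") {
		aggressive = true
		rest = strings.TrimSpace(rest[1:])
	}

	fset := token.NewFileSet()
	file := fset.AddFile("", fset.Base(), len(rest))
	var s scanner.Scanner
	// A nil error handler means illegal characters such as "$" are
	// returned as token.ILLEGAL instead of being reported.
	s.Init(file, []byte(rest), nil, 0)
	for {
		_, tok, _ := s.Scan()
		if tok == token.EOF {
			break
		}
		if tok == token.TILDE {
			return "", false, errors.New("~ is only allowed at the start of a pattern")
		}
	}
	return rest, aggressive, nil
}

func main() {
	fmt.Println(splitModifier(`~ "ab"`))
	fmt.Println(splitModifier(`"ab"`))
	fmt.Println(splitModifier(`foo(~bar)`))
}
```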

Nice. ~ makes a great modifier because of its association with approximation and because it's not a valid Go operator.