comby-tools/comby

Modify characters in the hole or modify variable names based on patterns

Opened this issue · 2 comments

It would be nice to modify the names captured in the holes. For example, I was trying to change the pattern of some C++ member variables, from variable_name_ to m_variable_name.

The pattern of the variable name can be matched with ":[~\w+_\b]". During rewrite, there doesn't seem to be any way to perform any operations on the name.

Describe the solution you'd like
One solution might involve capture groups in the regex. The simplest would be to use the first capture group (if present) as the hole. So ":[var~(\w+)_\b]" for the pattern would result in ":[var]" being "variable_name" without the underscore, and then "m_:[var]" for the rewrite would complete the variable name change.

However, this simple solution gives no way to use multiple capture groups or named capture groups.
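To make the intended rename concrete, here is a minimal sketch of the same transformation done outside comby with Python's re module. It only illustrates what the first-capture-group proposal would produce; the function name rename_members and the sample line of C++ are made up for the example, and none of this is comby syntax.

```python
import re

# Same regex as in the proposed hole: the capture group keeps the
# name without its trailing underscore.
MEMBER = re.compile(r"(\w+)_\b")

def rename_members(source: str) -> str:
    """Rewrite name_ to m_name (hypothetical helper, for illustration)."""
    return MEMBER.sub(r"m_\1", source)

print(rename_members("int count_ = other.count_ + size_;"))
```

Names without a trailing underscore, such as other, are left alone because the regex requires an underscore immediately before a word boundary.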

This type of variable renaming might be better expressed in the rules rewrite section? (It seems that regexes are not allowed in the rules rewrite section currently)

Hey @markdewing this is a good use case to support, yeah. It'd make sense to support multiple capture groups. I also think it makes sense to support regex capture groups natively (i.e., (...) is bound to some syntax $X). This is mainly because a regex pattern for a hole takes precedence over matching things between holes, so there will exist capture groups when using regular expressions that are awkward/difficult to match without (I think your use case kind of hints at this).

One idea is to just reference capture groups as a kind of selector on variables. For example, you could select the group matched by :[var~(\w+)_\b] using :[var].$1.
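As a rough sketch of how such a selector could behave, the Python below expands a rewrite template in which :[name] stands for the whole match and :[name].$N for capture group N. This is only a model of the idea, not comby's implementation: the expand function, the SELECTOR regex, and the bindings dictionary are all assumptions made for the example.

```python
import re

# Hypothetical template syntax: ":[name]" is the whole match,
# ":[name].$N" is capture group N of the regex bound to that hole.
SELECTOR = re.compile(r":\[(\w+)\](?:\.\$(\d+))?")

def expand(template: str, bindings: dict) -> str:
    """Substitute hole references using a dict of name -> re.Match."""
    def substitute(ref: re.Match) -> str:
        match = bindings[ref.group(1)]
        group = int(ref.group(2)) if ref.group(2) else 0
        return match.group(group)
    return SELECTOR.sub(substitute, template)

# Stand-in for what the hole :[var~(\w+)_\b] would have matched.
var = re.fullmatch(r"(\w+)_\b", "variable_name_")
print(expand("m_:[var].$1", {"var": var}))
print(expand(":[var]", {"var": var}))
```

A plain :[var] keeps its current meaning (the full matched text), so existing templates would not change behavior under this convention.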

This type of variable renaming might be better expressed in the rules rewrite section?

I think with the above sort of convention, it can inline.

It seems that regexes are not allowed in the rules rewrite section currently

They are (at least, to match and bind), but I don't think it solves your problem. It's maybe not obvious because patterns are enclosed in quotes, so characters like \ need to be escaped:

example with rewrite :[x] { ":[~\\d+]" -> "digits" }

So this is a bit obnoxious--I'll be changing up the rewrite parts at some point to not strictly require quotes to help with this sort of thing.
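The same doubling shows up in most languages with quoted string literals. As an analogy only (this is Python, and comby's rule parser may differ in its details), the quoted form needs \\d+ to produce the regex \d+, while a raw string does not:

```python
import re

quoted = "\\d+"  # backslash escaped inside an ordinary quoted string
raw = r"\d+"     # raw string literal: no escaping needed

# Both literals denote the same three-character regex: \d+
print(quoted == raw)
print(re.fullmatch(quoted, "12345") is not None)
```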

Some pattern languages have named groups: https://www.regular-expressions.info/refext.html

What about:

Select the group matched by :[var~(?<aword>\w+)_\b] and use it with :[var.aword].
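A small sketch of how a named selector like this might expand, again modelled in Python rather than in comby. Two assumptions to flag: Python's re module spells named groups (?P<name>...), so that spelling is used here, and the expand_named function and NAMED_REF regex are hypothetical names for the example.

```python
import re

# Hypothetical template syntax: ":[hole.group]" selects a named
# capture group from the regex bound to that hole.
NAMED_REF = re.compile(r":\[(\w+)\.(\w+)\]")

def expand_named(template: str, bindings: dict) -> str:
    """Substitute :[hole.group] using a dict of hole name -> re.Match."""
    def substitute(ref: re.Match) -> str:
        return bindings[ref.group(1)].group(ref.group(2))
    return NAMED_REF.sub(substitute, template)

# Python's named-group spelling; stands in for the hole's regex.
var = re.fullmatch(r"(?P<aword>\w+)_\b", "variable_name_")
print(expand_named("m_:[var.aword]", {"var": var}))
```

Named selectors stay readable when a hole has several groups, where positional references would get harder to follow.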