comby-tools/comby

Are matches supposed to happen on partial tokens?

charles-gray opened this issue · 4 comments

Describe the bug

Comby matches "token(:[foo])" against things like "longer_token(...)", i.e., the token in the match string is allowed to match a suffix of a longer token in the input, rather than only a full token.

Reproducing

bit.ly/3AhdQFn

Expected behavior

I would expect "foo(:[hole])" not to match "prefix_foo(bar)"; otherwise, a whole bunch of the examples on the website break.

Additional context

You can break examples on the website with inputs like "foo_if", where a plain "if" pattern matches the tail of the longer identifier. You can try to work around this by putting a space before your token in the match string, but then you can't match things like "bar(foo(:[hole]))" in your source because there's no space.

I assume there's some convoluted regex I can put before my token to force separation, but since it has to include special characters that affect balancing, I haven't found one that works yet...
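The leading-space workaround and its drawback can be sketched with Python's `re` module. This is only an analogy: comby's template matching is not plain regex, and the sample lines are made up for illustration.

```python
import re

# Made-up sample lines: a false positive, a spaced call, and a nested call.
samples = ["prefix_foo(bar)", "x = foo(z)", "bar(foo(x))"]

# Unanchored search finds "foo(" anywhere, including inside "prefix_foo(".
plain = re.compile(r"foo\(")
# Requiring a leading space rejects "prefix_foo(" but also rejects "bar(foo(".
spaced = re.compile(r" foo\(")


def hits(pattern, lines):
    """Return the lines in which the pattern matches somewhere."""
    return [line for line in lines if pattern.search(line)]


print(hits(plain, samples))   # ['prefix_foo(bar)', 'x = foo(z)', 'bar(foo(x))']
print(hits(spaced, samples))  # ['x = foo(z)']
```

The nested call `bar(foo(x))` is exactly the case the space workaround loses.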

This is currently intended behavior. Two options:

  • Try adding -disable-substring-matching on the command line. Does that work?
    echo "serif(font); if(font); ser.if(font)" | comby -stdin 'if(font)' 'foo(font)' -disable-substring-matching
  • Alternatively, a simple regex using \b for word boundaries may work well enough, like :[~\bif](foo) https://bit.ly/3wrshWd
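For intuition, here is how `\b` behaves on the three inputs from the echo command, using Python's `re` module as a stand-in. I believe comby's regex holes are PCRE-based, and I'm assuming `\b` behaves the same there for ASCII input. The regex hole plus the literal `(font)` is approximated as one regex.

```python
import re

# The three inputs from the echo command, split on "; ".
inputs = "serif(font); if(font); ser.if(font)".split("; ")

# Plain pattern: finds "if(font)" inside every input, including "serif(font)".
plain = re.compile(r"if\(font\)")
# \b requires a word/non-word transition right before "if", so "serif" is
# rejected (r and i are both word characters) but "ser.if" is accepted.
bounded = re.compile(r"\bif\(font\)")

print([s for s in inputs if plain.search(s)])    # ['serif(font)', 'if(font)', 'ser.if(font)']
print([s for s in inputs if bounded.search(s)])  # ['if(font)', 'ser.if(font)']
```

Note that `ser.if(font)` still matches, because `\b` treats `.` as a separator between tokens.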

Ah! \b looks like exactly the regex magic I was looking for, but I didn't know where to start. I've done a couple of spot checks and it seems to give the results I'm expecting. I'll confirm with the entire source base soon.

I'm using the Python API, so the command-line argument doesn't help me here. The option seems to be missing from the API bindings, but I guess I'd file that on that project.

Either way, assuming I confirm the regex tweak works, that fits perfectly into what I'm doing. Thanks for the prompt reply!

OK, so after a much deeper dive, '\b' seems to be the ticket. -disable-substring-matching doesn't work for the general case because it still seems to miss the case of an "_" prefix, e.g., the last input added here:

echo "serif(font); if(font); ser.if(font); ser_if(font)" | comby -stdin 'if(font)' 'foo(font)' -disable-substring-matching
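The reason `\b` handles the "_" case is that `_` counts as a word character (`\w` is `[A-Za-z0-9_]` for ASCII), so there is no boundary between `_` and `i`. A quick check with Python's `re` module, again as a stand-in for comby's regex engine. The letters-and-digits-only lookbehind is a hypothetical contrast, not a claim about how -disable-substring-matching is implemented.

```python
import re

# The four inputs from the echo command above, split on "; ".
inputs = "serif(font); if(font); ser.if(font); ser_if(font)".split("; ")

# \b: "_" is a word character, so "ser_if" has no boundary before "if".
word_boundary = re.compile(r"\bif\(font\)")
# Hypothetical contrast: only letters and digits block the match,
# so an "_" prefix slips through.
alnum_only = re.compile(r"(?<![A-Za-z0-9])if\(font\)")

print([s for s in inputs if word_boundary.search(s)])  # ['if(font)', 'ser.if(font)']
print([s for s in inputs if alnum_only.search(s)])     # ['if(font)', 'ser.if(font)', 'ser_if(font)']
```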

Thanks so much for the help again. I'll close this issue out. I'd love to see something in the docs / FAQ about this, but no longer necessary for me. Thanks!!

cool cool. I'm (slowly) revamping some of the docs pages, and will try to compile a "common questions and answers" section that this might fit into. cheers!