magnars/s.el

s-count-matches doesn't count overlapping matches

Closed this issue · 5 comments

zck commented
ELISP> (s-count-matches "HH" "HHH")
1 (#o1, #x1, ?\C-a)

This returns 1. I think it should be 2.

There's seemingly a mismatch with s-match-strings-all. I would expect (length (s-match-strings-all some-regex some-list)) to return the same thing as (s-count-matches some-regex some-list). But it doesn't:

ELISP> (s-match-strings-all "HH" "HHH")
(("HH")
 ("HH"))

I suspect this is because s-count-matches delegates to the built-in count-matches, which doesn't count overlapping regex matches. But, at least to me, this is unintuitive behavior that isn't obvious from the documentation. Is there a function which returns the same as (length (s-match-strings-all some-regex some-list))? I can't find one.

Mismatch or no, this isn't something we're going to change. Adding a function s-count-matches-all might be interesting. In the meantime I would suggest just using (length (s-match-strings-all some-regex some-list)).

zck commented

I can add a function s-count-matches-all. Mind if I also update the docstring for s-count-matches to explicitly say it doesn't count overlapping matches?

zck commented

Coding this up, I was planning on matching the signature of s-count-matches (i.e., (regexp s &optional start end)), but I realized --after writing code and tests -- that unlike many functions in s.el, s-count-matches is 1-based. I expected it to be 0-based.

I would've expected the following to return 0 (if it was 0-based), but it returns 1 instead:

ELISP> (s-count-matches "a" "ab" 1)
1 (#o1, #x1, ?\C-a)

This seems to be the intent, but some fundamental functions in s.el are 0-based:

ELISP> (s-matches? "a" "ab" 1)
nil
ELISP> (s-match "a" "ab" 1)
nil

As well as string-related functions in Emacs:

ELISP> (substring "ab" 1)
"b"

Would you think this, too is worth calling out in the docstrings of s-count-matches and s-count-matches-all?

Ugh, that's no good. Yes, we definitely want to call this out in the docstring.