callstack/ts-regex-builder

[Bug]test-utils/ts-match-string doesn't work with trailing lookaheads.

PaulJPhilp opened this issue · 14 comments

Describe the bug
A regular expression that ends with a lookahead not supported by ts-match-string.

To Reproduce
Steps to reproduce the behaviour:

Building a regex to match the Scheme ("ftp:") of a URL.
Use a positive lookahead to match the trailing ':'.

  1. Create a regex that has a trailing positive lookahead:
    const urlScheme = buildRegExp(
    [choiceOf(schemePlural, schemeSingular), lookahead(':')],
    );
  2. Create tests using the character from the positive lookahead as the final character.
    1) expect(urlSchemeValidator).toMatchString('ftp:');
    2) expect("ftp:").toMatch(urlSchemeValidator);
    3) expect('ftp').toMatch(urlSchemeValidator); // for completeness
    4) expect(urlSchemeValidator).toMatchString('ftp'); // for completeness

Expected behaviour
Test 1) and 2) should pass and 3) and 4) should not pass.
Instead, all 4 tests fail.

Screenshots
image

Package version
ts-regex-builder:

Additional context

I have a solution to propose in an upcoming PR.

@PaulJPhilp thank you for reporting it. Does it also occurs when you code regex literal by hand? Is it library issues or JS regex limitation?

No. It is just using the library. I understand the issue better now.

The problem is in the testing utility: ToMatchString(). I came to think the issue is missing functionality:

I created a a regex builder (urlSheme) to recognize a URL scheme e.g. (http:). I used a positive lookahead to match the ':'.
Now I want to test it. I can:

  1. expect(urlScheme).toMatchString("http") - the result is false because 'http' does have the ':' character to match the lookahead.
  2. expect(urlScheme).toMatchString("http:") - the result is false because because there is not character after the ':'. It's a subtle issue. In the end, I had to manually walk through the state machine to understand what is happening. The positive lookahead matches the ':' but does not consume it. So there is one more character left but the pattern is exhausted so the match fails.

Conceptually, you need a toMatchSubstring('http', 'http:') which matches because 'http' is a substring of 'http:'.

I tried a few APIs and settled on:
`interface MatchTypeOptions {
exactString: boolean;
substring?: string;
}

export function toMatchString(
this: jest.MatcherContext,
received: RegExp | RegexSequence,
expected: string,
matchType?: MatchTypeOptions,
)` in my PR.

I changed a few existing tests (currency, filename, suffixes) to use the new API. I think that those test were getting the correct result for the wrong reason. Here's how it looks:

 expect(currencyRegex).toMatchString('$10', { exactString: false, substring: '10' });
 expect(filenameRegex).toMatchString('index.ts', { exactString: false });
 expect(regex).toMatchString('democracy', { exactString: false });

This is the biggest reason my PR got so large. Take a look.

I'm not married to the API, so feel free to suggest an alternative.

As far as I understand the issue is in that you use a lookahead + immediate end of string, but without any other pattern that would consume the ":" character.

This should be easy to fix by just swapping lookahead(":") with ":". Otherwise you pattern requires that both happen an the same time:

  • there is a : char right after ftp
  • there is end of string right after ftp

By using plain ":", you reconcile these two by matching the colon as well as consuming it before matching end of string.

toMatchString is just an internal testing helper, it's not part of our public API. It's basically a wrapper around String.match method.

Could you verify the issues you are observing using plain RegExp (String) methods like RegExp.test, RegExp.exec or String.match?

Not sure how relevant is it, but your first example used startOfString endOfString to wrap the pattern, but your last example does not use it (nor ^ & $).

Please do so, it will be easier to focus that issue, as for now I don't feel like I understand it exactly.

Let's gather the findings so far:

  • the pattern /[a-z]{2,6}(?=:)/ will match "http:"
  • but pattern /^[a-z]{2,6}(?=:)$/ will not match "http:"
  • toMatchString() test utility is a simple wrapper over String.match, and the behavior you observed with expect(/^[a-z]{2,6}(?=:)$/)toMatchString("http:") would be also observed with "http:".match(/^[a-z]{2,6}(?=:)$/)
  • lookaheads (?=...) do not consume regex characters. As the name suggest they "look ahead" without moving the current matching position
  • using $ (end of string) would only work if the : got consumed, which we can get by replacing lookahead with regular character match /^[a-z]{2,6}:$/

As far as I understand your issue, and I am not sure whether I understand it correctly, you would like to have regex with both lookahead for : character (for nesting regex expressions) and be able to use end of string anchor (for validation).

If so, the I think this is not achievable using one regex pattern. There would have to be two:

  • validation (consuming :, end of string assertion) /^[a-z]{2,6}:$/
  • nesting (not consuming :, no end of string assertion) /[a-z]{2,6}(?=:)/

Let me know if I've got that right or did I miss something.