slevithan/xregexp

Why do XRegExp(...) and XRegExp.tag('x')`...` give different output, as shown below? (Query result incorrect?)

st-clair-clarke opened this issue · 2 comments

Why do the following two regexes give different results for the query segment of the URL?

**regex1**
export const url = XRegExp.tag('x')
    `
         ^(?<protocol>  [^:/?]+ ) ://      # aka scheme
          (?<host>      [^/?]+  )          # domain name/IP
          (?<path>      [^?]*   ) \\??     # optional path
          (?<query>     .*      )          # optional query`

Result1

Map(5) {'input' => 'https://google.com/path/to/file?q=1', 'protocol' => 'https', 'host' => 'google.com', 'path' => '/path/to/file', 'query' => '?q=1'}

regex2

export const url = XRegExp(
   `
         ^(?<protocol>  [^:/?]+ ) ://      # aka scheme
          (?<host>      [^/?]+  )          # domain name/IP
          (?<path>      [^?]*   ) \\??     # optional path
          (?<query>     .*      )          # optional query`,
   'x',
)

result2

Map(5) {'input' => 'https://google.com/path/to/file?q=1', 'protocol' => 'https', 'host' => 'google.com', 'path' => '/path/to/file', 'query' => 'q=1'}

caller

import XRegExp from 'xregexp'
import { isNil } from 'rambdax'

export const urlSegments = (urlStr: string): Map<string, string> => {
   const segments = XRegExp.exec(urlStr, url)
   const map = new Map<string, string>()

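   if (!isNil(segments)) {
      // Assumes XRegExp v5+, where named captures are exposed on `groups`;
      // on older versions they are properties of the match object itself.
      const { input } = segments
      const { protocol, host, path, query } = segments.groups ?? {}
      console.log(input, protocol, host, path, query)
      map.set('input', input)
         .set('protocol', protocol)
         .set('host', host)
         .set('path', path)
         .set('query', query)
   }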

   return map
}

console.log(urlSegments('https://google.com/path/to/file?q=1'))

Because your two regexes are different.

The pattern provided to XRegExp.tag is handled as a raw string, so backslashes don't need to be escaped as they would be in standard strings. This is the primary advantage of XRegExp.tag. It also shares the features of XRegExp.build; for example, any backreferences within interpolated regexes are rewritten to work within the overall pattern.
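To illustrate what "handled as a raw string" means, here is a minimal sketch of a tag function that exposes both views of the same template text. The name inspectTemplate is hypothetical and this is not XRegExp's implementation; it only relies on the standard `raw` property that JavaScript passes to every tag function.

```typescript
// Hypothetical helper, for illustration only (not part of XRegExp).
// A tag function receives the "cooked" strings (escape sequences processed)
// and, via the `raw` property, the text exactly as it was typed.
function inspectTemplate(strings: TemplateStringsArray): { cooked: string; raw: string } {
  return {
    cooked: strings.join(""),
    raw: strings.raw.join(""),
  };
}

const seen = inspectTemplate`\\??`;

console.log(seen.cooked.length); // 3: one backslash, two question marks
console.log(seen.raw.length);    // 4: two backslashes, two question marks
```

A tag that builds its pattern from the raw text therefore passes both backslashes on to the regex engine, which is why no extra escaping is needed.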

For your regex2 pattern to be equivalent, you'd need to use XRegExp(String.raw`...`, 'x'). Note the addition of String.raw.

The difference in your patterns as they exist comes from the segment \\??. If given as a raw string, the regex engine sees two backslashes followed by two question marks. If given as a non-raw string, the string literal first collapses \\ into a single backslash, so the regex engine sees one backslash followed by two question marks. To match the same thing using a non-raw string you'd need \\\\??, escaping each of the backslashes. The raw version (String.raw`\\??`) matches a literal backslash character zero or one time, favoring zero times if possible (due to the lazy question mark quantifier ??). The non-raw version (`\\??`) matches a literal question mark character zero or one time, favoring one time (due to the greedy question mark quantifier ?). That is why regex1 leaves the ? at the start of the query capture, while regex2 consumes it before the query group begins.
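The effect can be reproduced without XRegExp. The sketch below uses the native RegExp constructor with a compact, unnamed-group version of the URL pattern (an assumption made here because native regexes have no 'x' flag); the variable names are illustrative only.

```typescript
// The same characters typed in a template literal yield different strings.
const rawSegment: string = String.raw`\\??`; // backslash, backslash, ?, ?
const cookedSegment: string = `\\??`;        // backslash, ?, ?

// Compact form of the URL pattern: protocol, host, path, then the query.
const head = "^([^:/?]+)://([^/?]+)([^?]*)";
const tail = "(.*)";

const rawUrl = new RegExp(head + rawSegment + tail);       // lazy optional backslash
const cookedUrl = new RegExp(head + cookedSegment + tail); // greedy optional question mark

const sampleUrl = "https://google.com/path/to/file?q=1";

// Capture group 4 is the query segment.
const rawQuery = rawUrl.exec(sampleUrl)?.[4];
const cookedQuery = cookedUrl.exec(sampleUrl)?.[4];

console.log(rawQuery);    // ?q=1
console.log(cookedQuery); // q=1
```

The raw variant corresponds to the XRegExp.tag result and the cooked variant to the plain XRegExp(`...`, 'x') result from the question.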

Cheers. I actually realized it after I sent the message.

NB: Just purchased your book! Will delve into regex some more.

Regards