slevithan/xregexp

Question: I cannot get a group to be captured

st-clair-clarke opened this issue · 3 comments

Hi,
I am attempting to understand the "Reformat Names with Particle" recipe.

The regex, copied from the book, is:

^(.+?)●((?:(?:d[eu]|l[ae]|Ste?\.?|v[ao]n)●)*[^\s,]+)↵ (,?●(?:[JS]r\.?|III?|IV))?$
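For reference, here is a sketch of how the book's regex reads as a plain JavaScript/TypeScript regex literal. Each ● is written as a literal space, and the two wrapped lines are joined at the ↵ marker without adding a space. The i flag and the splitName helper are assumptions added for illustration. They are not from the book.

```typescript
// The book's pattern with each ● written as a literal space and the wrapped
// lines joined at ↵ (nothing is inserted at the join).
// Groups: 1 = given names, 2 = particles + last name, 3 = optional suffix.
const bookRegex =
  /^(.+?) ((?:(?:d[eu]|l[ae]|Ste?\.?|v[ao]n) )*[^\s,]+)(,? (?:[JS]r\.?|III?|IV))?$/i;

// Hypothetical helper: returns [givenNames, particlesAndLast, suffix] or null.
function splitName(name: string): [string, string, string] | null {
  const m = bookRegex.exec(name);
  if (m === null) return null;
  // Unmatched groups are undefined; only group 3 is optional here.
  return [m[1] ?? "", m[2] ?? "", m[3] ?? ""];
}

console.log(splitName("Charles de Gaulle")); // [ 'Charles', 'de Gaulle', '' ]
console.log(splitName("John F Kennedy"));    // [ 'John F', 'Kennedy', '' ]
```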
My attempt at converting it to XRegExp is:

export const reformatNameWithParticleRegex = (): RegExp => XRegExp.tag('nxi')`
 ^  (?<firstAndMiddle> .+?)
    (?<last>
      (?<particle> (d[eu]|l[ae]|Ste?\.?|v[ao]n ))
      *[^\s,]+
    )
    (?<suffix> ,? (?:[JS]r\.?|III?|IV))?$`

My usage is:

export const reformatNameWithParticle = (subject: string) => {
  const nameMatchParticle = XRegExp.exec( subject, reformatNameWithParticleRegex() )
  
  if( isNil( nameMatchParticle) ){
    return `No match found for entry ${subject}`
  }
  
  console.log({nameMatchParticle})
  
  const [input, firstAndMiddle, last, particle, suffix] = nameMatchParticle
  const formatted = `${particle} ${last ?? ''}, ${firstAndMiddle} ${suffix ?? ''}`
  return XRegExp.replace(subject, reformatNameWithParticleRegex(), trim(formatted))
}


const namePart = reformatNameWithParticle('Charles de Gaulle')
const namePart1 = reformatNameWithParticle('John F Kennedy')
console.log({namePart, namePart1})

My console output shows that the array items at indices 3 and 4 (particle and suffix) are undefined.

Thanks for your help.

A few issues:

  • You miscopied the regex from Regular Expressions Cookbook: your copy shows a space character after the formatting-only line break (↵) that the book uses when wrapping long lines. There is no space in that position.
  • Your free-spacing reconstruction with the x flag does not preserve any of the three meaningful space characters (●) from the book. The free-spacing version in that recipe shows that when you rewrite the regex with x, you have to escape the spaces, e.g. as \ (a backslash followed by a space).
  • Since you're using flag n (named capture only, so unnamed groups don't capture) in your reconstruction, you can replace the remaining (?: with ( with no change in meaning.
  • Your addition of a named particle group doesn't really work because that grouping is repeated with *, so the backreference will contain only the last particle if there are multiple (e.g. in "Maria de la Luz"). As constructed, the regex can only reliably give you backreferences for (1) first and middle names or initials, (2) particles and last name, and (3) suffix with a preceding space and optional preceding comma.
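The point about repeated groups is easy to see with a smaller native regex. This is a hypothetical sketch, not from the book. A capturing group repeated with * keeps only its final iteration. Wrapping the whole repetition in one group, as the particles-and-last-name group does, captures every particle.

```typescript
// Each iteration of a repeated capturing group overwrites the previous
// capture, so only the last particle survives.
const repeatedGroup = /^((?:d[eu]|l[ae]) )*(\S+)$/i;

// Capturing around the whole repetition keeps all particles together.
const wholeRepetition = /^((?:(?:d[eu]|l[ae]) )*)(\S+)$/i;

function lastParticle(name: string): string | undefined {
  // undefined when the repeated group never participated (no particles)
  return repeatedGroup.exec(name)?.[1];
}

function allParticles(name: string): string | undefined {
  // "" (not undefined) when there are no particles
  return wholeRepetition.exec(name)?.[1];
}

console.log(JSON.stringify(lastParticle("de la Luz"))); // "la "
console.log(JSON.stringify(allParticles("de la Luz"))); // "de la "
```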

Here's a working version:

import XRegExp from 'xregexp';

const nameParts = XRegExp.tag('gimnx')`
  ^
  (?<given_names> .+? )
  \ 
  (?<particles_and_last>
    (
      ( d[eu] | l[ae] | Ste?\.? | v[ao]n )
      \ 
    )*
    [^\s,]+
  )
  (?<suffix> ,? \  ( [JS]r\.? | III? | IV ) )?
  $`;
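Native JavaScript regexes have no x or n flag, so the same pattern can also be written as a single-line RegExp literal with ES2018 named groups and no XRegExp dependency. This is a sketch, not part of the thread; namePartsNative and reformatNames are hypothetical names. Without x, the spaces are literal, so there are no escaped spaces to lose. Without n, the inner groups need (?:...) to stay noncapturing. In native String.prototype.replace, $<name> inserts a named group, and a group that did not participate inserts an empty string.

```typescript
// Native equivalent of the XRegExp pattern above (flags g, i, m; no x or n).
const namePartsNative =
  /^(?<given_names>.+?) (?<particles_and_last>(?:(?:d[eu]|l[ae]|Ste?\.?|v[ao]n) )*[^\s,]+)(?<suffix>,? (?:[JS]r\.?|III?|IV))?$/gim;

function reformatNames(text: string): string {
  // Lines without a space (e.g. a single name) don't match and are left as-is.
  return text.replace(namePartsNative, "$<particles_and_last>, $<given_names>$<suffix>");
}

console.log(reformatNames("Martin Luther King, Jr\nMaria de la Luz\nBeyoncè"));
```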

I added flags g and m so that it works on multiple newline-separated names, which lets me show the following output:

let examples = `Martin Luther King, Jr
John Smith III
John F. Kennedy
Scarlett O’Hara
Chloë Grace Moretz
Leonardo Ángel Charles Baldwin
Pepé Le Pew
J.R.R. Tolkien
Catherine Zeta-Jones
Maria de la Luz
Charles de Gaulle
Александар Вучић
Beyoncè`;

XRegExp.replace(examples, nameParts, '$<particles_and_last>, $<given_names>$<suffix>');
/* ->
King, Martin Luther, Jr
Smith, John III
Kennedy, John F.
O’Hara, Scarlett
Moretz, Chloë Grace
Baldwin, Leonardo Ángel Charles
Le Pew, Pepé
Tolkien, J.R.R.
Zeta-Jones, Catherine
de la Luz, Maria
de Gaulle, Charles
Вучић, Александар
Beyoncè
*/
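The question destructured the match array by position. Reading groups by name avoids depending on group order. Here is a minimal sketch that returns the parts of each line instead of a reformatted string. It assumes the native ES2018 form of the same pattern, and parseNames and NameParts are hypothetical names.

```typescript
// Same pattern as a native regex with named groups.
// matchAll requires the g flag; m makes ^ and $ match at each line.
const namePattern =
  /^(?<given_names>.+?) (?<particles_and_last>(?:(?:d[eu]|l[ae]|Ste?\.?|v[ao]n) )*[^\s,]+)(?<suffix>,? (?:[JS]r\.?|III?|IV))?$/gim;

// Hypothetical result shape for one parsed name.
interface NameParts {
  givenNames: string;
  particlesAndLast: string;
  suffix: string; // includes its leading " " or ", "; "" when absent
}

function parseNames(text: string): NameParts[] {
  // Lines that don't match (e.g. a single name with no space) are skipped.
  return Array.from(text.matchAll(namePattern), (m) => ({
    givenNames: m.groups?.given_names ?? "",
    particlesAndLast: m.groups?.particles_and_last ?? "",
    suffix: m.groups?.suffix ?? "",
  }));
}

console.log(parseNames("Martin Luther King, Jr\nCharles de Gaulle"));
```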

A big thank you.

When I copied your example, it didn't work for me at first, simply because my IDE altered the '\ ' sequences when saving, so null was returned. After replacing each '\ ' with '\x20', it worked.

\x20 is a good solution, and probably what I would use. Another option is [ ], since XRegExp treats whitespace inside character classes as meaningful even when using flag x.