Question: I cannot get a group to be captured
st-clair-clarke opened this issue · 3 comments
Hi
I am attempting to understand the Reformat Names with Particle recipe.
The regex copied from the book is:
^(.+?)●((?:(?:d[eu]|l[ae]|Ste?\.?|v[ao]n)●)*[^\s,]+)↵ (,?●(?:[JS]r\.?|III?|IV))?$
My attempt at converting it to XRegExp is:
export const reformatNameWithParticleRegex = (): RegExp => XRegExp.tag('nxi')`
^ (?<firstAndMiddle> .+?)
(?<last>
(?<particle> (d[eu]|l[ae]|Ste?\.?|v[ao]n ))
*[^\s,]+
)
(?<suffix> ,? (?:[JS]r\.?|III?|IV))?$`
My usage is:
export const reformatNameWithParticle = (subject: string) => {
const nameMatchParticle = XRegExp.exec( subject, reformatNameWithParticleRegex() )
if( isNil( nameMatchParticle) ){
return `No match found for entry ${subject}`
}
console.log({nameMatchParticle})
const [input, firstAndMiddle, last, particle, suffix] = nameMatchParticle
const formatted = `${particle} ${last ?? ''}, ${firstAndMiddle} ${suffix ?? ''}`
return XRegExp.replace(subject, reformatNameWithParticleRegex(), trim(formatted))
}
const namePart = reformatNameWithParticle('Charles de Gaulle')
const namePart1 = reformatNameWithParticle('John F Kennedy')
console.log({namePart, namePart1})
Observe my console result:
Note that the array items at indices 3 and 4 are undefined.
Thanks for your help.
A few issues:
- You miscopied the regex in Regular Expressions Cookbook to show a space character after the formatting-only-line-break (↵) that's used when wrapping lines. There is no space in that position.
- Your free-spacing reconstruction with the x flag does not preserve any of the three meaningful space characters (●) from the book. The free-spacing example in that recipe shows that when you rewrite the regex with x, you have to escape the spaces, e.g. using \ (backslash space).
- Since you're using flag n in your reconstruction, you can replace the remaining (?: with ( with no change in meaning.
- Your addition of a named particle group doesn't really work because that grouping is repeated with *, so the backreference will contain only the last particle if there are multiple (e.g. in "Maria de la Luz"). As constructed, the regex can only reliably give you backreferences for (1) first and middle names or initials, (2) particles and last name, and (3) suffix with a preceding space and optional preceding comma.
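To illustrate the last point, here is a minimal sketch using a native JavaScript RegExp rather than XRegExp (an assumption made so the snippet runs without dependencies). The simplified pattern and the names repeated, given, last, and particle are illustrative only and are not taken from the book. It shows that a capture group repeated with * retains only its final iteration:

```javascript
// Simplified native-RegExp variant of the recipe: literal spaces, no suffix handling.
// The named group "particle" sits inside a non-capturing group that is repeated with *.
const repeated = /^(?<given>.+?) (?<last>(?:(?<particle>d[eu]|l[ae]|Ste?\.?|v[ao]n) )*[^\s,]+)$/i;

const multi = repeated.exec('Maria de la Luz');
console.log(multi.groups.particle); // only the last iteration ("la") is kept
console.log(multi.groups.last);     // the enclosing group holds the whole run ("de la Luz")

const none = repeated.exec('John F Kennedy');
console.log(none.groups.particle);  // undefined: the group never participated in the match
```

This is why the enclosing group, which spans all particles plus the last name, is the one worth naming.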
Here's a working version:
const nameParts = XRegExp.tag('gimnx')`
  ^
  (?<given_names> .+? )
  \ (?<particles_and_last>
    (
      ( d[eu] | l[ae] | Ste?\.? | v[ao]n )
      \ )*
    [^\s,]+
  )
  (?<suffix> ,? \ ( [JS]r\.? | III? | IV ) )?
  $`;

I added flags g and m so that it can operate on line-separated names, which lets me show the following output:
let examples = `Martin Luther King, Jr
John Smith III
John F. Kennedy
Scarlett O’Hara
Chloë Grace Moretz
Leonardo Ángel Charles Baldwin
Pepé Le Pew
J.R.R. Tolkien
Catherine Zeta-Jones
Maria de la Luz
Charles de Gaulle
Александар Вучић
Beyoncè`;
XRegExp.replace(examples, nameParts, '$<particles_and_last>, $<given_names>$<suffix>');
/* ->
King, Martin Luther, Jr
Smith, John III
Kennedy, John F.
O’Hara, Scarlett
Moretz, Chloë Grace
Baldwin, Leonardo Ángel Charles
Le Pew, Pepé
Tolkien, J.R.R.
Zeta-Jones, Catherine
de la Luz, Maria
de Gaulle, Charles
Вучић, Александар
Beyoncè
*/

A big thank you.
After copying your example, it did not work for me, simply because my IDE optimised the '\ ' on save, so null was returned. However, after replacing the '\ ' with '\x20', it worked.
\x20 is a good solution, and probably what I would use. Another option is [ ] since XRegExp treats whitespace inside character classes as meaningful even when using flag x.
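As a hedged illustration of these two spellings, the sketch below uses native JavaScript RegExp literals instead of XRegExp (an assumption so that it runs without dependencies; native JavaScript has no flag x, and the non-capturing groups are written out because there is no flag n). The names withHex, withClass, and reformat are hypothetical. Both \x20 and [ ] denote a literal space without leaving a space at the end of a line where an editor might strip it:

```javascript
// Spaces written as \x20 (hex escape for U+0020).
const withHex =
  /^(?<given>.+?)\x20(?<last>(?:(?:d[eu]|l[ae]|Ste?\.?|v[ao]n)\x20)*[^\s,]+)(?<suffix>,?\x20(?:[JS]r\.?|III?|IV))?$/gim;

// The same pattern with spaces written as a one-character class.
const withClass =
  /^(?<given>.+?)[ ](?<last>(?:(?:d[eu]|l[ae]|Ste?\.?|v[ao]n)[ ])*[^\s,]+)(?<suffix>,?[ ](?:[JS]r\.?|III?|IV))?$/gim;

// Reorder each line to "last, given suffix"; an unmatched suffix is replaced by an empty string.
const reformat = (regex, text) => text.replace(regex, '$<last>, $<given>$<suffix>');

const sample = 'Martin Luther King, Jr\nMaria de la Luz\nCharles de Gaulle';
console.log(reformat(withHex, sample));
console.log(reformat(withClass, sample));
```

Both calls should produce the same reordered lines, since the two patterns differ only in how the space is spelled.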
