r-lib/rex

When a character and regex are concatenated with `c()`, the `regex` objects are coerced to character.


# the end dollar is improperly escaped here
rex('x' %or% c('.', end)) # (?:x|\.\$)

# using `list` prevents this, because `end` stays a regex and `$` is left unescaped
rex('x' %or% list('.', end))  # (?:x|\.$)
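The difference between the two calls can be reproduced in base R without rex. This is a minimal sketch, assuming a hypothetical class name `myregex` as a stand-in for rex's regex objects: `c()` builds an atomic character vector and drops the class, while `list()` stores each element unchanged.

```r
# Hypothetical stand-in for a regex object: a string tagged with an S3 class.
# "myregex" is an illustrative name, not rex's actual class.
end_marker <- structure("$", class = "myregex")

combined_c    <- c(".", end_marker)     # atomic character vector, class is dropped
combined_list <- list(".", end_marker)  # list, each element keeps its attributes

c_keeps_class    <- inherits(combined_c[2], "myregex")
list_keeps_class <- inherits(combined_list[[2]], "myregex")

print(c_keeps_class)     # FALSE
print(list_keeps_class)  # TRUE
```

Once the class is gone, any later escaping step can no longer tell the former regex apart from an ordinary string.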

I don't think we want to do anything to prevent this, but we may want to emit a warning when regexes get coerced to character.

Can we implement as.character.regex to do the right thing here? (Does c call a corresponding as.x.y method on coercion?)

After playing around with this a little bit: `c` is a generic function, so you can create a method for regex objects:

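c.regex <- function(..., recursive = FALSE) {
  # escape each argument (regex objects pass through unescaped), then flatten
  x <- structure(unlist(lapply(list(...), escape)), class = "regex")
  # paste the pieces together into a single regex
  p(x)
}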

This does what we want (it does not escape `$`) when `end` is the first object in the `c` call:

rex('x' %or% c(rex::shortcuts$end, '.'))
# (?:x|$\.)

But if a character is the first argument, `c` dispatches on that argument and converts all the arguments to character rather than regex, so this doesn't work:

rex('x' %or% c('.', rex::shortcuts$end))
# (?:x|\.\$)

To get this to actually work, you would need a way to force `c` to always coerce to regex if any of the arguments is a regex, which doesn't seem possible from what I can tell.
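The order dependence comes from S3 dispatch on the first argument only, and it can be shown in isolation. This is a sketch under assumptions: the class name `myregex` and the simplified method body are hypothetical and are not rex's implementation.

```r
# Hypothetical method for an illustrative class "myregex":
# concatenate all pieces into a single string and keep the class.
c.myregex <- function(..., recursive = FALSE) {
  pieces <- unlist(lapply(list(...), unclass))
  structure(paste0(pieces, collapse = ""), class = "myregex")
}

end_marker <- structure("$", class = "myregex")

regex_first <- c(end_marker, ".")  # first argument has the class: c.myregex runs
char_first  <- c(".", end_marker)  # first argument is plain character: default c() runs

print(class(regex_first))  # "myregex"
print(class(char_first))   # "character"
```

Because the method is never consulted in the second call, nothing inside `c.myregex` can fix the character-first case.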

Also, to address your question:

(Does c call a corresponding as.x.y method on coercion?)

It does not seem to:

as.character.regex <- function(x, ...) { stop() }
rex('x' %or% c('.', end))
# (?:x|\.\$)
as.character(shortcuts$end)
# Error in as.character.regex(shortcuts$end) :
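The same check can be run without rex. This is a minimal sketch, assuming a hypothetical class `myregex` and a flag variable `called` that records whether the method ran: `c()` coerces the value without consulting the `as.character` method, while a direct `as.character()` call does dispatch to it.

```r
# Flag that records whether the hypothetical method was invoked.
called <- FALSE

# Hypothetical as.character method for the illustrative class "myregex".
as.character.myregex <- function(x, ...) {
  called <<- TRUE
  unclass(x)
}

end_marker <- structure("$", class = "myregex")

via_c <- c(".", end_marker)            # coerces internally, no method call
called_after_c <- called

via_as_character <- as.character(end_marker)  # dispatches to the method
called_after_as_character <- called

print(called_after_c)             # FALSE
print(called_after_as_character)  # TRUE
```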