Bug in fgsub
trinker opened this issue · 0 comments
trinker commented
The algorithm tries to be clever and fast but produces a replacement that is not desired. The original pattern in fgsub matches the correct location in the string, but the replacement is then applied to the entire string rather than only at the location where the pattern matched.
x <- c("00:04", "00:08", "00:01", "06:14", "00:02", "00:04", "00:08",
"00:01", "06:14", "00:02")
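mgsub <- textclean::mgsub
# Body of fgsub, run at top level; `pattern` and `fun` are its arguments and
# must be defined before running (they are not shown in this report).
#fgsub <- function (x, pattern, fun, ...) {
# Flag the elements that contain a match (NA counts as no match)
locs <- stringi::stri_detect_regex(x, pattern)
locs[is.na(locs)] <- FALSE
txt <- x[locs]
# Collect the unique matched substrings and compute a replacement for each
hits <- stringi::stri_extract_all_regex(txt, pattern)
pats <- unique(unlist(hits))
reps <- paste0("textcleanholder", seq_along(pats), "textcleanholder")
freps <- unlist(lapply(pats, fun))
# Problem: mgsub swaps each matched substring for its placeholder wherever that
# substring occurs in the string, not only where `pattern` matched
txt <- mgsub(txt, pats, reps)
x[locs] <- mgsub(txt, reps, freps)
x
#}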
## output
## [1] " : 4" " : 8" " : 1" " 6:14" " : 2" " : 4" " : 8" " : 1" " 6:14" " : 2"
## desired output
## [1] " :04" " :08" " :01" " 6:14" " :02" " :04" " :08" " :01" " 6:14" " :02"
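
One possible direction for a fix, sketched in base R rather than with stringi: record where the pattern matched with gregexpr() and rewrite only those spans through the regmatches<- replacement function. The helper name fgsub_at_matches is hypothetical and not part of textclean. The pattern "^0+" and the space-padding function are assumptions, since the report does not show the ones actually used. gregexpr(perl = TRUE) uses PCRE while stringi uses ICU, so some patterns may behave differently.

```r
# Hypothetical helper (not part of textclean): apply `fun` only to the
# substrings at the positions where `pattern` matched.
fgsub_at_matches <- function(x, pattern, fun, ...) {
    ok <- !is.na(x)                      # leave NA elements untouched
    txt <- x[ok]
    m <- gregexpr(pattern, txt, perl = TRUE)
    hits <- regmatches(txt, m)           # list of matches, one entry per string
    # Replace each match in place with fun(match); strings with no match
    # contribute character(0) and are returned unchanged
    regmatches(txt, m) <- lapply(hits, function(h) {
        vapply(h, fun, character(1), ..., USE.NAMES = FALSE)
    })
    x[ok] <- txt
    x
}

x <- c("00:04", "06:14", "00:02")

# Assumed for illustration: blank out leading zeros, preserving width
pad_with_spaces <- function(hit) strrep(" ", nchar(hit))

fgsub_at_matches(x, "^0+", pad_with_spaces)
```

Because the replacement is tied to match positions, a "0" later in the string (as in ":04") is left alone even though the same substring was matched at the start of another element.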