trinker/textclean

Bug in fgsub

trinker opened this issue · 0 comments

The algorithm tries to be clever and fast but produces a replacement that is not desired. The original pattern in fgsub correctly locates the match in the string, but the replacement is then applied to every occurrence of the matched text across the entire string rather than only at the location matched by the original pattern.
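To make the mechanism concrete, here is a rough base R trace of what happens to a single string. It is only a sketch. It assumes the pattern matched `"00"` in `"00:04"` and `"0"` in `"06:14"`, and that `fun` turns each zero into a space. Neither is stated in this issue. The placeholders are shortened to `<1>` and `<2>` for readability.

```r
# Assumption: matched texts are "00" and "0"; `fun` maps each zero to a space.
# <1> and <2> stand in for the longer "textcleanholder" placeholders.
s <- "00:04"
s <- gsub("00",  "<1>", s, fixed = TRUE)  # "<1>:04"
s <- gsub("0",   "<2>", s, fixed = TRUE)  # "<1>:<2>4"  the minutes' 0 is hit too
s <- gsub("<1>", "  ",  s, fixed = TRUE)  # "  :<2>4"
s <- gsub("<2>", " ",   s, fixed = TRUE)  # "  : 4"
s
## [1] "  : 4"
```

Because the matched text is swapped in as a fixed string, the `"0"` taken from another element also rewrites the zero in the minutes field here.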

x <- c("00:04", "00:08", "00:01", "06:14", "00:02", "00:04", "00:08", 
"00:01", "06:14", "00:02")
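mgsub <- textclean::mgsub

## Assumed inputs: this issue does not show `pattern` or `fun`. These values
## (blank out leading zeros) are one choice that reproduces the output below.
pattern <- "^0+"
fun <- function(m) gsub("0", " ", m)

## Body of fgsub, run at top level with the function wrapper commented out
#fgsub <- function (x, pattern, fun, ...) {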

    locs <- stringi::stri_detect_regex(x, pattern)
    locs[is.na(locs)] <- FALSE
    txt <- x[locs]
    hits <- stringi::stri_extract_all_regex(txt, pattern)
    pats <- unique(unlist(hits))
    reps <- paste0("textcleanholder", seq_along(pats), "textcleanholder")
    freps <- unlist(lapply(pats, fun))
    txt <- mgsub(txt, pats, reps)
    x[locs] <- mgsub(txt, reps, freps)
    x

#}
    
## output    
## [1] "  : 4" "  : 8" "  : 1" " 6:14" "  : 2" "  : 4" "  : 8" "  : 1" " 6:14" "  : 2"

## desired output    
## [1] "  :04" "  :08" "  :01" " 6:14" "  :02" "  :04" "  :08" "  :01" " 6:14" "  :02"
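One possible direction for a fix, sketched in base R rather than with the package's own helpers: replace only at the positions where the pattern matched, using `gregexpr()` and `regmatches<-`. The name `fgsub_at_match` is hypothetical and not part of textclean, and `pattern` and `fun` are again assumed values chosen to reproduce the desired output above. The sketch ignores `NA` handling and the speed concerns that motivated the current approach.

```r
# Hypothetical helper (not part of textclean): apply `fun` only at the
# positions where `pattern` matched, instead of globally replacing matched text.
fgsub_at_match <- function(x, pattern, fun) {
    m <- gregexpr(pattern, x, perl = TRUE)   # match positions per element
    hits <- regmatches(x, m)                 # matched text per element
    # Transform each hit and write it back at its own position
    regmatches(x, m) <- lapply(hits, function(h) {
        vapply(h, fun, character(1), USE.NAMES = FALSE)
    })
    x
}

x <- c("00:04", "00:08", "00:01", "06:14", "00:02")

# Assumed inputs; this issue does not state them.
pattern <- "^0+"
fun <- function(m) gsub("0", " ", m)

fgsub_at_match(x, pattern, fun)
## [1] "  :04" "  :08" "  :01" " 6:14" "  :02"
```

Since `regmatches<-` writes each replacement back by position, a hit such as `"0"` from one element can no longer touch unrelated zeros elsewhere.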