trinker/textclean

replace_emoticon Over Corrects for Words with xp

swnydick opened this issue · 3 comments

replace_emoticon should probably ignore some occurrences of "xp", such as those that follow an "e", because "exp" is a common letter sequence in English words.

library(textclean)

replace_emoticon("experience")
#> [1] "e tongue sticking out erience"

Created on 2021-06-23 by the reprex package (v2.0.0)

I tested this with textclean version 0.9.3 on R version 4.0.3 (2020-10-10) and R version 4.1.1 (2021-08-10) and got the same result on both.

library(textclean)

replace_emoticon("experience xp")
# [1] "e tongue sticking out erience tongue sticking out "

However, the code in the replace_emoticon.R script seems to work fine if I replicate it in my own function.

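my_replace_emoticon <- function(x, emoticon_dt = lexicon::hash_emoticons, ...){

  # Wrap each emoticon in \\b...\\b (word boundaries) and \\Q...\\E (literal quoting)
  # so that it is only replaced when it stands alone as a token.
  trimws(gsub(
    "\\s+",
    " ",  # collapse the extra spaces introduced by the padded replacements
    mgsub_regex(x, paste0('\\b\\Q', emoticon_dt[['x']], '\\E\\b'), paste0(" ", emoticon_dt[['y']], " "))
  ))

}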

my_replace_emoticon("experience xp")
# [1] "experience tongue sticking out"
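
For anyone who wants to see the effect of the word boundaries in isolation, here is a minimal base R sketch. It is not the textclean implementation: replace_token and its whole_word argument are hypothetical names made up for this illustration, and it handles a single token rather than the whole lexicon::hash_emoticons table. It assumes perl = TRUE so that \\Q...\\E and \\b are interpreted by PCRE.

```r
# Hypothetical helper for illustration only; not part of textclean.
replace_token <- function(x, token, replacement, whole_word = TRUE) {
  # \\Q...\\E quotes the token so any regex metacharacters are matched literally.
  pattern <- paste0("\\Q", token, "\\E")
  # Optionally require a word boundary on both sides of the token.
  if (whole_word) pattern <- paste0("\\b", pattern, "\\b")
  out <- gsub(pattern, paste0(" ", replacement, " "), x, perl = TRUE)
  # Collapse repeated whitespace and trim the ends.
  trimws(gsub("\\s+", " ", out))
}

# Without boundaries, the "xp" inside "experience" is replaced as well.
replace_token("experience xp", "xp", "tongue sticking out", whole_word = FALSE)
#> [1] "e tongue sticking out erience tongue sticking out"

# With boundaries, only the standalone "xp" is replaced.
replace_token("experience xp", "xp", "tongue sticking out", whole_word = TRUE)
#> [1] "experience tongue sticking out"
```

Note that \\b only helps for emoticons that begin and end with word characters, such as xp. Something like ":)" has no word boundary next to it when it is surrounded by spaces, so it would need a different kind of guard.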

Hello. This has been corrected in the dev version (see #46). I will push to CRAN soon. In the meantime you can install the development version per the README.

Brilliant, thank you for the quick reply. 👍