replace_emoticon Over Corrects for Words with xp
swnydick opened this issue · 3 comments
swnydick commented
replace_emoticon should probably ignore some occurrences of "xp", such as those that follow an "e", because "exp" is a common letter sequence in English words.
library(textclean)
replace_emoticon("experience")
#> [1] "e tongue sticking out erience"
Created on 2021-06-23 by the reprex package (v2.0.0)
sdesabbata commented
I have tested it using textclean version 0.9.3 on R version 4.0.3 (2020-10-10) and R version 4.1.1 (2021-08-10), obtaining the same result.
library(textclean)
replace_emoticon("experience xp")
# [1] "e tongue sticking out erience tongue sticking out "
However, the code in the replace_emoticon.R script seems to work fine if I replicate it in my own function.
my_replace_emoticon <- function(x, emoticon_dt = lexicon::hash_emoticons, ...){
    trimws(gsub(
        "\\s+",
        " ",
        mgsub_regex(x, paste0('\\b\\Q', emoticon_dt[['x']], '\\E\\b'), paste0(" ", emoticon_dt[['y']], " "))
    ))
}
my_replace_emoticon("experience xp")
# [1] "experience tongue sticking out"
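For anyone wondering why the replicated version behaves differently, the word-boundary anchors (\\b) around the quoted emoticon appear to be what matters. Below is a minimal base R sketch of that idea, not the package's actual code: it hard-codes a single "xp" entry instead of using lexicon::hash_emoticons, and the variable names are made up for illustration.

```r
# Minimal sketch (base R only); the single "xp" entry is a stand-in
# for the full emoticon table and is an assumption of this example.
x <- "experience xp"
replacement <- " tongue sticking out "

# Unbounded literal match: also hits the "xp" inside "experience".
unbounded <- gsub("xp", replacement, x, fixed = TRUE)
unbounded <- trimws(gsub("\\s+", " ", unbounded))

# Bounded match: \\b requires a word boundary on both sides, and
# \\Q...\\E quotes the emoticon so it is treated literally (PCRE).
bounded <- gsub("\\b\\Qxp\\E\\b", replacement, x, perl = TRUE)
bounded <- trimws(gsub("\\s+", " ", bounded))

print(unbounded)
print(bounded)
```

With the boundaries in place, the "xp" inside "experience" is left alone because it has word characters on both sides, while the standalone "xp" is still replaced.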
trinker commented
Hello. This has been corrected in the dev version (see #46). I will push to CRAN soon. In the meantime you can install the development version per the README.
sdesabbata commented
Brilliant, thank you for the quick reply. 👍