gesiscss/WhatsR

Deleting file attachments from flattened messages


Is the `(.)*?` part of the regex really necessary? It slows down the deletion of matches of `pattern` significantly.
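To see what the `(.)*?` prefix contributes, the sketch below applies the pattern with and without it to two made-up lines (the file names are hypothetical). Because `(.)*?` is lazy but unanchored, a match starts at the first position from which a marker can still be reached on the same line. In practice that is the beginning of the string, or the point where the previous match ended, so everything up to ` (file attached)` is deleted. Without the prefix, only the marker and one following whitespace character are removed.

```r
# Two made-up lines; the file names are hypothetical examples.
x <- c("IMG-0001.jpg (file attached)",
       "hello IMG-0002.jpg (file attached) see you")

with_prefix    <- "(.)*?(\\s\\(file attached\\))($|\\s)"  # pattern from the issue
without_prefix <- "(\\s\\(file attached\\))($|\\s)"       # same pattern, prefix dropped

# With the lazy prefix, each match starts at the beginning of the string
# (or where the previous match ended) and runs up to the marker.
res_with <- gsub(with_prefix, "", x, perl = TRUE)
print(res_with)     # c("", "see you")

# Without it, only " (file attached)" plus one trailing whitespace
# character is removed; the file name stays in place.
res_without <- gsub(without_prefix, "", x, perl = TRUE)
print(res_without)  # c("IMG-0001.jpg", "hello IMG-0002.jpgsee you")
```

On the second line the prefix also removes `hello`, which is ordinary message text. Whether the prefix is "necessary" therefore depends on whether the goal is to strip the file name together with the marker, or only the marker itself.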

```r
pattern <- "(.)*?(\\s\\(file attached\\))($|\\s)"
Flat <- readLines("test.txt")  # ~1 million characters
system.time(gsub(pattern, "", Flat, perl = TRUE))
#>      user    system   elapsed 
#> 27127.242     1.162 27129.033 
```

That's about 7.5 hours (27,129 s elapsed), compared with near-instant completion when `(.)*?` is removed.
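A plausible explanation for the gap is backtracking. The pattern is unanchored, so on any stretch of a line with no marker after it, the engine tries each start position in turn and, from each one, lets the lazy `(.)*?` advance one character at a time to the end of the line before giving up. That makes the cost roughly quadratic in the length of marker-free text, whereas the pattern without the prefix only has to look for whitespace followed by `(file attached)`. The sketch below times the original pattern on synthetic strings of doubling length, next to a hypothetical alternative that only reaches back over the single non-whitespace token before the marker (for example a file name). The filler text, the sizes, and the alternative pattern are assumptions for illustration; absolute timings depend on the machine and on the PCRE build R uses.

```r
# Hypothetical filler with no "(file attached)" marker. It contains ")" so
# that a required-character shortcut in PCRE is less likely to skip the work.
make_filler <- function(n_chars) {
  unit <- "ok :) "
  substr(strrep(unit, ceiling(n_chars / nchar(unit))), 1, n_chars)
}

original    <- "(.)*?(\\s\\(file attached\\))($|\\s)"  # pattern from the issue
alternative <- "\\S*\\s\\(file attached\\)($|\\s)"     # hypothetical: one token only

# Time both patterns on marker-free strings of doubling length.
sizes <- c(2000, 4000, 8000)
timings <- t(sapply(sizes, function(n) {
  s <- make_filler(n)
  c(n           = n,
    original    = system.time(gsub(original, "", s, perl = TRUE))[["elapsed"]],
    alternative = system.time(gsub(alternative, "", s, perl = TRUE))[["elapsed"]])
}))
print(timings)

# The alternative drops the file name and marker but keeps earlier text.
gsub(alternative, "", "hello IMG-0002.jpg (file attached) see you", perl = TRUE)
# "hello see you"
```

If the quadratic explanation holds, the `original` column should grow roughly fourfold each time the length doubles, while `alternative` stays close to zero. The alternative is not a drop-in replacement: it keeps any text before the file name, which is a change in behaviour from the original pattern.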