Author is not extracted if the message text contains `:` and two or more line breaks
andreblanke opened this issue · 2 comments
The author of a message seems to be incorrectly reported as `NA` if the message text contains both a `:` and two or more line breaks.
The following should be a minimal reproducible example:
chat0.txt
08.02.20, 17:35 - First Last: The time is 17:35.
2nd line.
3rd line.
chat1.txt
08.02.20, 17:35 - First Last: The time is 17:35.
2nd line.
test.Rmd
---
output: html_notebook
---
```{r}
library("rwhatsapp")
chat0 <- rwa_read("chat0.txt")
chat0
chat1 <- rwa_read("chat1.txt")
chat1
```
It reports `NA` as the author of the message in `chat0.txt` and `First Last` as the author of the message in `chat1.txt`.
I don't know if this is related to #14, as I didn't quite understand what that issue is about. Excuse me if it is a duplicate.
Wow, thanks for reporting this. I even had trouble coming up with a test to reproduce it, since this only seems to happen when the first message contains a time and spans several lines (so thanks for doing the hard work of narrowing it down to the reprex you posted). It should work now:
rwhatsapp::rwa_read(x = c("08.02.20, 17:35 - First Last: The time is 17:36.",
"2nd line.",
"3rd line.",
"08.02.20, 17:35 - First Last: The time is 17:36.",
"2nd line."))
#> # A tibble: 2 x 6
#> time author text source emoji emoji_name
#> <dttm> <fct> <chr> <chr> <lis> <list>
#> 1 2020-02-08 17:35:26 First L~ "The time is 17:36.\n2~ text in~ <NUL~ <NULL>
#> 2 2020-02-08 17:35:26 First L~ "The time is 17:36.\n2~ text in~ <NUL~ <NULL>
Created on 2020-02-08 by the reprex package (v0.3.0)
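To illustrate why a time inside the text plus continuation lines can trip up author extraction, here is a hypothetical R sketch of a line-based parser. It is not rwhatsapp's actual implementation; it assumes the `dd.mm.yy, hh:mm - Author: text` export format from the files above, and the names `chat_lines` and `header_re` are made up for this example. The idea is to anchor the header regex at the start of each line, treat every non-matching line as a continuation of the previous message, and stop the author at the first colon after ` - `, so the `17:35` in the text cannot shift the author.

```r
# Hypothetical sketch, NOT rwhatsapp's implementation. Assumes the
# "dd.mm.yy, hh:mm - Author: text" export format from the reprex.
chat_lines <- c("08.02.20, 17:35 - First Last: The time is 17:35.",
                "2nd line.",
                "3rd line.",
                "08.02.20, 17:35 - First Last: The time is 17:35.",
                "2nd line.")

# Header regex anchored at the line start; the author group [^:]+ stops at
# the first colon after " - ", so "17:35" inside the text is never reached.
header_re <- "^(\\d{2}\\.\\d{2}\\.\\d{2}, \\d{2}:\\d{2}) - ([^:]+): (.*)$"
is_header <- grepl(header_re, chat_lines, perl = TRUE)

# Every line belongs to the message started by the most recent header line.
msg_id <- cumsum(is_header)

headers <- chat_lines[is_header]
time    <- sub(header_re, "\\1", headers, perl = TRUE)
author  <- sub(header_re, "\\2", headers, perl = TRUE)
first   <- sub(header_re, "\\3", headers, perl = TRUE)

# Append the continuation lines of each message to its first line with "\n".
text <- vapply(seq_along(headers), function(i) {
  paste(c(first[i], chat_lines[msg_id == i & !is_header]), collapse = "\n")
}, character(1))

data.frame(time = time, author = author, text = text)
```

Both messages come out with `First Last` as the author. In this sketch, lines before the first header get `msg_id` 0 and are silently dropped, and an author name that itself contains `:` would be cut short; a real parser has to decide how to handle both cases.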
Thanks a lot for the quick fix. I thought all the other issues in my data set also stemmed from this behavior; however, there seem to be more situations in which the existing regex is a bit fragile. I'll file a separate issue for those.