JBGruber/rwhatsapp

Author fails to extract if text contains `:` and two or more linebreaks

andreblanke opened this issue · 2 comments

The author of a message seems to be incorrectly reported as `NA` if the message text contains both a `:` and two or more line breaks.

The following should be a minimal reproducible example:

example.zip

chat0.txt
08.02.20, 17:35 - First Last: The time is 17:35.
2nd line.
3rd line.
chat1.txt
08.02.20, 17:35 - First Last: The time is 17:35.
2nd line.
test.Rmd
---
output: html_notebook
---

```{r}
library("rwhatsapp")
chat0 <- rwa_read("chat0.txt")
chat0
chat1 <- rwa_read("chat1.txt")
chat1
```

It reports NA as author of the message in chat0.txt and First Last as author of the message in chat1.txt.

I don't know if this is related to #14, as I didn't quite understand what that issue is about. Excuse me if it is a duplicate.

Wow, thanks for reporting this. I even had trouble coming up with a test to reproduce it, since it only seems to happen when the first message contains a time and spans several lines (so thanks for doing the hard work of narrowing it down to the reprex you posted). It should work now:

rwhatsapp::rwa_read(x = c("08.02.20, 17:35 - First Last: The time is 17:36.",
                          "2nd line.",
                          "3rd line.",
                          "08.02.20, 17:35 - First Last: The time is 17:36.",
                          "2nd line."))
#> # A tibble: 2 x 6
#>   time                author   text                    source   emoji emoji_name
#>   <dttm>              <fct>    <chr>                   <chr>    <lis> <list>    
#> 1 2020-02-08 17:35:26 First L~ "The time is 17:36.\n2~ text in~ <NUL~ <NULL>    
#> 2 2020-02-08 17:35:26 First L~ "The time is 17:36.\n2~ text in~ <NUL~ <NULL>

Created on 2020-02-08 by the reprex package (v0.3.0)
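
The same behaviour can be checked without any files, since `rwa_read()` accepts the chat lines as a character vector (as in the call above). This is a minimal sketch of a regression check, assuming an installed rwhatsapp version that already includes the fix. The `stopifnot()` assertions are an illustration and are not taken from the package's own test suite:

```r
library("rwhatsapp")

# One message whose text contains a time (and therefore a ":")
# and spans three lines, as in chat0.txt.
chat <- rwa_read(x = c("08.02.20, 17:35 - First Last: The time is 17:35.",
                       "2nd line.",
                       "3rd line."))

# With the fix, the three lines should be read as a single message
# and the author should no longer be NA.
stopifnot(nrow(chat) == 1L)
stopifnot(!is.na(chat$author[1]))
stopifnot(as.character(chat$author[1]) == "First Last")
```

If any of these checks fails, the installed version most likely predates the fix.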

Thanks a lot for the quick fix. I thought all the other issues in my data set would also stem from this misbehavior. However, there seem to be more situations in which the existing regex is a bit sensitive, so I'll file a separate issue for those.
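
For illustration only, one way to make author extraction less sensitive to the message body is to look at the first line alone and stop at the first colon after the timestamp. The sketch below uses base R. `extract_author()` is a hypothetical helper, the pattern assumes the `dd.mm.yy, HH:MM - ` prefix seen in the example chats, and none of this is rwhatsapp's actual implementation:

```r
# Hypothetical helper for illustration; not part of rwhatsapp.
extract_author <- function(msg) {
  # Keep only the first line so that colons or line breaks later in the
  # message body cannot affect the match.
  first_line <- strsplit(msg, "\n", fixed = TRUE)[[1]][1]
  # Assumed line layout: "dd.mm.yy, HH:MM - Author: text"
  pattern <- "^\\d{2}\\.\\d{2}\\.\\d{2}, \\d{2}:\\d{2} - ([^:]+): "
  m <- regmatches(first_line, regexec(pattern, first_line, perl = TRUE))[[1]]
  # m[1] is the full match, m[2] the captured author (if any).
  if (length(m) < 2) NA_character_ else m[2]
}

msg <- paste("08.02.20, 17:35 - First Last: The time is 17:35.",
             "2nd line.", "3rd line.", sep = "\n")
extract_author(msg)
#> [1] "First Last"
```

Other export formats (different date order, 12-hour clocks, bracketed timestamps) would need their own prefix patterns, so this only covers the layout shown in this issue.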