moodymudskipper/reactibble

support for base and dplyr joins

brownag opened this issue ยท 4 comments

Hi @moodymudskipper

Cool package! I have been intrigued by subclasses of tbl/data.frame, so I had to try this one out.

Here is a quick little reprex of something I noticed. I could see why this might be intended behavior -- but also bumped into it pretty quickly when dropping reactibble into a typical workflow of mine.

I would be curious to hear what you think!

library(reactibble)

rt1 <- reactibble(a = 1:10, b = ~ rev(a))
rt2 <- reactibble(a = 2:5, c = letters[1:4])

res <- dplyr::left_join(rt1, rt2)
#> Joining, by = "a"

# still reactibble
class(res)
#> [1] "reactibble" "tbl_df"     "tbl"        "data.frame"

# a/b reversed; b is reactive
rt1$a <- rev(rt1$a)
rt1
#> # A reactibble: 10 x 2
#>        a      b
#>    <int> <~int>
#>  1    10      1
#>  2     9      2
#>  3     8      3
#>  4     7      4
#>  5     6      5
#>  6     5      6
#>  7     4      7
#>  8     3      8
#>  9     2      9
#> 10     1     10

# a/b not reversed; b is not reactive
res$a <- rev(res$a)
res
#> # A reactibble: 10 x 3
#>        a     b c    
#>    <int> <int> <chr>
#>  1    10    10 <NA> 
#>  2     9     9 a    
#>  3     8     8 b    
#>  4     7     7 c    
#>  5     6     6 d    
#>  6     5     5 <NA> 
#>  7     4     4 <NA> 
#>  8     3     3 <NA> 
#>  9     2     2 <NA> 
#> 10     1     1 <NA>

Created on 2021-01-07 by the reprex package (v0.3.0)

Thanks a lot for your report.

It came from the fact that dplyr joins call as_tibble, and I had defined as_tibble to convert all columns to static. I removed the as.data.frame, as_tibble, and as.data.table methods for now and it works as expected :

library(reactibble)

rt1 <- reactibble(a = 1:10, b = ~ rev(a))
rt2 <- reactibble(a = 2:5, c = letters[1:4])

res <- dplyr::left_join(rt1, rt2)
#> Joining, by = "a"
rt1$a <- rev(rt1$a)
rt1
#> # A reactibble: 10 x 2
#>        a      b
#>    <int> <~int>
#>  1    10      1
#>  2     9      2
#>  3     8      3
#>  4     7      4
#>  5     6      5
#>  6     5      6
#>  7     4      7
#>  8     3      8
#>  9     2      9
#> 10     1     10
res$a <- rev(res$a)
res
#> # A reactibble: 10 x 3
#>        a      b c    
#>    <int> <~int> <chr>
#>  1    10      1 <NA> 
#>  2     9      2 a    
#>  3     8      3 b    
#>  4     7      4 c    
#>  5     6      5 d    
#>  6     5      6 <NA> 
#>  7     4      7 <NA> 
#>  8     3      8 <NA> 
#>  9     2      9 <NA> 
#> 10     1     10 <NA> 

Additional thoughts :

base::merge makes all columns static though.

Ultimately we should design methods for merge and dplyr joins, and then we might try to reimplement the as.data.frame and as_tibble methods.

merge and dplyr join will run their data.frame method if the reactibble object is not provided first, in that case we cannot do much, it should return a non reactibble object with static columns (right now left_join(as.data.frame(r1), rt2) will return a data frame with reactive columns, which we want to avoid).

as.data.frame.reactibble can detect that it has been called by merge.data.frame and trigger a warning.

Our methods should enforce that one cannot join by a reactive column.

Great, thanks for explanation and fix, makes sense.

For what it is worth I frequently use base::merge over dplyr::left_join -- and am familiar with the revert-to-data.frame that happens when merge is called on tibbles -- so would be supportive of methods designed explicitly to handle persistence of reactivity.

I have implemented methods for dplyr joins, they do the same but make sure we don't join by reactive columns.

I decided for now to keep merge consistent with what happens to tibbles, so it creates a data frames with static columns, not what you hoped for @brownag (sorry!) but I believe it is for the best.

Also my current view (see last comment in #5 ) is that tibbles can be out of sync on purpose, while data frames shouldn't, and for consistency I would need to define merge.tbl_df, which would be bad practice and rude to tidyverse maintainers.

This makes sense. I figured you would wind up going this way after doing some reading and thinking on it.

I think they have explicitly said that merge.tbl_df could not be done properly without multiple dispatch? I remember reading issue on tibble.

Great work and I appreciate you writing out your thought process.