- tidyverse-like syntax with data.table speed
- rlang compatibility
- Includes functions that dtplyr is missing, including many tidyr functions
Install the released version from CRAN with:
install.packages("tidytable")
Or install the development version from GitHub with:
# install.packages("devtools")
devtools::install_github("markfairbanks/tidytable")
tidytable uses verb.() syntax to replicate tidyverse functions:
library(tidytable)
test_df <- data.table(x = 1:3, y = 4:6, z = c("a","a","b"))
test_df %>%
  select.(x, y, z) %>%
  filter.(x < 4, y > 1) %>%
  arrange.(x, y) %>%
  mutate.(double_x = x * 2,
          x_plus_y = x + y)
#> # A tidytable: 3 × 5
#> x y z double_x x_plus_y
#> <int> <int> <chr> <dbl> <int>
#> 1 1 4 a 2 5
#> 2 2 5 a 4 7
#> 3 3 6 b 6 9
A full list of functions can be found here.
Group by calls are done by using the .by argument of any function that has “by group” functionality.
- A single column can be passed with .by = z
- Multiple columns can be passed with .by = c(y, z)
test_df %>%
  summarize.(avg_x = mean(x),
             count = n(),
             .by = z)
#> # A tidytable: 2 × 3
#> z avg_x count
#> <chr> <dbl> <int>
#> 1 a 1.5 2
#> 2 b 3 1
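The example above groups by a single column. As a minimal sketch of the multi-column form, the following groups by two columns at once. The data set and its column names (region, product, amount) are hypothetical and chosen only for illustration:

```r
library(tidytable)

# Hypothetical data for illustration; names are assumptions, not from the package
sales_df <- tidytable(
  region  = c("east", "east", "east", "west"),
  product = c("a", "a", "b", "a"),
  amount  = c(10, 20, 30, 40)
)

# Wrap the grouping columns in c() to group by more than one column
result <- sales_df %>%
  summarize.(total = sum(amount),
             avg = mean(amount),
             .by = c(region, product))

result
```

Each distinct region/product combination yields one row, so this data should produce three summary rows.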
tidytable follows data.table semantics where .by must be called each time you want a function to operate “by group”.
Below is example tidytable code that uses .by, followed by its dplyr equivalent. The goal is to grab the first two rows of each group using slice.(), then add a group row number column using mutate.():
library(tidytable)
test_df <- data.table(x = c("a", "a", "a", "b", "b"))
test_df %>%
  slice.(1:2, .by = x) %>%
  mutate.(group_row_num = row_number(), .by = x)
#> # A tidytable: 4 × 2
#> x group_row_num
#> <chr> <int>
#> 1 a 1
#> 2 a 2
#> 3 b 1
#> 4 b 2
Note how .by is called in both slice.() and mutate.().
Compare this to a dplyr pipe chain that uses group_by(), where each function operates “by group” until ungroup() is called:
library(dplyr)
test_df <- tibble(x = c("a", "a", "a", "b", "b"))
test_df %>%
  group_by(x) %>%
  slice(1:2) %>%
  mutate(group_row_num = row_number()) %>%
  ungroup()
#> # A tibble: 4 x 2
#> x group_row_num
#> <chr> <int>
#> 1 a 1
#> 2 a 2
#> 3 b 1
#> 4 b 2
Note that the ungroup() call is unnecessary in tidytable.
tidytable allows you to select/drop columns just like you would in the tidyverse by utilizing the tidyselect package in the background.
Normal selection can be mixed with all tidyselect helpers: everything(), starts_with(), ends_with(), any_of(), where(), etc.
test_df <- data.table(
  a = 1:3,
  b1 = 4:6,
  b2 = 7:9,
  c = c("a","a","b")
)
test_df %>%
  select.(a, starts_with("b"))
#> # A tidytable: 3 × 3
#> a b1 b2
#> <int> <int> <int>
#> 1 1 4 7
#> 2 2 5 8
#> 3 3 6 9
To drop columns use a - sign:
test_df %>%
  select.(-a, -starts_with("b"))
#> # A tidytable: 3 × 1
#> c
#> <chr>
#> 1 a
#> 2 a
#> 3 b
These same ideas can be used whenever selecting columns in tidytable functions - for example when using count.(), drop_na.(), across.(), pivot_longer.(), etc.
A full overview of selection options can be found here.
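As a rough sketch of the same selection helpers used outside a plain starts_with() select, the following uses where() inside select.() and starts_with() inside drop_na.(). The data set and its column names are hypothetical:

```r
library(tidytable)

# Hypothetical data for illustration; names are assumptions
mixed_df <- tidytable(
  id      = 1:3,
  score_a = c(1.5, NA, 3.5),
  score_b = c(2, 4, NA),
  label   = c("x", "y", "z")
)

# Keep only the numeric columns (id, score_a, score_b)
numeric_cols <- mixed_df %>%
  select.(where(is.numeric))

# Drop rows that have an NA in any column whose name starts with "score"
complete_rows <- mixed_df %>%
  drop_na.(starts_with("score"))
```

Only the first row has no missing score, so complete_rows should contain that single row.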
tidyselect helpers also work when using .by:
test_df <- data.table(
  a = 1:3,
  b = 4:6,
  c = c("a","a","b"),
  d = c("a","a","b")
)
test_df %>%
  summarize.(avg_b = mean(b), .by = where(is.character))
#> # A tidytable: 2 × 3
#> c d avg_b
#> <chr> <chr> <dbl>
#> 1 a a 4.5
#> 2 b b 6
rlang can be used to write custom functions with tidytable functions. The embracing shortcut {{ }} works, or you can use enquo() with !! if you prefer.
df <- data.table(x = c(1,1,1), y = c(1,1,1), z = c("a","a","b"))
add_one <- function(data, add_col) {
  data %>%
    mutate.(new_col = {{ add_col }} + 1)
}
df %>%
  add_one(x)
#> # A tidytable: 3 × 4
#> x y z new_col
#> <dbl> <dbl> <chr> <dbl>
#> 1 1 1 a 2
#> 2 1 1 a 2
#> 3 1 1 b 2
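The example above uses the {{ }} shortcut. A sketch of the enquo() with !! alternative might look like the following. The function name add_one_enquo is hypothetical, and enquo() is called through rlang explicitly on the assumption that rlang is installed alongside tidytable:

```r
library(tidytable)

df <- tidytable(x = c(1, 1, 1), y = c(1, 1, 1), z = c("a", "a", "b"))

# Hypothetical helper: capture the column with enquo(), then unquote it with !!
add_one_enquo <- function(data, add_col) {
  add_col <- rlang::enquo(add_col)
  data %>%
    mutate.(new_col = (!!add_col) + 1)
}

result <- df %>%
  add_one_enquo(x)
```

The parentheses around !!add_col are not strictly required, but they make the intended precedence explicit.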
The dt() function makes regular data.table syntax pipeable, so you can easily mix tidytable syntax with data.table syntax:
df <- data.table(x = 1:3, y = 4:6, z = c("a", "a", "b"))
df %>%
  dt(, .(x, y, z)) %>%
  dt(x < 4 & y > 1) %>%
  dt(order(x, y)) %>%
  dt(, double_x := x * 2) %>%
  dt(, .(avg_x = mean(x)), by = z)
#> # A tidytable: 2 × 2
#> z avg_x
#> <chr> <dbl>
#> 1 a 1.5
#> 2 b 3
If you want to use a dplyr function that hasn’t yet been implemented in tidytable, you can. For example, dplyr::add_count():
library(tidytable)
library(dplyr)
test_df <- tidytable(x = 1:3, y = c("a", "a", "b"))
test_df %>%
  mutate.(double_x = x * 2) %>%
  add_count()
#> # A tidytable: 3 × 4
#> x y double_x n
#> <int> <chr> <dbl> <int>
#> 1 1 a 2 3
#> 2 2 a 4 3
#> 3 3 b 6 3
If you want to use data.table functions directly, you can. However, if you are using any of data.table’s “set” operations, it is recommended to first convert the object to a data.table to prevent issues with data.table’s modify-by-reference.
library(tidytable)
library(data.table)
test_df <- tidytable(x = 3:1, y = c("c", "b", "a"))
new_df <- test_df %>%
  mutate.(double_x = x * 2)
new_df <- as.data.table(new_df)
setorder(new_df, y)[]
#> x y double_x
#> 1: 1 a 2
#> 2: 2 b 4
#> 3: 3 c 6
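To illustrate what modify-by-reference means in practice, here is a small sketch that uses only data.table. The variable names are hypothetical. It shows that a “set” operation changes the table in place, so every name pointing at that table sees the change unless an explicit copy() was made first:

```r
library(data.table)

original <- data.table(x = 3:1, y = c("c", "b", "a"))

alias <- original              # no copy: both names refer to the same table
independent <- copy(original)  # explicit copy, unaffected by later set operations

setorder(alias, y)             # reorders the shared table in place

original$y     # reordered as well, because alias and original share data
independent$y  # still in the original order
```

Taking a copy() before calling a “set” function is one way to keep an untouched version of the data.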
For those interested in performance, speed comparisons can be found here.