The furniture R package contains functions to help with data
cleaning/tidying (e.g., washer()
, rowmeans()
, rowsums()
),
exploratory data analysis and reporting (e.g., table1()
, tableC()
,
tableF()
). It currently contains eight main functions:
table1()
: gives a well-formatted table for academic publication of descriptive statistics. Very useful for quick analyses as well. Notably,table1()
now works withdplyr::group_by()
.tableC()
: gives a well-formatted table of correlations.tableF()
: provides a thorough frequency table for quick checks of the levels of a variable.washer()
: changes several values in a variable (very useful for changing place holder values to missing).long()
: is a wrapper ofstats::reshape()
, takes the data from wide to long format (long is often the tidy version of the data), works well with the tidyverse, and can handle unbalanced multilevel data.wide()
: also a wrapper ofstats::reshape()
, takes the data from long to wide, and likelong()
, works well with the tidyverse and can handle unbalanced multilevel data.rowmeans()
androwmeans.n()
: tidyverse friendly versions ofrowMeans()
, where therowmeans.n()
function allowsn
number of missingrowsums()
androwsums.n()
: tidyverse friendly versions ofrowSums()
, where therowsums.n()
function allowsn
number of missing
In conjunction with many other tidy tools, the package should be useful for health, behavioral, and social scientists working on quantitative research.
The latest stable build of the package can be downloaded from CRAN via:
install.packages("furniture")
You can download the developmental version via:
remotes::install_github("tysonstanley/furniture")
The main functions are the table*()
functions (e.g., table1()
,
tableC()
, tableF()
).
library(furniture)
data("nhanes_2010")
table1(nhanes_2010,
age, marijuana, illicit, rehab,
splitby=~asthma,
na.rm = FALSE)
#>
#> ───────────────────────────────────
#> asthma
#> Yes No
#> n = 251 n = 1164
#> age
#> 23.0 (3.9) 23.4 (4.0)
#> marijuana
#> Yes 131 (52.2%) 584 (50.2%)
#> No 97 (38.6%) 434 (37.3%)
#> NA 23 (9.2%) 146 (12.5%)
#> illicit
#> Yes 23 (9.2%) 117 (10.1%)
#> No 205 (81.7%) 901 (77.4%)
#> NA 23 (9.2%) 146 (12.5%)
#> rehab
#> Yes 10 (4%) 37 (3.2%)
#> No 121 (48.2%) 547 (47%)
#> NA 120 (47.8%) 580 (49.8%)
#> ───────────────────────────────────
table1(nhanes_2010,
age, marijuana, illicit, rehab,
splitby=~asthma,
output = "text2",
na.rm = FALSE)
#>
#> ───────────────────────────────────
#> asthma
#> Yes No
#> n = 251 n = 1164
#> --------- ----------- -----------
#> age
#> 23.0 (3.9) 23.4 (4.0)
#> marijuana
#> Yes 131 (52.2%) 584 (50.2%)
#> No 97 (38.6%) 434 (37.3%)
#> NA 23 (9.2%) 146 (12.5%)
#> illicit
#> Yes 23 (9.2%) 117 (10.1%)
#> No 205 (81.7%) 901 (77.4%)
#> NA 23 (9.2%) 146 (12.5%)
#> rehab
#> Yes 10 (4%) 37 (3.2%)
#> No 121 (48.2%) 547 (47%)
#> NA 120 (47.8%) 580 (49.8%)
#> ───────────────────────────────────
library(tidyverse)
nhanes_2010 %>%
group_by(asthma) %>%
table1(age, marijuana, illicit, rehab,
output = "text2",
na.rm = FALSE)
#>
#> ───────────────────────────────────
#> asthma
#> Yes No
#> n = 251 n = 1164
#> --------- ----------- -----------
#> age
#> 23.0 (3.9) 23.4 (4.0)
#> marijuana
#> Yes 131 (52.2%) 584 (50.2%)
#> No 97 (38.6%) 434 (37.3%)
#> NA 23 (9.2%) 146 (12.5%)
#> illicit
#> Yes 23 (9.2%) 117 (10.1%)
#> No 205 (81.7%) 901 (77.4%)
#> NA 23 (9.2%) 146 (12.5%)
#> rehab
#> Yes 10 (4%) 37 (3.2%)
#> No 121 (48.2%) 547 (47%)
#> NA 120 (47.8%) 580 (49.8%)
#> ───────────────────────────────────
table1()
can do bivariate significance tests as well.
library(tidyverse)
nhanes_2010 %>%
group_by(asthma) %>%
table1(age, marijuana, illicit, rehab,
output = "text2",
na.rm = FALSE,
test = TRUE)
#>
#> ───────────────────────────────────────────
#> asthma
#> Yes No P-Value
#> n = 251 n = 1164
#> --------- ----------- ----------- -------
#> age 0.201
#> 23.0 (3.9) 23.4 (4.0)
#> marijuana 1
#> Yes 131 (52.2%) 584 (50.2%)
#> No 97 (38.6%) 434 (37.3%)
#> NA 23 (9.2%) 146 (12.5%)
#> illicit 0.623
#> Yes 23 (9.2%) 117 (10.1%)
#> No 205 (81.7%) 901 (77.4%)
#> NA 23 (9.2%) 146 (12.5%)
#> rehab 0.729
#> Yes 10 (4%) 37 (3.2%)
#> No 121 (48.2%) 547 (47%)
#> NA 120 (47.8%) 580 (49.8%)
#> ───────────────────────────────────────────
By default it does the appropriate parametric tests. However, you can
change that by setting param = FALSE
(new with v 1.9.1
).
library(tidyverse)
nhanes_2010 %>%
group_by(asthma) %>%
table1(age, marijuana, illicit, rehab,
output = "text2",
na.rm = FALSE,
test = TRUE,
param = FALSE,
type = "condense")
#>
#> ───────────────────────────────────────────────
#> asthma
#> Yes No P-Value
#> n = 251 n = 1164
#> ------------- ----------- ----------- -------
#> age 23.0 (3.9) 23.4 (4.0) 0.235
#> marijuana: No 97 (38.6%) 434 (37.3%) 1
#> illicit: No 205 (81.7%) 901 (77.4%) 0.623
#> rehab: No 121 (48.2%) 547 (47%) 0.729
#> ───────────────────────────────────────────────
It can also do a total column with the stratified columns (new with
v 1.9.0
) with the total = TRUE
argument.
nhanes_2010 %>%
group_by(asthma) %>%
table1(age, marijuana, illicit, rehab,
output = "text2",
na.rm = FALSE,
test = TRUE,
type = "condense",
total = TRUE)
#>
#> ────────────────────────────────────────────────────────────
#> asthma
#> Total Yes No P-Value
#> n = 1417 n = 251 n = 1164
#> ------------- ------------ ----------- ----------- -------
#> age 23.3 (4.0) 23.0 (3.9) 23.4 (4.0) 0.201
#> marijuana: No 532 (37.5%) 97 (38.6%) 434 (37.3%) 1
#> illicit: No 1107 (78.1%) 205 (81.7%) 901 (77.4%) 0.623
#> rehab: No 668 (47.1%) 121 (48.2%) 547 (47%) 0.729
#> ────────────────────────────────────────────────────────────
It can also report the statistics in addition to the p-values.
nhanes_2010 %>%
group_by(asthma) %>%
table1(age, marijuana, illicit, rehab,
output = "text2",
na.rm = FALSE,
test = TRUE,
total = TRUE,
type = "full")
#>
#> ─────────────────────────────────────────────────────────────────────────
#> asthma
#> Total Yes No Test P-Value
#> n = 1417 n = 251 n = 1164
#> --------- ------------ ----------- ----------- ---------------- -------
#> age T-Test: -1.28 0.201
#> 23.3 (4.0) 23.0 (3.9) 23.4 (4.0)
#> marijuana Chi Square: 0 1
#> Yes 716 (50.5%) 131 (52.2%) 584 (50.2%)
#> No 532 (37.5%) 97 (38.6%) 434 (37.3%)
#> NA 169 (11.9%) 23 (9.2%) 146 (12.5%)
#> illicit Chi Square: 0.24 0.623
#> Yes 141 (10%) 23 (9.2%) 117 (10.1%)
#> No 1107 (78.1%) 205 (81.7%) 901 (77.4%)
#> NA 169 (11.9%) 23 (9.2%) 146 (12.5%)
#> rehab Chi Square: 0.12 0.729
#> Yes 48 (3.4%) 10 (4%) 37 (3.2%)
#> No 668 (47.1%) 121 (48.2%) 547 (47%)
#> NA 701 (49.5%) 120 (47.8%) 580 (49.8%)
#> ─────────────────────────────────────────────────────────────────────────
table1()
can be outputted directly to other formats. All
knitr::kable()
options are available for this and there is an extra
option "latex2"
which provides a publication ready table in Latex
documents.
A new function table1_gt()
integrates the fantastic gt
package to
make beautiful HTML tables for table1.
nhanes_2010 %>%
group_by(asthma) %>%
table1(age, marijuana, illicit, rehab, na.rm = FALSE) %>%
table1_gt() %>%
print()
#> Using dplyr::group_by() groups: asthma
Characteristic | Yes, n = 251 | No, n = 1164 |
---|---|---|
age |
|
|
|
23.0 (3.9) |
23.4 (4.0) |
marijuana |
|
|
Yes |
131 (52.2%) |
584 (50.2%) |
No |
97 (38.6%) |
434 (37.3%) |
NA |
23 (9.2%) |
146 (12.5%) |
illicit |
|
|
Yes |
23 (9.2%) |
117 (10.1%) |
No |
205 (81.7%) |
901 (77.4%) |
NA |
23 (9.2%) |
146 (12.5%) |
rehab |
|
|
Yes |
10 (4%) |
37 (3.2%) |
No |
121 (48.2%) |
547 (47%) |
NA |
120 (47.8%) |
580 (49.8%) |
The tableC()
function gives a well-formatted correlation table.
tableC(nhanes_2010,
age, active, vig_active,
na.rm=TRUE)
#> N = 317
#> Note: pearson correlation (p-value).
#>
#> ──────────────────────────────────────────────────
#> [1] [2] [3]
#> [1]age 1.00
#> [2]active -0.148 (0.008) 1.00
#> [3]vig_active -0.083 (0.141) 0.828 (<.001) 1.00
#> ──────────────────────────────────────────────────
The tableF()
function gives a table of frequencies.
tableF(nhanes_2010, age)
#>
#> ──────────────────────────────────
#> age Freq CumFreq Percent CumPerc
#> 18 191 191 13.48% 13.48%
#> 19 153 344 10.80% 24.28%
#> 20 111 455 7.83% 32.11%
#> 21 95 550 6.70% 38.81%
#> 22 100 650 7.06% 45.87%
#> 23 112 762 7.90% 53.78%
#> 24 93 855 6.56% 60.34%
#> 25 100 955 7.06% 67.40%
#> 26 91 1046 6.42% 73.82%
#> 27 77 1123 5.43% 79.25%
#> 28 91 1214 6.42% 85.67%
#> 29 86 1300 6.07% 91.74%
#> 30 117 1417 8.26% 100.00%
#> ──────────────────────────────────
In addition, the rowmeans()
and rowsums()
functions offer a
simplified use of rowMeans()
and rowSums()
, particularly when using
the tidyverse’s mutate()
.
nhanes_2010 %>%
select(vig_active, mod_active) %>%
mutate(avg_active = rowmeans(vig_active, mod_active, na.rm=TRUE)) %>%
mutate(sum_active = rowsums(vig_active, mod_active, na.rm=TRUE)) %>%
head()
#> vig_active mod_active avg_active sum_active
#> 1 30 NA 30 30
#> 2 180 180 180 360
#> 3 NA NA NaN 0
#> 4 20 70 45 90
#> 5 120 NA 120 120
#> 6 NA NA NaN 0
The rowmeans.n()
and rowsums.n()
allow n
missing values while
still calculating the mean or sum.
df <- data.frame(
x = c(NA, 1:5),
y = c(1:5, NA)
)
df %>%
mutate(avg = rowmeans.n(x, y, n = 1))
#> x y avg
#> 1 NA 1 1.0
#> 2 1 2 1.5
#> 3 2 3 2.5
#> 4 3 4 3.5
#> 5 4 5 4.5
#> 6 5 NA 5.0
The package is most useful in conjunction with other tidy tools to get
data cleaned/tidied and start exploratory data analysis. I recommend
using packages such as library(dplyr)
, library(tidyr)
, and
library(ggplot2)
with library(furniture)
to accomplish this.
The original function–table1()
–is simply built for both exploratory
descriptive analysis and communication of findings. See vignettes or
tysonbarrett.com for several examples
of its use. Also see our paper in the R
Journal.