dfvis: An R repository from talegari

WIP: IN DEVELOPMENT

A ggplot2-based implementation of tabplot (GitHub repo, paper)

tabplot offers a fast way to eyeball data frames (it has been my go-to tool for years). Plotting all columns side by side, with rows sorted by one variable, can uncover possible relationships between the sort variable and the others. This builds intuition for any further modeling.
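The sort-then-summarise idea behind a tableplot can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not how dfvis or tabplot work internally: `bin_summary()` and the `toy` data are hypothetical names made up here, rows are sorted in descending order, and each non-numeric column is reduced to its most frequent level per bin (a simplification of showing the category mix).

```r
# Illustrative sketch only: bin_summary() is a hypothetical helper,
# not part of dfvis or tabplot.
bin_summary <- function(df, sort_col, n_bins = 10) {
  n <- nrow(df)
  # Sort rows by the chosen column, largest values first
  df <- df[order(df[[sort_col]], decreasing = TRUE), , drop = FALSE]
  # Assign each sorted row to one of n_bins roughly equal-sized bins
  bin <- ((seq_len(n) - 1) * n_bins) %/% n + 1
  # Numeric columns: mean per bin; other columns: most frequent level per bin
  summarise_col <- function(x) {
    if (is.numeric(x)) {
      tapply(x, bin, mean)
    } else {
      tapply(as.character(x), bin, function(v) names(which.max(table(v))))
    }
  }
  out <- lapply(df, summarise_col)
  data.frame(bin = sort(unique(bin)), lapply(out, as.vector), row.names = NULL)
}

# Made-up toy data, used only to exercise the helper
toy <- data.frame(
  distance = c(1, 2, 3, 4, 10, 12, 20, 25),
  age      = c(25, 30, 28, 35, 40, 45, 50, 55),
  left     = factor(c("No", "No", "No", "No", "Yes", "No", "Yes", "Yes"))
)
res <- bin_summary(toy, sort_col = "distance", n_bins = 2)
print(res)
```

Each row of `res` corresponds to one bin of the sorted data. Reading across a row shows how the other columns behave at that level of the sort variable, which is the pattern a tableplot displays graphically.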

What is different from tabplot

Adds out-of-the-box support for grouped tibbles (tidy data frames)
Built on ggplot2, which allows flexible geoms for different variable types
dfvis might not be as fast as tabplot

Illustrations

pacman::p_load("dplyr", "tabplot", "dfvis")

data("attrition", package = "modeldata")
attrition = as_tibble(attrition)
attrition_6 = attrition[, 1:6]
skimr::skim(attrition_6)

Data summary

Name	attrition_6
Number of rows	1470
Number of columns	6
_______________________
Column type frequency:
factor	3
numeric	3
________________________
Group variables	None

Variable type: factor

skim_variable	complete_rate	ordered	n_unique	top_counts
Attrition	1	FALSE	2	No: 1233, Yes: 237
BusinessTravel	1	FALSE	3	Tra: 1043, Tra: 277, Non: 150
Department	1	FALSE	3	Res: 961, Sal: 446, Hum: 63

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
Age	1	36.92	9.14	18	30	36	43	60	▂▇▇▃▂
DailyRate	1	802.49	403.51	102	465	802	1157	1499	▇▇▇▇▇
DistanceFromHome	1	9.19	8.11	1	2	7	14	29	▇▅▂▂▂

Ungrouped case

autoplot(attrition_6, sort_column_name = "DistanceFromHome")

Grouped case

suppressWarnings(
  attrition_6 %>% 
    group_by(Attrition) %>% 
    autoplot(sort_column_name = "DistanceFromHome")
  )

## Adding missing grouping variables: `Attrition`
## Adding missing grouping variables: `Attrition`

tabplot

tabplot::tableplot(attrition_6, sortCol = "DistanceFromHome", nBins = 10)

Development and Contribution

Contributions are welcome!
Create an interactive version (with shiny?)
