dfvis

ggplot2 based implementation of tabplot

WIP: IN DEVELOPMENT

A ggplot2-based implementation of tabplot (GitHub repo, paper).

tabplot offers a fast way to eyeball data frames, and it has been my go-to tool for years. Sorting the rows by one variable and viewing every other column alongside it can reveal possible relationships between variables, which helps build intuition before any further modeling.
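The sort-then-summarise idea behind a tableplot can be sketched in a few lines of base R. This is only an illustration, not the code used by tabplot or dfvis: the helper name bin_means() is hypothetical, the built-in iris data stands in for a real dataset, and only numeric columns are summarised (factor columns would typically be shown as level proportions per bin).

```r
# Illustrative sketch only; bin_means() is a hypothetical helper,
# not a function from tabplot or dfvis.
bin_means <- function(df, sort_column_name, n_bins = 10) {
  # Sort rows by the chosen column, largest values first
  df <- df[order(df[[sort_column_name]], decreasing = TRUE), , drop = FALSE]
  # Assign each sorted row to one of n_bins roughly equal-sized bins
  bin <- ceiling(seq_len(nrow(df)) * n_bins / nrow(df))
  # Summarise every numeric column by its mean within each bin
  is_num <- vapply(df, is.numeric, logical(1))
  aggregate(df[is_num], by = list(bin = bin), FUN = mean)
}

binned <- bin_means(iris, "Sepal.Length", n_bins = 10)
print(binned)
```

Reading down the rows of binned shows how the other numeric columns move as the sort column decreases, which is the pattern a tableplot draws as side-by-side bars.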

How dfvis differs from tabplot

  • Adds out-of-the-box support for grouped tibbles (tidy data frames)
  • Built on ggplot2, which allows flexible geoms for different variable types
  • dfvis might not be as fast as tabplot

Illustrations

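# Load the packages used below (pacman installs any that are missing)
pacman::p_load("dplyr", "tabplot", "dfvis")

# Example data: employee attrition, restricted to the first six columns
data("attrition", package = "modeldata")
attrition <- as_tibble(attrition)
attrition_6 <- attrition[, 1:6]
skimr::skim(attrition_6)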
Data summary

Name                   attrition_6
Number of rows         1470
Number of columns      6

Column type frequency:
  factor               3
  numeric              3

Group variables        None

Variable type: factor

skim_variable     n_missing  complete_rate  ordered  n_unique  top_counts
Attrition                 0              1  FALSE           2  No: 1233, Yes: 237
BusinessTravel            0              1  FALSE           3  Tra: 1043, Tra: 277, Non: 150
Department                0              1  FALSE           3  Res: 961, Sal: 446, Hum: 63

Variable type: numeric

skim_variable     n_missing  complete_rate    mean      sd   p0  p25  p50   p75  p100  hist
Age                       0              1   36.92    9.14   18   30   36    43    60  ▂▇▇▃▂
DailyRate                 0              1  802.49  403.51  102  465  802  1157  1499  ▇▇▇▇▇
DistanceFromHome          0              1    9.19    8.11    1    2    7    14    29  ▇▅▂▂▂

Ungrouped case

autoplot(attrition_6, sort_column_name = "DistanceFromHome")

Grouped case

suppressWarnings(
  attrition_6 %>% 
    group_by(Attrition) %>% 
    autoplot(sort_column_name = "DistanceFromHome")
  )

## Adding missing grouping variables: `Attrition`
## Adding missing grouping variables: `Attrition`
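One plausible reading of the grouped case is that rows are sorted and binned separately within each group. Whether dfvis does exactly this is an assumption. The base R sketch below shows the general idea, using the hypothetical helper grouped_bin_means() and the built-in iris data in place of attrition_6.

```r
# Illustrative sketch only; grouped_bin_means() is a hypothetical helper
# and is not part of dfvis.
grouped_bin_means <- function(df, group_col, sort_col, n_bins = 5) {
  # Split into one data frame per group, skipping empty factor levels
  pieces <- lapply(split(df, df[[group_col]], drop = TRUE), function(g) {
    # Sort within the group, largest values first
    g <- g[order(g[[sort_col]], decreasing = TRUE), , drop = FALSE]
    # Bin the sorted rows of this group into roughly equal-sized bins
    bin <- ceiling(seq_len(nrow(g)) * n_bins / nrow(g))
    # Mean of the sort column within each bin
    m <- tapply(g[[sort_col]], bin, mean)
    data.frame(
      group = as.character(g[[group_col]][1]),
      bin = as.integer(names(m)),
      mean_value = as.vector(m)
    )
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

grouped <- grouped_bin_means(iris, "Species", "Sepal.Length", n_bins = 5)
print(grouped)
```

Binning within each group keeps the bins comparable across groups of different sizes, at the cost of each bin covering a different number of rows per group.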

tabplot

tabplot::tableplot(attrition_6, sortCol = "DistanceFromHome", nBins = 10)

Development and Contribution

  • Contributions are welcome!
  • Planned: an interactive version (possibly with shiny)