
The Datasaurus Dozen datasets


datasauRus


This package wraps the awesome Datasaurus Dozen dataset.

The Datasaurus was created by Alberto Cairo in this great blog post.

Datasaurus shows us why visualisation is important, not just summary statistics.

He was subsequently made even more famous by the paper "Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing" by Justin Matejka and George Fitzmaurice.

In the paper, Justin and George simulate a variety of datasets that have the same summary statistics as the Datasaurus but very different distributions.

This package looks to make these datasets available for use as an advanced Anscombe's Quartet, available in R as anscombe.
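For comparison, here is a minimal sketch of the same idea using the built-in `anscombe` data frame, which needs no extra packages. It relies on the columns being named `x1`..`x4` and `y1`..`y4`, as they are in base R; the name `anscombe_stats` is only illustrative.

```r
# anscombe ships with base R (datasets package): columns x1..x4 and y1..y4.
sets <- 1:4

# One row of summary statistics per x/y pair in the quartet.
anscombe_stats <- data.frame(
  set    = sets,
  mean_x = sapply(sets, function(i) mean(anscombe[[paste0("x", i)]])),
  mean_y = sapply(sets, function(i) mean(anscombe[[paste0("y", i)]])),
  sd_x   = sapply(sets, function(i) sd(anscombe[[paste0("x", i)]])),
  sd_y   = sapply(sets, function(i) sd(anscombe[[paste0("y", i)]])),
  cor_xy = sapply(sets, function(i) {
    cor(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]])
  })
)

print(anscombe_stats, digits = 3)
```

The four rows come out nearly identical, even though the four scatter plots look nothing alike.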

Install

The package is currently only available on GitHub, so use devtools to install it:

devtools::install_github("stephlocke/datasauRus")

Usage

You can use the package to produce Anscombe plots and more.

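library(ggplot2)
library(datasauRus)
# One scatter plot per dataset, coloured by dataset name
ggplot(datasaurus_dozen, aes(x=x, y=y, colour=dataset))+
  geom_point()+
  # Drop axes and background so only the point shapes remain
  theme_void()+
  theme(legend.position = "none")+
  # Lay the panels out in a grid three columns wide
  facet_wrap(~dataset, ncol=3)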
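To go with the plot, the following is a rough sketch of how one might check that the datasets share their summary statistics. It uses base R only and assumes `datasaurus_dozen` has the columns `dataset`, `x` and `y`, as in the plotting code above; `pieces` and `dozen_stats` are illustrative names, not part of the package.

```r
library(datasauRus)

# Split the combined data frame into one piece per dataset name.
pieces <- split(datasaurus_dozen, datasaurus_dozen$dataset)

# Compute the same five statistics for every piece and stack the rows.
dozen_stats <- do.call(rbind, lapply(names(pieces), function(name) {
  d <- pieces[[name]]
  data.frame(
    dataset = name,
    mean_x  = mean(d$x),
    mean_y  = mean(d$y),
    sd_x    = sd(d$x),
    sd_y    = sd(d$y),
    cor_xy  = cor(d$x, d$y)
  )
}))

print(dozen_stats, digits = 3)
```

Each row should agree closely with the others, which is the point of the paper: the numbers match while the plots differ.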

Tests

library(devtools)
test()
#> Loading datasauRus
#> Loading required package: testthat
#> Testing datasauRus
#> datasets: ......................
#> Raw files: .
#> 
#> DONE ======================================================================