---
output: rmarkdown::html_document
editor_options: 
  markdown: 
    wrap: 72
---

<!-- README.md is automatically generated from README.Rmd. Please only edit this Rmd file! -->

<!-- knitr before every resubmission -->

```{r, include = FALSE}
knitr::opts_chunk$set(
  echo = TRUE,
  collapse = TRUE,
  comment = "#>",
  fig.width = 8,
  fig.height = 5,
  out.width = "100%",
  fig.path = "man/figures/README-"
)
```

# visStatistics

Visualization of the statistical hypothesis test with the highest
statistical power between two or more groups of categorical or numerical
data.

The package visStatistics, with its core function `visstat()`, provides
fast visualization and reproducible statistical analysis of the supplied
data. Based on a decision tree, it selects the statistical hypothesis
test with the highest statistical power between the dependent variable
(response) `varsample` and the independent variable (feature)
`varfactor`. The corresponding test statistics, including any post-hoc
analysis, are returned, and a graph showing the key statistics of the
test is generated.
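
The kind of dispatch such a decision tree performs can be sketched from
the column types alone. The function `choose_test_family()` below is
hypothetical and heavily simplified: it only mirrors the pairing of data
types and tests seen in the examples of this README and leaves out the
further checks, such as the normality tests listed below, that the
actual decision tree applies. The vignette remains the authoritative
description.

```{r}
# Hypothetical helper, NOT part of visStatistics: maps column types to
# the family of tests that the README examples show for that combination.
choose_test_family <- function(response, feature) {
  if (is.numeric(response) && is.numeric(feature)) {
    return("lm")
  }
  if (is.factor(response) && is.factor(feature)) {
    return("chisq.test or fisher.test")
  }
  if (is.numeric(response) && is.factor(feature)) {
    if (nlevels(feature) == 2) {
      return("t.test or wilcox.test")
    }
    return("aov, oneway.test or kruskal.test")
  }
  stop("unsupported combination of column types")
}

choose_test_family(cars$dist, cars$speed)
choose_test_family(iris$Petal.Width, iris$Species)
```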

This fully automated workflow is particularly suited to browser-based
interfaces to server-based deployments of R and databases, and has been
successfully implemented for the unbiased statistical analysis of
medical data sets.

A detailed description of the package and its underlying decision tree
can be found in the `vignette` accompanying this package.

## Implemented tests

`lm()`, `t.test()`, `wilcox.test()`, `aov()`, `kruskal.test()`,
`fisher.test()`, `chisq.test()`

### Implemented tests to check the normal distribution of standardized residuals

`shapiro.test()` and `ad.test()`
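
As a minimal sketch of what such a check looks like in plain R, the
chunk below fits a one-way model to the built-in `PlantGrowth` data and
tests its standardized residuals for normality. That `visstat()`
computes the residuals in exactly this way is an assumption; `ad.test()`
is taken from the `nortest` package and is only run if that package is
installed.

```{r}
# Fit a one-way ANOVA model on a built-in data set
fit <- aov(weight ~ group, data = PlantGrowth)

# Standardized residuals of the fitted model
std_res <- rstandard(fit)

# Shapiro-Wilk test of normality (base R, package stats)
shapiro_result <- shapiro.test(std_res)
shapiro_result$p.value

# Anderson-Darling test of normality (assumes the nortest package)
if (requireNamespace("nortest", quietly = TRUE)) {
  ad_result <- nortest::ad.test(std_res)
  ad_result$p.value
}
```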

### Implemented post-hoc tests

`TukeyHSD()` for `aov()` and `pairwise.wilcox.test()` for
`kruskal.test()`
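
Both post-hoc procedures can be called directly in base R. The sketch
below uses the built-in `warpbreaks` data, chosen here only for
illustration; the argument `exact = FALSE` is an assumption made to
avoid warnings about ties and need not match the settings used inside
`visstat()`.

```{r}
# Post-hoc comparison after a parametric one-way ANOVA
fit <- aov(breaks ~ tension, data = warpbreaks)
tukey <- TukeyHSD(fit)
tukey$tension

# Post-hoc comparison after a Kruskal-Wallis test:
# pairwise Wilcoxon rank sum tests with p-value adjustment
pairwise <- pairwise.wilcox.test(
  warpbreaks$breaks, warpbreaks$tension,
  exact = FALSE
)
pairwise$p.value
```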

## Installation of the latest stable version from CRAN

1.  Install the package: `install.packages("visStatistics")`
2.  Load the package: `library(visStatistics)`

## Installation of the development version from GitHub

1.  Install the devtools package from CRAN. Invoke R and type
    `install.packages("devtools")`
2.  Load the devtools package: `library(devtools)`
3.  Install the package from the GitHub repository:
    `install_github("shhschilling/visStatistics")`
4.  Load the package: `library(visStatistics)`
5.  Get help on the function usage: `?visstat`

## Getting Started

The package vignette familiarizes you with all features of
`visStatistics`. It documents in detail the algorithm of the decision
tree, which is illustrated by the examples below.

## Examples

```{r example}
library(visStatistics)
```

### Welch's t-test

#### InsectSprays data set

```{r}
insect_sprays_a_b <- 
  InsectSprays[which(InsectSprays$spray == "A" | InsectSprays$spray == "B"), ]
insect_sprays_a_b$spray <- factor(insect_sprays_a_b$spray)
visstat(insect_sprays_a_b, "count", "spray")
```

#### mtcars data set

```{r}
mtcars$am <- as.factor(mtcars$am)
t_test_statistics <- visstat(mtcars, "mpg", "am")
```
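
For comparison, the test named in this section can be run directly with
base R: `t.test()` performs Welch's t-test by default, since its
argument `var.equal` defaults to `FALSE`. That `visstat()` calls
`t.test()` with exactly these settings is an assumption.

```{r}
# Welch's t-test of fuel consumption by transmission type
welch <- t.test(mpg ~ am, data = mtcars)
welch$method

# Student's t-test with pooled variance, shown only for contrast
student <- t.test(mpg ~ am, data = mtcars, var.equal = TRUE)

# Welch's correction yields fewer (fractional) degrees of freedom
c(welch = unname(welch$parameter), student = unname(student$parameter))
```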

### Wilcoxon rank sum test

```{r}
grades_gender <- data.frame(
  sex = as.factor(c(rep("girl", 21), rep("boy", 23))),
  grade = c(
    19.3, 18.1, 15.2, 18.3, 7.9, 6.2, 19.4,
    20.3, 9.3, 11.3, 18.2, 17.5, 10.2, 20.1, 13.3, 17.2, 15.1, 16.2, 17.0,
    16.5, 5.1, 15.3, 17.1, 14.8, 15.4, 14.4, 7.5, 15.5, 6.0, 17.4, 7.3, 14.3,
    13.5, 8.0, 19.5, 13.4, 17.9, 17.7, 16.4, 15.6, 17.3, 19.9, 4.4, 2.1
  )
)

wilcoxon_statistics <- visstat(grades_gender, "grade", "sex")
```

### ANOVA 

```{r}
insect_sprays_tr <- InsectSprays
insect_sprays_tr$count_sqrt <- sqrt(InsectSprays$count)
visstat(insect_sprays_tr, "count_sqrt", "spray")
```


### One-way test

```{r}
one_way_npk <- visstat(npk, "yield", "block")
```
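
The one-way test is available in base R as `oneway.test()`, which by
default does not assume equal variances across groups. The direct call
below is a sketch for comparison; which criterion leads `visstat()` to
prefer this test over `aov()` is documented in the vignette, not here.

```{r}
# One-way analysis of means without the equal-variance assumption
one_way_base <- oneway.test(yield ~ block, data = npk)
one_way_base$method
one_way_base$p.value
```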

### Kruskal-Wallis test

The generated graphs can be saved in all formats supported by the
`Cairo` package. Here we save the graphical output as a "pdf" file in
the directory `tempdir()`, passed via `plotDirectory`:

```{r}
visstat(iris, "Petal.Width", "Species", 
        graphicsoutput = "pdf", plotDirectory = tempdir())
```

### Linear Regression

```{r}
linreg_cars <- visstat(cars, "dist", "speed")
```

Increasing the confidence level `conf.level` from the default 0.95 to
0.99 leads to wider confidence and prediction bands:

```{r}
linreg_cars_99 <- visstat(cars, "dist", "speed", conf.level = 0.99)
```
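
The widening can be checked numerically in base R with `predict()` on
an `lm()` fit. The chunk below is a sketch independent of `visstat()`;
the evaluation point `speed = 15` and the helper `band_width()` are
arbitrary choices made for illustration.

```{r}
fit <- lm(dist ~ speed, data = cars)
new_speed <- data.frame(speed = 15)

# Width of an interval returned by predict(): upper minus lower limit
band_width <- function(interval) {
  unname(interval[, "upr"] - interval[, "lwr"])
}

widths <- c(
  conf_95 = band_width(predict(fit, new_speed, interval = "confidence", level = 0.95)),
  conf_99 = band_width(predict(fit, new_speed, interval = "confidence", level = 0.99)),
  pred_95 = band_width(predict(fit, new_speed, interval = "prediction", level = 0.95)),
  pred_99 = band_width(predict(fit, new_speed, interval = "prediction", level = 0.99))
)
widths
```

At either level the prediction band is wider than the confidence band,
because it also accounts for the scatter of individual observations
around the regression line.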

### Pearson's Chi-squared test

Count data sets are often presented as multidimensional arrays,
so-called contingency tables, whereas `visstat()` requires a
`data.frame` with a column structure. Arrays can be transformed to this
column-wise structure with the helper function `counts_to_cases()`:

```{r}
hair_eye_color_df <- counts_to_cases(as.data.frame(HairEyeColor))
visstat(hair_eye_color_df, "Hair", "Eye")
```
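
What such a transformation does can be sketched in base R: every row of
the count table is repeated as often as its `Freq` value states, giving
one row per observed case. This illustrates the idea only; the actual
implementation of `counts_to_cases()` may differ.

```{r}
# Long-format count table with columns Hair, Eye, Sex and Freq
counts <- as.data.frame(HairEyeColor)

# Repeat each row Freq times and drop the count column
cases <- counts[rep(seq_len(nrow(counts)), counts$Freq), c("Hair", "Eye", "Sex")]
rownames(cases) <- NULL

# One row per person in the original table
nrow(cases) == sum(HairEyeColor)

# The case-wise data reproduce the Hair by Eye contingency table
chi_squared <- chisq.test(table(cases$Hair, cases$Eye))
chi_squared$p.value
```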

### Fisher's exact test

```{r}
hair_eye_color_male <- HairEyeColor[, , 1]
# Slice out a 2 by 2 contingency table
black_brown_hazel_green_male <- hair_eye_color_male[1:2, 3:4]
# Transform to data frame
black_brown_hazel_green_male <- counts_to_cases(
  as.data.frame(black_brown_hazel_green_male)
)
# Fisher test
fisher_stats <- visstat(black_brown_hazel_green_male, "Hair", "Eye")
```
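
Fisher's exact test is commonly preferred when expected cell counts are
small. The sketch below inspects the expected counts of the same 2 by 2
slice and runs `fisher.test()` directly on it. Whether small expected
counts are the exact criterion `visstat()` uses to switch from the
Chi-squared test is an assumption; the vignette describes the actual
rule.

```{r}
# Same 2 by 2 slice: black and brown hair, hazel and green eyes, males
male_2x2 <- HairEyeColor[1:2, 3:4, "Male"]
male_2x2

# Expected counts under independence; the warning about the
# Chi-squared approximation is suppressed on purpose
expected <- suppressWarnings(chisq.test(male_2x2)$expected)
min(expected) < 5

# Fisher's exact test on the contingency table
fisher_base <- fisher.test(male_2x2)
fisher_base$p.value
```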