/datscience

A small set of statistics and data-visualization functions useful for data science


datscience R-Package

Project Status: Active. The project has reached a stable, usable state and is being actively developed.

The datscience (data analysis and science) R-package contains functions that are frequently required when preparing data for publication. Among those are:

  • format_flextable() applies the APA 7th theme to a flextable object
  • apa_factorLoadings() creates APA 7th based factor-loadings table
  • flex_table1() creates a tabular comparison of 2 or more groups (usually Table 1)
  • Rcitation_appendix() creates a full APA-style table with all citations and information on the R-packages utilized

The overall goal was to improve the data analysis workflow and to help with formatting challenges I encountered when preparing submissions to scientific journals. For example: getting statistics from R into MS Word in the right format.

Installation

You can install the latest version of datscience directly from GitHub with:

# Normal installation
install.packages("devtools")
devtools::install_github("Buedenbender/datscience")

My recommendation is to use the pacman package manager instead, as it installs the latest version from GitHub and loads it directly:

# Recommendation: pacman
# install.packages("pacman") # run once if pacman is not installed yet
pacman::p_load_gh("Buedenbender/datscience")

Installation Troubleshoot

Some users might encounter the error (System Error 267 @win/processx.c:1040), which is caused by special characters (e.g., ö or é) in the username and therefore in the path to the directories. In this case, you can try to install datscience with the remotes package in standalone mode in a fresh R session with no packages loaded (see below and, for reference, this stackoverflow posting).

Sys.setenv(R_REMOTES_STANDALONE="true")
remotes::install_github("Buedenbender/datscience")

A Teaser of datscience Functionality: flex_table1()

While R provides the power to conduct whatever analyses one can imagine, I often found it difficult to transfer the analysis or its results from an R session into MS Word.

Example Problem: Get a nicely formatted (in accordance with APA 7th publication manual) sociodemographic table 1 directly into a Word file (*.docx).

As of March 2022, the function datscience::flex_table1() makes creating the sociodemographic Table 1 (including the statistical comparisons of subsamples) straightforward. Take a look at the article vignette("flex_table1"). For the example we took the popular iris dataset and added a simulated categorical variable called Color that contains either “Blue” or “Orange”. We supply the function with a formula that determines which variables are included in the table. Here we include the two metric variables Sepal.Length and Sepal.Width as well as the simulated Color, with Species as the grouping variable.
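The iris_sim data frame used in the call is not constructed in this README. As a minimal sketch, assuming Color is simply drawn at random for each row (the seed and the sampling scheme are illustrative assumptions, not necessarily how the vignette builds it), it could be created in base R like this:

```r
# Hypothetical reconstruction of iris_sim; the seed and the random
# assignment are assumptions made for illustration only.
set.seed(42)

iris_sim <- iris
iris_sim$Color <- factor(
  sample(c("Blue", "Orange"), size = nrow(iris_sim), replace = TRUE)
)

# Quick look at how the simulated colors are distributed across species
table(iris_sim$Species, iris_sim$Color)
```

Storing Color as a factor keeps it categorical, so it can be treated as a grouping-style variable rather than as free text.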

# To load datscience
pacman::p_load_gh("Buedenbender/datscience")
str_formula <- "~ Sepal.Length + Sepal.Width + Color | Species"
flex_table1(str_formula, data = iris_sim, overall = "Overall") # |>
# save_flextable("Table1.docx")

Uncomment the pipe operator |> above and the line after the call to flex_table1() to directly save this nicely formatted tabular comparison as a .docx (Word) document.

Further Examples of datscience Functionality

Below are just a few examples of the functionality of the package

The apa_corrTable() Function

The datscience::apa_corrTable() function displays correlations with marked significance and adds descriptive statistics to the table, see below:

Screenshot of “CorrelationTable_iris.docx”

This function revolves around three other useful functions from this package. It works in four steps:

  1. Creates the correlation table by calling datscience::corstars() [1].

    datscience::corstars(iris[1:4])
    #>              Sepal.Length Sepal.Width Petal.Length
    #> Sepal.Length                                      
    #> Sepal.Width     -0.12                             
    #> Petal.Length     0.87***    -0.43***              
    #> Petal.Width      0.82***    -0.37***      0.96***
  2. Appends desired summary stats to the flextable.

  3. Formats the flextable::flextable() object in APA 7th style with the format_flextable() function. To illustrate the function, we use it here to display the first 5 rows of the iris data set.

    datscience::format_flextable(flextable::flextable(head(iris, 5)),
      table_caption = c("Table 2", "Illustrating Functionality of format_flextable()")
    )

  4. Saves the table with the datscience::save_flextable() function. This writes the flextable object to a Word (.docx) file safely, i.e., it avoids overwriting existing files by serializing the file names.
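The idea of avoiding overwrites by serializing file names can be sketched in base R. The helper next_free_path() below is hypothetical and is not the implementation of datscience::save_flextable(); the suffix format (_001, _002, ...) is also an assumption chosen for illustration.

```r
# Hypothetical helper, NOT part of datscience: returns `path` if it is free,
# otherwise the first variant with a numeric suffix that does not exist yet.
next_free_path <- function(path) {
  if (!file.exists(path)) {
    return(path)
  }
  stem <- tools::file_path_sans_ext(path)
  ext <- tools::file_ext(path)
  # Only re-attach a dot when the original path had an extension
  suffix <- if (nzchar(ext)) paste0(".", ext) else ""
  i <- 1
  repeat {
    candidate <- sprintf("%s_%03d%s", stem, i, suffix)
    if (!file.exists(candidate)) {
      return(candidate)
    }
    i <- i + 1
  }
}

# Usage: once Table1.docx exists, the helper proposes a serialized name
target <- file.path(tempdir(), "Table1.docx")
file.create(target)
next_free_path(target)
```

Checking for the file before writing keeps earlier exports intact, which is useful when a table is regenerated several times during an analysis.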

The format_flextable() Function

One of the most utilized functions in the package is datscience::format_flextable(), which takes a flextable object and applies the APA 7th edition theme to it. It also provides a workaround for adding an APA-ready table caption and a note.

Note: The formatting (theme) code of the format_flextable() function was inspired by the blog post of Rémi Thériault.

The flextable package is very versatile, and it was exactly what I was looking for to get nicely formatted tables directly from R(studio) into Word. The datscience::format_flextable() function builds on it: it just applies the repetitive formatting necessary to convert a flextable into a “publication ready” APA formatted table.

One example of this flexibility is printing the factor loadings from a principal component analysis (PCA, psych::principal()).

Let’s first create an exemplary PCA [1, 2] and extract the factor loadings (this is also packaged more conveniently in the apa_factorLoadings() function):

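# Create an example principal component analysis
pacman::p_load(psych, dplyr)
pc <- principal(Harman74.cor$cov, 4, rotate = "varimax")
# Sort the items by their loadings and remember the new row order
sorted_loadings <- pc$loadings |>
  fa.sort() |>
  unclass()
item_order <- match(rownames(sorted_loadings), rownames(pc$loadings))
pc_loadings <- sorted_loadings |>
  round(3) |>
  as.data.frame() |>
  # Blank out loadings with an absolute value below 0.3
  mutate(across(
    everything(),
    ~ if_else(abs(.x) < 0.3, "", as.character(.x))
  )) |>
  # Add the item statistics in the same (sorted) row order
  mutate(
    Communality = round(pc$communality[item_order], 2),
    Uniqueness = round(pc$uniquenesses[item_order], 2),
    Complexity = round(pc$complexity[item_order], 2)
  ) |>
  tibble::rownames_to_column("items")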

Formatting these loadings to APA with just one function:

formatted_loadings <- datscience::format_flextable(flextable::flextable(pc_loadings),
  table_caption = c("Table 3", "Factor Loadings in Exemplary PCA")
)
formatted_loadings

Related Work

rempsyc: I highly recommend this package, which is now available. It offers great utility for psychologists and social scientists getting into R and provides on-point tutorials on how to use its functions (one of which was the origin of datscience::format_flextable()).