Example Dynamic Communication of Biomedical Data

An example approach to dynamically communicating biomedical data.

Objective

Create a web page that both summarizes the composition of a cohort and highlights its outliers (interesting cases).

Desired Elements

The checklist of desired elements includes annotations below:

Notes

Data cleaning is not included in the main dashboard code. Instead, it is implemented through the scripts in the 0b_data_exploration_and_cleaning directory, which are meant to be run in order on a pre-caching computer.
No values (aside from column names) are hard-coded in the codebase.

Use

The instructions below assume that R and RStudio are installed.

The instructions below have been tested using RStudio for Linux version 1.1.442 and RStudio for Mac OSX version 1.0.153.

1. Open RStudio.
2. Within RStudio, click File --> "Open Project..."
3. Select 2018-06_shiny_example_biomedical_data.Rproj.
4. RStudio will now pause as it installs packages. This installation process only needs to happen once.
  - RStudio will present output as it completes these tasks (for example, Installing rstudioapi...).
  - Some of the installation steps (for example, for the BH, stringi, and dplyr packages) may take a minute or more.
  - If RStudio offers a y/n prompt, type "y" and press Enter.
  - When RStudio is finished, it will print the message, "Packrat bootstrap successfully completed." Once this happens, R will restart, and you will be able to run any command in the console (for example, at this point, if you type 2+2 next to the > prompt in the "Console" pane and press Enter, R should return 4).
Once RStudio is finished installing packages above:
1. Close RStudio (no need to save .RData when prompted), then reopen it and open the project again (following the initial steps above).
  Closing and re-opening RStudio will cause RStudio to recognize the packages it installed above to the project directory.
2. Within RStudio, open 1_application/ui.R.
3. Within RStudio, click "Session" --> "Set Working Directory" --> "To Source File Location"
4. Click "Run App" in the top right corner of the RStudio editor pane.
5. RStudio may ask to install an updated version of the shiny package. Click "Yes".
6. The app should launch in a web browser. If it launches in an internal browser, you can click "Open in Browser" in the top left corner of the internal viewer window.

Additional Resources

t-SNE

Vega

I initially put quite a bit of work into implementing the main t-SNE scatterplot in Vega or Vega-Lite.

After substantial research, however, I concluded that Vega does not yet have a robust (or perhaps any) API for selections. Selections (e.g., brushing / click-and-drag selection of scatterplot points) are possible in Vega, but publishing the selected points back to R does not yet seem feasible. See:

The GitHub Issue "APIs to interact with Selection's Data and Signals", which is open.
The Vega view API, which seems promising for this purpose but is not yet documented well enough to use, and is almost wholly undocumented in the secondary landscape of StackExchange and blogs.
- I do have an open StackOverflow Question about this topic.

ICD-9 Codes

ICD 10 codes structure

Issues / Places for Further Development

data_subset() is an event that launches both on load and then one second after load, causing load time to be longer. This could be optimized.
As mentioned above, ggplot2 is currently used in place of Vega, as the latter took substantially longer to render on my development laptop.
The emphasis on visual inspection of the t-SNE scatterplot means that this interface is not particularly (or perhaps at all) accessible to clinicians and researchers who have limited vision. Researching and developing approaches to convey the t-SNE output (and scatterplot output more generally) to users through a screen reader could be a fruitful future step. One option would be to run the t-SNE output through a clustering algorithm such as k-means and then summarize the resulting clusters in text. However, t-SNE output cannot always be meaningfully cluster-analyzed: t-SNE preserves local neighborhoods rather than global distances, so the sizes of clusters and the distances between them in the embedding may not reflect the original data.
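The clustering idea above could be sketched roughly as follows. This is a minimal illustration, not code from the app: `tsne_coords` is simulated stand-in data rather than real t-SNE output, the `age` column and the choice of two clusters are assumptions, and the summary is built from an original variable (not from positions in the embedding) because of the distance caveat noted above.

```r
## Hypothetical stand-in for t-SNE output: two-dimensional coordinates,
## one row per patient. In a real app these would come from the t-SNE step.
set.seed(42)
tsne_coords <- rbind(
  cbind(rnorm(50, mean = -5), rnorm(50, mean = -5)),
  cbind(rnorm(50, mean = 5), rnorm(50, mean = 5))
)
colnames(tsne_coords) <- c("x", "y")

## Hypothetical original variable for the same 100 patients.
cohort <- data.frame(age = c(rnorm(50, mean = 40, sd = 5),
                             rnorm(50, mean = 70, sd = 5)))

## Cluster the embedding. The number of clusters is an assumption
## chosen to match this toy data.
clusters <- kmeans(tsne_coords, centers = 2, nstart = 10)

## Describe each cluster using the original variable, producing plain
## text that a screen reader could present.
mean_age <- tapply(cohort$age, clusters$cluster, mean)
summary_text <- sprintf(
  "Cluster %s: %d patients, mean age %.1f years.",
  names(mean_age), as.integer(clusters$size), as.numeric(mean_age)
)
print(summary_text)
```

Describing clusters by the original variables keeps the text summary interpretable even when positions in the embedding are not.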
This development involved a lot of research into interface possibilities with Shiny. Now that the interface design has settled, in a real-world scenario, I would focus on adding both unit tests and functional tests (the latter, for example, through the new shinytest package).
- Further, there are several internal functions that, ideally, should receive full roxygen2-based documentation.
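As an illustration of what documented and tested internal functions might look like, here is a small sketch. The function `is_outlier()` is hypothetical and is not taken from the app's codebase; the check at the end uses base R's `stopifnot()` as a stand-in for a proper unit-test expectation.

```r
#' Flag values that lie far from the mean
#'
#' Hypothetical helper, shown only to illustrate roxygen2-style comments.
#'
#' @param x A numeric vector.
#' @param threshold Number of standard deviations from the mean beyond
#'   which a value is flagged. Defaults to 3.
#' @return A logical vector the same length as `x`.
#' @examples
#' is_outlier(c(rep(0, 20), 100))
is_outlier <- function(x, threshold = 3) {
  stopifnot(is.numeric(x), is.numeric(threshold), length(threshold) == 1)
  z_scores <- (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
  abs(z_scores) > threshold
}

## A minimal check of the documented behavior: only the extreme value
## should be flagged.
flags <- is_outlier(c(rep(0, 20), 100))
stopifnot(identical(flags, c(rep(FALSE, 20), TRUE)))
```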
While the app is running, I have allowed ggplot2 to produce a warning that notes when there are data rows that are left out of visualizations because of incomplete data. I decided to leave this in (rather than adding an na.omit() line in the code) to improve initial diagnostics as the app runs, pointing out the number of rows that remain with missing data even after data cleaning. This warning could be turned off / the missing data removed more gracefully in the future.
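One way the missing-data handling described above might be made more graceful is to report the number of incomplete rows explicitly and then drop them before plotting. The sketch below uses base R only; the `cohort` data frame and its column names are made up for illustration.

```r
## Hypothetical example data with some missing values.
cohort <- data.frame(
  patient_id = 1:5,
  age = c(34, NA, 58, 71, NA),
  systolic_bp = c(120, 135, NA, 140, 118)
)

## Count rows with at least one missing value, so that the diagnostic
## information is kept even after the rows are removed.
n_incomplete <- sum(!complete.cases(cohort))
message(
  n_incomplete, " of ", nrow(cohort),
  " rows have missing values and will be left out of visualizations."
)

## Drop the incomplete rows before passing the data to a plot.
cohort_complete <- na.omit(cohort)
```

Keeping the count in a message preserves the diagnostic value of the current ggplot2 warning while giving the app control over its wording.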
To facilitate development, this app uses the tidyverse package, which is a wrapper for loading several related packages. In the future, this app could be made leaner by replacing the call to the tidyverse package with calls to the specific packages within that wrapper that the app actually uses.

Style Guide

The code in this repository follows the Tidyverse Style Guide, with occasional additional guidelines (specifically, the use of ## rather than # to delimit text comments) taken from the Google R Style Guide.

jglev/2018-06_shiny_example_biomedical_data