
Sociodemographic Characteristics of Missing Data in Digital Phenotyping



Introduction

Analytic code, model results, and documentation for our Scientific Reports paper, Sociodemographic Characteristics of Missing Data in Digital Phenotyping. The full citation is:

Kiang MV, Chen JT, Krieger N, Buckee CO, Alexander MJ, Baker JT, Buckner RL, Coombs III G, Rich-Edwards JW, Carlson KW, and Onnela JP. Sociodemographic characteristics of missing data in digital phenotyping. Scientific Reports (July 2021). doi: 10.1038/s41598-021-93687-7

A note about reproducibility

As we note in our Supplementary Information, the data used in these analyses were metadata and contained only timestamps (e.g., the date and time of a GPS ping but not its coordinates). However, participants' timestamps can be considered personally identifiable information. Therefore, to minimize the potential for participant harm and re-identification, the data are not shared publicly. Data are available upon request, contingent upon appropriate IRB approvals or exemptions from participating institutions. The shared data will not be the raw data (e.g., timestamps may be shifted and/or perturbed with noise, and user identifiers re-randomized), but they will provide sufficient information to reproduce our results.
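This is a minimal Python sketch of how the kinds of perturbation mentioned above could be applied to timestamp metadata. It is not the procedure used for this paper. The function name `deidentify`, the `(user_id, timestamp)` record layout, and the parameters `max_shift_days` and `noise_seconds` are all hypothetical. Each user gets a new random identifier and a single constant time shift. A small per-timestamp jitter then adds noise.

```python
import random
from datetime import datetime, timedelta


def deidentify(records, max_shift_days=30, noise_seconds=60, seed=None):
    """Return perturbed copies of (user_id, timestamp) records.

    Hypothetical sketch: each user is relabeled with a random new ID and
    receives one constant time shift, so gaps between that user's own
    timestamps are kept; each timestamp then gets a small uniform jitter.
    """
    rng = random.Random(seed)

    # Re-randomize identifiers: sort for reproducibility, shuffle, relabel.
    users = sorted({user_id for user_id, _ in records})
    rng.shuffle(users)
    new_ids = {user_id: f"u{i:03d}" for i, user_id in enumerate(users)}

    # One constant shift per user, in whole seconds, within +/- max_shift_days.
    max_shift = max_shift_days * 24 * 3600
    shifts = {
        user_id: timedelta(seconds=rng.randint(-max_shift, max_shift))
        for user_id in users
    }

    out = []
    for user_id, ts in records:
        # Per-timestamp noise of up to +/- noise_seconds.
        jitter = timedelta(seconds=rng.uniform(-noise_seconds, noise_seconds))
        out.append((new_ids[user_id], ts + shifts[user_id] + jitter))
    return out


# Usage with made-up example records (not study data).
records = [
    ("alice", datetime(2019, 5, 1, 8, 0)),
    ("alice", datetime(2019, 5, 1, 9, 0)),
    ("bob", datetime(2019, 5, 2, 12, 30)),
]
for row in deidentify(records, seed=42):
    print(row)
```

Shifting all of a user's timestamps by the same offset keeps that user's intervals between observations intact. For missing data, those intervals matter more than absolute dates. The jitter perturbs each interval by at most `2 * noise_seconds`. A larger `noise_seconds` therefore gives more protection but less faithful gaps, and it may reorder closely spaced pings.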

In addition, we provide example replication code, along with documentation, in this online repository. The code and documentation are near-exact copies of those used in this project, with only minor differences: the original code refers to internal study project names, which may include a year and/or month. Out of an abundance of caution, we have removed all references to these study names; otherwise, the code is the same. See the documentation for more information.

Preprint

This paper originally appeared as a preprint on medRxiv (doi: 10.1101/2020.12.29.20249002v1). The code associated with this preprint can be found at this commit.

Structure

  • code: Contains code files to be run in sequential order. See documentation for details.
  • data_raw (not on GitHub): Contains raw data collected using the Beiwe Research Platform.
  • data_stripped (not on GitHub): Contains summarized data collected using the Beiwe Research Platform.
  • data_working (not on GitHub): Contains working data used for plots and analysis.
  • model_objects (not on GitHub): Contains the RStan/brms model objects after fitting.
  • output: Contains all plots, tables, and relevant supplementary information.
  • rmds: Contains the source (i.e., rmarkdown) files for supplementary information.

The config.yml file allows you to change the paths to the data directories above and to specify the number of cores to use in your computing environment.

Supplementary information