This repository houses a PEPFAR MSD-style training dataset for testing and public-facing work. It is a dummy dataset that should be used for testing, training, and demos in place of actual data.
You can download the dataset or read the dataset directly into R by using the commands below.
To use it with R, you must have the readr and ICPIutilities packages installed.
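Before running the import commands, it can help to confirm which of the required packages are already available. The snippet below is a minimal sketch using base R only; the variable names `required_pkgs` and `missing_pkgs` are illustrative and not part of this repository.

```r
# Packages this README relies on
required_pkgs <- c("readr", "ICPIutilities")

# requireNamespace() returns TRUE/FALSE without attaching the package
is_installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)

# Names of any packages that still need to be installed
missing_pkgs <- required_pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Still to install: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```

If anything is listed as missing, the install commands in the next block cover both packages.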
## IMPORT MASKED TRAINING DATASET
#install packages
install.packages("readr")
install.packages("devtools")
devtools::install_github("ICPI/ICPIutilities")
#import training dataset directly into R
#dataset location
dataset_url <- "https://media.githubusercontent.com/media/ICPI/TrainingDataset/master/Output/MER_Structured_TRAINING_Datasets_PSNU_IM_FY18-20_20200214_v1_1.txt"
#import with readr (will get some errors)
df <- readr::read_tsv(dataset_url)
#alternatively, you can use the read_msd() function from ICPIutilities (reads in all columns correctly)
df <- ICPIutilities::read_msd(dataset_url, save_rds = FALSE)
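The errors from a plain `read_tsv()` call most likely come from readr guessing column types from the first rows of the file, which can go wrong for sparsely populated columns. If you want to stay with readr alone, one workaround is to read every column as character and convert the numeric columns yourself. The sketch below shows the idea on a tiny made-up file; the column names and values are illustrative assumptions, and this is not necessarily how `read_msd()` works internally.

```r
library(readr)

# Write a tiny tab-delimited sample to a temporary file (made-up values)
sample_path <- tempfile(fileext = ".txt")
writeLines(c("psnuuid\tindicator\tfiscal_year\ttargets",
             "QlWUt1rBsEo\tHTS_TST\t2020\t100",
             "GfzVv4oeclF\tTX_CURR\t2020\t"),
           sample_path)

# Read every column as character so no column type is guessed
df_sample <- read_tsv(sample_path,
                      col_types = cols(.default = col_character()))

# Convert the numeric column explicitly; empty cells become NA
df_sample$targets <- as.numeric(df_sample$targets)
```

The same `col_types` argument can be passed when reading `dataset_url`, after which the value columns of interest can be converted in the same way.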
Users also have the option of building a masked dataset themselves. Doing so requires (1) the current PEPFAR MSD PSNUxIM file and (2) a list of 15 PSNU UIDs. The PSNU UIDs are used to filter the dataset, keeping only the districts identified. For the list used to produce the official MSD training dataset, contact ICPI/DIV.
## BUILD MASKED TRAINING DATASET
#install packages
install.packages("devtools")
devtools::install_github("ICPI/ICPIutilities")
#filepath for MSD (.txt)
msd_filepath <- "~/ICPI/Data/MER_Structured_Dataset_PSNU_IM_FY17-18_20181115_v1_2.txt"
#supply list of PSNU UIDs to use
#these are dummy PSNU UIDs; user must change
#to get the list used quarterly, contact ICPI/DIV
psnuuid_list <- c("QlWUt1rBsEo", "GfzVv4oeclF", "PNgOI7VPe98", "TWpkDEvK6si", "aqktQTp0wHa",
"XFdvhW8Ga1S", "sVwvt4bYesp", "EJDcY4F1rsj", "nXEQ97b2YHJ", "DfXQBOLWwbZ",
"aC69ENcI2hU", "PJpAerXQjZ7", "PAj75tgxkIU", "Wru5kJQ36GT", "HB75Phs4wZL")
#generate training dataset
mask_msd(msd_filepath, psnuuid_list)
#alternatively, save the training dataset to a different folder than the MSD folder
mask_msd(msd_filepath, psnuuid_list, "~/Output")
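To make the filtering step concrete, the sketch below keeps only the rows of a toy data frame whose PSNU UID appears in a supplied list, then checks that every requested UID was found. It covers the filtering idea only and is not the implementation of `mask_msd()`, which also masks the data; the data frame, its values, and the names `msd_sample`, `keep_uids`, and `missing_uids` are assumptions made for illustration.

```r
# Toy data frame standing in for an MSD extract (made-up values)
msd_sample <- data.frame(
  psnuuid   = c("QlWUt1rBsEo", "ZZZZZZZZZZZ", "GfzVv4oeclF", "YYYYYYYYYYY"),
  indicator = c("HTS_TST", "HTS_TST", "TX_CURR", "TX_CURR"),
  targets   = c(100, 250, 80, 40),
  stringsAsFactors = FALSE
)

# PSNU UIDs to keep (two entries from the dummy list above)
keep_uids <- c("QlWUt1rBsEo", "GfzVv4oeclF")

# Keep only rows whose PSNU UID is in the list
msd_filtered <- msd_sample[msd_sample$psnuuid %in% keep_uids, ]

# Requested UIDs that never appear in the data (ideally none)
missing_uids <- setdiff(keep_uids, msd_filtered$psnuuid)
```

Checking `missing_uids` before building a training dataset is a cheap way to catch a mistyped UID, since a UID that matches nothing would otherwise silently drop a district.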