This repository contains the code and files for the final project of DATA550. This particular data project focuses on analyzing the relationship between depression tests and PKD (Polycystic Kidney Disease) in a sample dataset.
To generate the final report, follow these steps:
- Ensure you have R and RStudio installed on your system.
- Clone this repository to your local machine.
- Open RStudio and set the working directory to the root of the cloned repository.
- Run the command make final_report.html in a terminal (for example, the Terminal pane in RStudio). make is a shell command, so it will not run if typed directly at the R console.
The final report will be generated as final_report.html.
To clean the output after generating the report, run the command make clean in the terminal.
To build the Docker image and generate the report using Docker, follow these steps:
- Ensure Docker is installed on your system.
- Open a terminal and navigate to the root of the cloned repository.
- Run the command make build to build the Docker image. This process will install all required R packages and prepare the environment. This step may take a few minutes.
- Once the image is built, run the command make run to execute the Docker container. This will generate the report and save it to the report directory on your local machine.
Variable (Type) Description
- PatientICN: (Integer) Unique numeric identifier for each patient
- SurveyName: (Character) Depression test taken; must be one of PHQ9, PHQ-2, or PHQ-2+I9
- RawScore: (Integer) Score on the depression survey (0-6 for PHQ-2 and PHQ-2+I9, 0-27 for PHQ9)
- CurrentAge: (Integer) Current age of the patient
- IndexAge: (Integer) Age of the patient when they took their first depression test
- SurveyGivenDateTime: (Date) Date on which the depression test was taken
- Gender: (Character) Male or female
- Race: (Character) Patient race (White, Black or African American, Asian, American Indian or Alaska Native, Native Hawaiian or Other Pacific Islander, Unknown)
- Ethnicity: (Character) Patient ethnicity (Hispanic or Latino, Not Hispanic or Latino, Unknown)
- Alcohol: (Integer) Alcohol abuse (1 if yes, 0 if no)
- Cancer: (Integer) Cancer (1 if yes, 0 if no)
- Diabetes: (Integer) Diabetes (1 if yes, 0 if no)
- Obesity: (Integer) Obese (1 if yes, 0 if no)
- NumeGFR: (Integer) Total number of eGFR measurements the patient received
- Egfr.epi: (Integer) eGFR score at a given time
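The constraints in the variable list above can be checked mechanically. The following is a minimal sketch in base R, not part of the repository: the function name validate_survey_data and the toy data frame are hypothetical, and only the rules stated in the list (allowed survey names, score ranges, 0/1 indicators) are encoded.

```r
# Hypothetical helper: returns a character vector describing rule violations,
# or an empty vector if the data frame satisfies the rules listed above.
validate_survey_data <- function(df) {
  allowed_surveys <- c("PHQ9", "PHQ-2", "PHQ-2+I9")
  # Maximum RawScore per survey, as stated in the variable list
  max_score <- c("PHQ9" = 27, "PHQ-2" = 6, "PHQ-2+I9" = 6)
  binary_cols <- c("Alcohol", "Cancer", "Diabetes", "Obesity")

  problems <- character(0)

  bad_name <- !(df$SurveyName %in% allowed_surveys)
  if (any(bad_name)) {
    problems <- c(problems,
                  sprintf("%d row(s) with an unknown SurveyName", sum(bad_name)))
  }

  # Look up each row's upper bound; unknown surveys give NA and are skipped here
  upper <- unname(max_score[as.character(df$SurveyName)])
  bad_score <- !is.na(upper) & !is.na(df$RawScore) &
    (df$RawScore < 0 | df$RawScore > upper)
  if (any(bad_score)) {
    problems <- c(problems,
                  sprintf("%d row(s) with a RawScore outside the allowed range",
                          sum(bad_score)))
  }

  for (col in binary_cols) {
    bad_flag <- !(df[[col]] %in% c(0, 1))
    if (any(bad_flag)) {
      problems <- c(problems,
                    sprintf("%d row(s) where %s is not 0 or 1", sum(bad_flag), col))
    }
  }

  problems
}

# Toy data: the last row has a PHQ-2 score above the stated maximum of 6
toy <- data.frame(
  SurveyName = c("PHQ9", "PHQ-2", "PHQ-2+I9", "PHQ-2"),
  RawScore   = c(27L, 6L, 3L, 9L),
  Alcohol    = c(0L, 1L, 0L, 0L),
  Cancer     = c(0L, 0L, 0L, 1L),
  Diabetes   = c(1L, 0L, 1L, 0L),
  Obesity    = c(0L, 0L, 1L, 1L),
  stringsAsFactors = FALSE
)
print(validate_survey_data(toy))
```

Returning the list of problems instead of stopping at the first one makes it easier to see every violated rule in a single pass.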
code/00_make_dataset.R
- generates example data with 7,000 rows and 15 columns. The data is longitudinal with multiple observations per person. The columns created are described above.
- saves the data set as data.rds in the output/ folder
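To make the shape of the example data concrete, here is a rough base R sketch of how a longitudinal data set with 7,000 rows and the 15 columns described above could be simulated. It is not the repository's script: the split into 1,000 patients with 7 observations each, the seed, the age and date ranges, and all probabilities are assumptions chosen only for illustration.

```r
set.seed(550)  # arbitrary seed so the sketch is reproducible

n_patients <- 1000L  # assumption: 1,000 patients ...
n_visits   <- 7L     # ... with 7 observations each, giving 7,000 rows
n_rows     <- n_patients * n_visits

races <- c("White", "Black or African American", "Asian",
           "American Indian or Alaska Native",
           "Native Hawaiian or Other Pacific Islander", "Unknown")
ethnicities <- c("Hispanic or Latino", "Not Hispanic or Latino", "Unknown")

# Patient-level attributes: drawn once per patient
patients <- data.frame(
  PatientICN = seq_len(n_patients),
  CurrentAge = sample(30:90, n_patients, replace = TRUE),
  Gender     = sample(c("Male", "Female"), n_patients, replace = TRUE),
  Race       = sample(races, n_patients, replace = TRUE),
  Ethnicity  = sample(ethnicities, n_patients, replace = TRUE),
  Alcohol    = rbinom(n_patients, 1, 0.10),
  Cancer     = rbinom(n_patients, 1, 0.15),
  Diabetes   = rbinom(n_patients, 1, 0.30),
  Obesity    = rbinom(n_patients, 1, 0.35),
  NumeGFR    = n_visits,  # assumption: one eGFR value per observation
  stringsAsFactors = FALSE
)
# First depression test happened 0 to 10 years before the current age
patients$IndexAge <- patients$CurrentAge - sample(0:10, n_patients, replace = TRUE)

# Observation-level attributes: one row per patient and visit
visits <- data.frame(
  PatientICN = rep(patients$PatientICN, each = n_visits),
  SurveyName = sample(c("PHQ9", "PHQ-2", "PHQ-2+I9"), n_rows, replace = TRUE),
  SurveyGivenDateTime = as.Date("2015-01-01") + sample(0:3650, n_rows, replace = TRUE),
  Egfr.epi   = sample(15:120, n_rows, replace = TRUE),
  stringsAsFactors = FALSE
)
# Scores respect the stated ranges: 0-27 for PHQ9, 0-6 for the other surveys
max_score <- ifelse(visits$SurveyName == "PHQ9", 27, 6)
visits$RawScore <- as.integer(floor(runif(n_rows, min = 0, max = max_score + 1)))

# Combine and order the columns as in the variable list
dat <- merge(visits, patients, by = "PatientICN")
dat <- dat[, c("PatientICN", "SurveyName", "RawScore", "CurrentAge", "IndexAge",
               "SurveyGivenDateTime", "Gender", "Race", "Ethnicity", "Alcohol",
               "Cancer", "Diabetes", "Obesity", "NumeGFR", "Egfr.epi")]

dir.create("output", showWarnings = FALSE)
saveRDS(dat, file.path("output", "data.rds"))
```

Drawing patient-level attributes once and merging them onto the observation rows keeps values such as Gender and Race constant within a patient, which is what a longitudinal layout implies.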
code/01_make_table.R
- reads the data set saved by code/00_make_dataset.R
- creates labels for variables to present nicely in a table
- creates a table of the variables by survey name
- saves the table as table_one.rds in the output/ folder
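The label, summarise, and save steps listed above can be illustrated with a small dependency-free sketch. The actual script may well use a dedicated table package; this example swaps in base R's aggregate() instead, and the toy data, the label text, and the choice of the mean as the summary statistic are all assumptions.

```r
# Toy stand-in for output/data.rds, limited to a few of the documented columns
toy <- data.frame(
  SurveyName = c("PHQ9", "PHQ9", "PHQ-2", "PHQ-2", "PHQ-2+I9", "PHQ-2+I9"),
  RawScore   = c(12L, 4L, 2L, 5L, 1L, 3L),
  CurrentAge = c(61L, 70L, 55L, 48L, 66L, 73L),
  Diabetes   = c(1L, 0L, 0L, 1L, 1L, 0L),
  stringsAsFactors = FALSE
)

# Human-readable labels for the summarised variables (illustrative wording)
labels <- c(
  RawScore   = "Depression survey score (mean)",
  CurrentAge = "Current age in years (mean)",
  Diabetes   = "Diabetes (proportion)"
)

# One row per survey type, one column per summarised variable
table_one <- aggregate(
  toy[names(labels)],
  by  = list(Survey = toy$SurveyName),
  FUN = mean
)

# Swap the raw column names for the labels
names(table_one)[-1] <- unname(labels[names(table_one)[-1]])

dir.create("output", showWarnings = FALSE)
saveRDS(table_one, file.path("output", "table_one.rds"))
print(table_one)
```

Saving the table object as an .rds file, as the script does, lets the report load it with readRDS() without recomputing anything.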
code/02_make_figure.R
- reads the data set saved by code/00_make_dataset.R
- uses ggplot to create four box plots of the counts of the four co-morbidities for the entire population
- creates one figure combining the four plots
- saves the figure as figure.png in the output/ folder
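As a rough illustration of the figure step, the sketch below draws one box plot per co-morbidity and writes a single PNG. Several things here are assumptions rather than the repository's method: the description does not say what is counted, so the sketch counts, per patient, the observations flagged with each condition; it uses facet_wrap() to get four panels in one figure instead of combining four separately built plots; and the toy data and probabilities are invented.

```r
library(ggplot2)

set.seed(550)  # arbitrary seed for the toy data
n_patients <- 40
n_visits   <- 5
n_rows     <- n_patients * n_visits

toy <- data.frame(
  PatientICN = rep(seq_len(n_patients), each = n_visits),
  Alcohol    = rbinom(n_rows, 1, 0.15),
  Cancer     = rbinom(n_rows, 1, 0.20),
  Diabetes   = rbinom(n_rows, 1, 0.35),
  Obesity    = rbinom(n_rows, 1, 0.40)
)
comorbidities <- c("Alcohol", "Cancer", "Diabetes", "Obesity")

# Assumption: the "count" is the number of a patient's rows flagged with a condition
per_patient <- aggregate(
  toy[comorbidities],
  by  = list(PatientICN = toy$PatientICN),
  FUN = sum
)

# Long form: one row per patient and co-morbidity
long <- stack(per_patient[comorbidities])
names(long) <- c("count", "comorbidity")

# One panel per co-morbidity, arranged in a 2 x 2 grid
fig <- ggplot(long, aes(x = comorbidity, y = count)) +
  geom_boxplot() +
  facet_wrap(~ comorbidity, nrow = 2, scales = "free_x") +
  labs(x = NULL, y = "Flagged observations per patient")

dir.create("output", showWarnings = FALSE)
ggsave(file.path("output", "figure.png"), plot = fig, width = 6, height = 6)
```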
code/final_report.Rmd
- loads the output into a nice report with descriptions of what you should see in the table and figure
code/04_render_report.R
- renders the report, code/final_report.Rmd, in production mode
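A render step like this is typically a thin wrapper around rmarkdown::render(). The sketch below is an assumption about how "production mode" might be passed, not the repository's code: the wrapper name render_report and the parameter name production_mode are hypothetical, and the call only works if the Rmd header declares a parameter with that name.

```r
# Hypothetical wrapper: renders the report with a production flag passed as a
# knit parameter. Assumes the Rmd's YAML header declares `production_mode`.
render_report <- function(input = "code/final_report.Rmd", production = TRUE) {
  if (!file.exists(input)) {
    stop("Report source not found: ", input)
  }
  rmarkdown::render(
    input,
    params = list(production_mode = production)
  )
}

# Usage, from the repository root:
# render_report()                    # production mode
# render_report(production = FALSE)  # e.g. to show code chunks while drafting
```

Inside the Rmd, such a parameter would usually drive chunk options, for example hiding code, messages, and warnings when params$production_mode is TRUE.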
To synchronize the package repository and restore the package environment for this project, run the following command in a terminal (for example, the Terminal pane in RStudio):
make install
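The wording "restore the package environment" suggests a lockfile-based tool such as renv, although this README does not name one. Under that assumption, the make target would do something along these lines from R; the wrapper name restore_packages and the lockfile name renv.lock are assumptions.

```r
# Hypothetical equivalent of `make install`, assuming the project uses renv
# and keeps its lockfile at the repository root.
restore_packages <- function(lockfile = "renv.lock") {
  if (!file.exists(lockfile)) {
    stop("Lockfile not found: ", lockfile)
  }
  # prompt = FALSE installs the recorded package versions without asking
  renv::restore(lockfile = lockfile, prompt = FALSE)
}

# Usage, from the repository root:
# restore_packages()
```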