BioVU Project

The goal of the project is to study the association between mtDNA haplogroups and delirium in sepsis patients.

Statistical modeling (regression and mediation analysis)

Input data

  • Haplogroup: Mito Delirium BioVU Data/Genetics/CAM_Haplogroups.xlsx
  • Date of death (DOD): Mito Delirium BioVU Data/Demographics/Date of Death data.xlsx (sheet FINAL DATE OF DEATH DATA)
  • All cohort subjects: datafile/sepsis_grids_20200106.xlsx (sheet All GRIDs)
  • Admission & discharge dates (includes worst SOFA score per admission): datafile/sepsis_compare_20191217.csv
  • Daily CAM status: datafile/daily_status_20190925.csv
  • Daily lab (includes SOFA, RASS, creatinine, platelet ...): datafile/daily_sofa_score_20191010.csv
  • Neuro damage data (remove encounters with "bad" ICD codes):
    • ICD9: Mito Delirium BioVU Data/Neuro Exclusions/neuro_icd9_V2.xlsx
    • ICD10: Mito Delirium BioVU Data/Neuro Exclusions/neuro_icd10_V2.xlsx
  • Comorbidity score: Mito Delirium BioVU Data/Elixhauser Comorbidities/*.xlsx
  • Daily dementia: Mito Delirium BioVU Data/Dementia/Dementia.xlsx (sheet Initial Dementia Code date)
  • Medications
    • Mito Delirium BioVU Data/Data/grid_date_med1.csv
    • Mito Delirium BioVU Data/Data/grid_date_med2.csv
    • Mito Delirium BioVU Data/Data/grid_date_med3.csv
  • In ICU death manual review: datafile/in_ICU_death_manual_review.txt

Output data

  • Data dictionary: data_arxiv/analysis_daily_dict.xlsx
  • Daily-level data: data_arxiv/analysis_daily.rds
  • Encounter-level data: grid and adm_date in daily-level data uniquely determine an encounter
  • Subject-level data: grid in daily-level data uniquely determines a subject
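
As a minimal sketch of how the encounter-level and subject-level keys can be derived from the daily-level data, the base R snippet below uses a small toy data frame in place of data_arxiv/analysis_daily.rds. Only grid and adm_date are named in this README; the dt column and all values are hypothetical.

```r
# Toy stand-in for the daily-level data. Only `grid` and `adm_date` come from
# the README; `dt` and all values are hypothetical.
daily <- data.frame(
  grid     = c("G1", "G1", "G1", "G2"),
  adm_date = as.Date(c("2015-01-01", "2015-01-01", "2015-03-10", "2016-07-04")),
  dt       = as.Date(c("2015-01-01", "2015-01-02", "2015-03-10", "2016-07-04")),
  stringsAsFactors = FALSE
)

# One row per encounter: unique (grid, adm_date) pairs.
encounters <- unique(daily[, c("grid", "adm_date")])

# One row per subject: unique grid values.
subjects <- unique(daily["grid"])

nrow(encounters)  # 3 encounters
nrow(subjects)    # 2 subjects
```

With the real file, daily would presumably come from readRDS("data_arxiv/analysis_daily.rds") instead of the toy data frame.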

Functions

  • TODO

Data Preprocessing

The initial steps have been done here:

  1. Determine daily status during hospitalization.
  2. Identify hospital encounters with at least one CAM-ICU assessment.
  3. Identify encounters with sepsis.

Step 1 & 2. Determine daily status and identify CAM-ICU encounters

The code reconstruct_daily_visit_data.R does the following:

  1. Clean and combine CAM-ICU data and RASS data, resolving discrepancies
    • CAM-ICU is a tool to detect delirium in ICU patients, usually assessed every 8 hours in the ICU. Valid CAM-ICU values:
      • Positive - Delirium present
      • Negative - No delirium
      • UA - Patient in coma, cannot assess CAM-ICU score
      • Unk - This value is assigned in analysis for conflicting CAM-ICU values at the same time point
    • RASS measures how awake and alert a patient is, usually assessed every 4 hours in the ICU. Obtaining a RASS score is the first step in administering CAM-ICU. Valid RASS scores are from -5 to 4.
      • If the RASS score is -3 to 4, CAM-ICU is assessable and should be either Positive or Negative.
      • If the RASS score is -5 or -4, the patient is in a coma, and the CAM-ICU value should be UA.
    • CAM-ICU and RASS data were joined by GRID and assessment time.
  2. Determine daily status
    • For each day, the daily status is:
      • Delirious if any CAM-ICU value was Positive.
      • Otherwise "Unknown: conflicting CAM" if any CAM-ICU value was Unk.
      • Otherwise Comatose if any RASS value was -5 or -4.
      • Otherwise Normal if any CAM-ICU value was Negative.
      • Otherwise "Unknown: RASS only" if all CAM-ICU values were missing and at least one RASS value was non-missing.
      • Otherwise "Unknown: No CAM nor RASS" if all CAM-ICU and RASS values were missing.
    • Output Data: Girard_BioVU/output/daily_status_20190925.csv
  3. Identify encounters with at least one CAM-ICU assessment and get encounter/visit level summary
    • For each admission/discharge record, find all dates from the admission date to the discharge date.
    • Merge with the daily status obtained above (keep Comatose, Delirious, Normal only) and redefine admission/discharge dates
      • Consecutive dates were considered one encounter, and the first/last dates of each group of consecutive dates were taken as the admission/discharge dates.
      • We redefined the dates because:
        1. some admission/discharge records did not have a discharge date.
        2. around 20,000 hospital dates with a daily status did not fall into any of the admission/discharge records, and we did not want to throw them away.
    • Calculate a few summary statistics at encounter level and remove encounters without any CAM-ICU.
    • Output Data: Girard_BioVU/output/cam_stay_20190925.csv
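
The daily-status precedence and the consecutive-date grouping described above can be sketched in base R as follows. This is an illustration only, not the code in reconstruct_daily_visit_data.R; the function names and the input layout (one vector of CAM-ICU values and one of RASS scores per patient-day, one vector of dates per GRID) are assumptions.

```r
# Daily status for one patient-day, following the precedence listed above.
# cam:  character vector of CAM-ICU values ("Positive", "Negative", "UA", "Unk", NA)
# rass: numeric vector of RASS scores (-5 to 4, or NA)
daily_status <- function(cam, rass) {
  cam  <- cam[!is.na(cam)]
  rass <- rass[!is.na(rass)]
  if (any(cam == "Positive"))     return("Delirious")
  if (any(cam == "Unk"))          return("Unknown: conflicting CAM")
  if (any(rass %in% c(-5, -4)))   return("Comatose")
  if (any(cam == "Negative"))     return("Normal")
  if (length(cam) == 0 && length(rass) > 0)  return("Unknown: RASS only")
  if (length(cam) == 0 && length(rass) == 0) return("Unknown: No CAM nor RASS")
  # Remaining case (e.g. only UA values without a comatose RASS) is not
  # covered by the rules above, so this sketch returns NA.
  NA_character_
}

# Group the dates of one GRID into encounters: each run of consecutive
# calendar dates becomes one encounter with its first/last date.
group_consecutive <- function(dates) {
  dates <- sort(unique(dates))
  gap   <- as.numeric(diff(dates)) > 1   # TRUE where a new run starts
  data.frame(
    adm_date = dates[c(TRUE, gap)],
    dis_date = dates[c(gap, TRUE)]
  )
}

daily_status(c("Negative", "Positive"), c(0, 1))
group_consecutive(as.Date(c("2015-01-01", "2015-01-02", "2015-01-03",
                            "2015-01-10", "2015-01-11")))
```

The first call yields Delirious because a Positive value outranks everything else; the second yields two encounters, 2015-01-01 to 2015-01-03 and 2015-01-10 to 2015-01-11.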

Input data:

  • Girard_BioVU/output/data_raw.RData
    • Including admission/discharge data, CAM data, and RASS data
  • Girard_BioVU/output/changed_grid_dob_20190924.csv
    • Data used to correct GRIDs and dates

Report:

Refer to these reports for general ideas and more details. However, note that none of them correct for changed GRIDs.

  • Girard_BioVU/code/no_git/20190619_cam_gap.html
  • Girard_BioVU/code/no_git/20190319_daily_status.html
  • Girard_BioVU/code/no_git/20190716_visit_summary.html

Step 3. Identify sepsis

There are three ways to identify sepsis.

  1. Rhee definition (currently used)
  2. Sepsis-3 definition
  3. Sepsis ICD code

Rhee definition

  1. Identify CAM-ICU encounters that meet Rhee's presumed serious infection definition.

    • To find >= 4 QADs (qualifying antibiotic days) starting within 2 days of blood culture day:
      • Find whether an antibiotic was new, i.e., not given in the prior 2 calendar days.
      • Keep only new antibiotics given within 2 days of blood culture day; these are the starting dates of QADs.
      • Check whether there are 4 QADs counting from the starting dates.
        • For each starting date, calculate the number of calendar days.
    • Code: rhee_infection.R
    • Output Data: Girard_BioVU/output/rhee_infection_20191015.csv
  2. Among the presumed serious infections identified above, find which ones met Rhee's acute organ dysfunction definition.
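
The first two QAD steps above (flagging new antibiotics and keeping those near the blood culture day) can be sketched in base R. This is not the code in rhee_infection.R: the function names are hypothetical, the input is assumed to be the administration dates of a single antibiotic for a single patient, and "within 2 days" is assumed to mean 2 days before or after the culture date. The count of 4 QADs is left out because the README does not spell out that step.

```r
# Dates on which the antibiotic counts as "new": not given on either of the
# prior 2 calendar days.
new_antibiotic_dates <- function(abx_dates) {
  d <- sort(unique(abx_dates))
  is_new <- vapply(
    seq_along(d),
    function(i) !any(d %in% (d[i] - 1:2)),
    logical(1)
  )
  d[is_new]
}

# Candidate QAD starting dates: new-antibiotic dates within `window` days
# (before or after, an assumption) of the blood culture date.
qad_start_dates <- function(abx_dates, culture_date, window = 2) {
  new_dates <- new_antibiotic_dates(abx_dates)
  new_dates[abs(as.numeric(new_dates - culture_date)) <= window]
}

abx <- as.Date(c("2015-01-01", "2015-01-02", "2015-01-03", "2015-01-08"))
new_antibiotic_dates(abx)                     # 2015-01-01 and 2015-01-08
qad_start_dates(abx, as.Date("2015-01-02"))   # 2015-01-01
```

In the example, 2015-01-08 is new again because the drug was not given on the two preceding days, but it falls outside the culture window.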

Sepsis-3 definition

  1. Identify CAM-ICU encounters that meet Sepsis-3's suspected infection definition.
    • Code: sepsis3_infection.R
    • Output Data: Girard_BioVU/output/sepsis3_all_infections_20190927.csv
  2. Calculate daily SOFA score for all CAM-ICU encounters.
    • Code: daily_sofa.R
    • Output Data: Girard_BioVU/output/daily_sofa_score_20191010.csv
  3. Among the suspected infections identified above, find which ones met Sepsis-3's organ dysfunction definition.
    • Code: sepsis3.R
    • Output Data: Girard_BioVU/output/sepsis3_20191014.csv

Compare three criteria

The code compare_sepsis.R does the following:

  1. Compare three criteria at encounter level.
    • Output Data: Girard_BioVU/output/sepsis_compare_20191217.csv
  2. Find distinct GRIDs with sepsis and see which ones have genotype data
    • Output Data:
      • Girard_BioVU/output/grid_not_in_genotype_status_20200106.csv
      • Girard_BioVU/output/sepsis_grids_20200106.xlsx
  3. Check the encounters with a sepsis ICD code but negative for both sepsis definitions.
  4. We decided to use only the Rhee definition to identify sepsis for now.

Report

  • Girard_BioVU/code/20191120_sepsis_compare.html
    • Includes a missing-data summary for the Sepsis-3 definition.
  • Girard_BioVU/code/20200106_sepsis_compare.html
    • Most current version of comparing three criteria.

Misc.

Changed GRIDs

More than 2,000 GRIDs were changed when the EHR system was switched. Since the dates were shifted by a different amount for each GRID, both the GRIDs and the dates need to be corrected. The code changed_grid_dob.R outputs the DOBs for old and updated GRIDs.
Supposedly, only the older data had the changed-GRID problem. However, I recommend always checking whether old GRIDs exist in any data used, and following these two steps if they do.

  1. Convert the old GRIDs to updated GRIDs.
  2. Convert all dates of the old GRIDs by date - old_dob + updated_dob.
  • Input Data:
    • Girard_BioVU/output/data_raw.RData
      • static_raw had all GRIDs in old EHR system and DOB.
    • Mito Delirium BioVU Data/Data/Changed_GRIDS.xlsx
      • Old and updated GRIDs only, no DOB.
    • Mito Delirium BioVU Data/Demographics/Set_*_20180830_demo.txt
      • GRID, primary GRID (if GRID was old and changed), and DOB for all GRIDs.
      • DOB discrepancy between this file and the other two sources.
    • Mito Delirium BioVU Data/Demographics/Sample_Genotyping_Status.xlsx
      • GRID and DOB.
  • Output Data:
    • Girard_BioVU/output/changed_grid_dob_20190924.csv
    • Girard_BioVU/output/dob_discrepancy.csv
      • Discrepancies in DOB between Mito Delirium BioVU Data/Demographics/Set_*_20180830_demo.txt and the other two sources of DOB; can be ignored.
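
The two correction steps (convert old GRIDs, then shift their dates by date - old_dob + updated_dob) can be sketched in base R as below. The column names old_grid, updated_grid, old_dob, updated_dob, grid, and dt are hypothetical, since the layout of changed_grid_dob_20190924.csv is not described here, and the data are made up.

```r
# Hypothetical mapping of old to updated GRIDs with both DOBs.
grid_map <- data.frame(
  old_grid     = "OLD1",
  updated_grid = "NEW1",
  old_dob      = as.Date("1950-01-10"),
  updated_dob  = as.Date("1950-02-01"),
  stringsAsFactors = FALSE
)

# Hypothetical data set that still contains one old GRID.
labs <- data.frame(
  grid = c("OLD1", "NEW2"),
  dt   = as.Date(c("2015-06-01", "2016-01-01")),
  stringsAsFactors = FALSE
)

idx    <- match(labs$grid, grid_map$old_grid)
is_old <- !is.na(idx)
m      <- idx[is_old]

# Step 2 first, while the old GRID is still available for the lookup:
# date - old_dob + updated_dob.
labs$dt[is_old] <- grid_map$updated_dob[m] + (labs$dt[is_old] - grid_map$old_dob[m])

# Step 1: replace the old GRID with the updated GRID.
labs$grid[is_old] <- grid_map$updated_grid[m]

labs
```

Here the two DOBs differ by 22 days, so the old GRID's date moves from 2015-06-01 to 2015-06-23, while the row with a current GRID is left unchanged.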

Check Respiratory Ratio

The code check_resp_ratio.R calculates respiration ratios for SOFA score. I believe we will get more respiration data in the future.

  1. Calculate PaO2/FiO2 and compare with already available ratio data.
    • Decided to use the calculated PaO2/FiO2 instead of the already available ratio data.
    • Input Data:
      • Mito Delirium BioVU Data/Lab values/PO2_FIO2_ratio/*.xlsx is the already available ratio data.
      • Mito Delirium BioVU Data/Lab values/FIO2/*.xlsx
      • Mito Delirium BioVU Data/Lab values/Arterial pO2/*.xlsx
    • Output Data: Girard_BioVU/output/pao2_fio2_ratio_calc_20190927.csv
  2. Check and correct FiO2 values
    • FiO2 is a fraction and should be 0-1.
    • Any FiO2 >= 100 was divided by 100.
    • Check FiO2 < 0.21 with Nasal O2 data.
  3. Calculate SpO2/FiO2
    • Input Data:
      • Mito Delirium BioVU Data/Lab values/FIO2/*.xlsx
      • Mito Delirium BioVU Data/Lab values/O2Sat/*.xlsx
    • Output Data: Girard_BioVU/output/spo2_fio2_ratio_calc_20191010.csv
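
The FiO2 correction and the ratio calculation can be sketched in base R as follows. This is an illustration and not the code in check_resp_ratio.R; the function names are hypothetical, and only the ">= 100 is divided by 100" rule stated above is applied. The check of FiO2 < 0.21 against Nasal O2 data is not reproduced.

```r
# Apply the stated correction: any FiO2 >= 100 is divided by 100.
# Other values, including NA, are returned unchanged.
correct_fio2 <- function(fio2) {
  ifelse(!is.na(fio2) & fio2 >= 100, fio2 / 100, fio2)
}

# PaO2/FiO2 ratio using the corrected FiO2.
pf_ratio <- function(pao2, fio2) {
  pao2 / correct_fio2(fio2)
}

# SpO2/FiO2 ratio using the corrected FiO2.
sf_ratio <- function(spo2, fio2) {
  spo2 / correct_fio2(fio2)
}

correct_fio2(c(0.5, 100, NA))   # 0.5, 1, NA
pf_ratio(90, 0.5)               # 180
sf_ratio(95, 100)               # 95
```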

Check Sepsis Discrepancy

The code check_sepsis_discrepancy.R checks why some encounters only met the Rhee definition but not the Sepsis-3 definition.

Check patient location

The code check_pt_loc.R tabulates patient location data to see whether it helps identify whether patients were in the ICU. Decided not to use it for now.

  • Input Data: Mito Delirium BioVU Data/Lab values/patient_Location/*.xlsx
  • Output Data: Girard_BioVU/output/patient_cam_visit_location_count.csv

Data Dictionary

Data dictionaries for the output data currently in use can be found in the data_dict folder. Each has the same name as its output data file.