Trauma Mortality Prediction Model

Primary language: R. License: GNU General Public License v3.0 (GPL-3.0).

TMPM2

The goal of TMPM2 is to provide an update to the R package tmpm: Trauma Mortality Prediction Model.

The core function of the package, tmpm2, calculates the probability of death (pDeath) based on the Trauma Mortality Prediction Model created by Turner Osler, MD, MSc and Laurent Glance, MD. tmpm2 takes injuries recorded in AIS, ICD-9-CM, or ICD-10-CM and computes the probability of death for each patient in the dataset. It accepts datasets in wide format (one row per patient, with one or more injuries per row) or in long format (one or more rows per patient, with one injury per row).

TMPM2 offers several improvements over its predecessor. Most notably, TMPM2 calculates pDeath more accurately by using equations that are specific to each diagnosis lexicon (AIS, ICD-9-CM, and ICD-10-CM). In contrast, tmpm relies on approximate crosswalks between diagnosis codes so that one set of equations can serve all three lexicons. See the Details section in help("tmpm2", TMPM2) for more information. Additionally, because it uses vectorized code, TMPM2 is much faster than tmpm for datasets of even moderate size. Finally, and more subjectively, TMPM2 is easier to use than tmpm: it automatically detects the diagnosis lexicon in use, is more flexible about the structure of the dataset, and has syntax and argument names that are easier to understand.
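
To make the idea of lexicon detection more concrete, here is a rough heuristic sketch. It is not TMPM2's implementation: guess_lexicon is a hypothetical helper, the regular expressions are simplified assumptions about what injury codes in each lexicon typically look like, and the label "ais" is chosen only for illustration.

```r
# Hypothetical helper, NOT part of TMPM2: guess the diagnosis lexicon
# from the shape of the codes. The patterns are simplified assumptions.
guess_lexicon <- function(codes) {
  codes <- toupper(trimws(as.character(codes)))
  codes <- codes[!is.na(codes) & nzchar(codes)]
  if (length(codes) == 0) return(NA_character_)

  # Share of codes matching each assumed pattern
  shares <- c(
    icd10 = mean(grepl("^[A-Z][0-9]", codes)),                # letter first, e.g. S36114A
    ais   = mean(grepl("^[0-9]{6}(\\.[0-9])?$", codes)),      # six digits, optional .severity
    icd9  = mean(grepl("^[0-9]{3}(\\.[0-9]{1,2})?$", codes))  # three digits, optional decimals
  )

  # Return the label whose pattern matches the most codes
  names(shares)[which.max(shares)]
}

guess_lexicon(c("S36114A", "S27321A"))
guess_lexicon(c("805.2", "860.01"))
```

In practice, see help("tmpm2", TMPM2) for how the package itself decides, and pass the lex argument explicitly when the codes are ambiguous.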

Installation

You can install the development version of TMPM2 from GitHub with:

# install.packages("devtools")
devtools::install_github("jrf1111/TMPM")

Example

This basic example computes pDeath for four patients whose ICD-10-CM injury codes are stored in long format:

library(TMPM2)

test = tibble::tribble(
    ~id, ~dx,
    1, "S36114A",
    1, "S27321A",
    1, "S301XXA",
    1, "S40022A",
    1, "S7001XA",
    2, "S20212A",
    2, "S301XXA",
    2, "S42032A",
    3, "S82031A",
    4, "S72012A"
)

tmpm2(data = test, id = id, long = TRUE)
#> 
#> icd10 diagnosis codes detected
#> # A tibble: 10 × 2
#>       id  pDeath
#>    <dbl>   <dbl>
#>  1     1 0.0178 
#>  2     1 0.0178 
#>  3     1 0.0178 
#>  4     1 0.0178 
#>  5     1 0.0178 
#>  6     2 0.0128 
#>  7     2 0.0128 
#>  8     2 0.0128 
#>  9     3 0.00186
#> 10     4 0.0208
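
The example above uses long format. As a rough sketch of what wide format looks like, the base R code below reshapes a few of the same rows so that each patient occupies one row with one column per injury. The column names dx1, dx2, dx3 and the helper column dx_num are illustrative choices, not names the package is known to require.

```r
# Long format: one row per injury (subset of the example data)
long <- data.frame(
  id = c(1, 1, 1, 2, 2, 3),
  dx = c("S36114A", "S27321A", "S301XXA", "S20212A", "S301XXA", "S82031A"),
  stringsAsFactors = FALSE
)

# Number the injuries within each patient: 1, 2, 3, ...
long$dx_num <- ave(seq_along(long$id), long$id, FUN = seq_along)

# Wide format: one row per patient, one column per injury.
# Patients with fewer injuries are padded with NA.
wide <- reshape(long, idvar = "id", timevar = "dx_num",
                direction = "wide", sep = "")

print(wide)
```

A data frame shaped like wide could presumably be passed as tmpm2(data = wide, id = id, long = FALSE), mirroring the long = FALSE call in the speed comparison. How tmpm2 treats the NA padding is not covered here, so check help("tmpm2", TMPM2) before relying on it.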

Speed comparison

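TMPM2 is much faster than tmpm for datasets of even moderate size. In the benchmark reported in this section, median run times were about the same at 50 patients, while at 1,000 patients TMPM2 was roughly 18 times faster (0.21 versus 3.76 seconds).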

library(tmpm)
#> Loading required package: reshape2
library(TMPM2)
suppressPackageStartupMessages(library(tidyverse))
ns = c(50, 100, 200, 500, 1000) #number of "patients"
reps = 20 #number of replications

results = data.frame(
    n = rep(ns, each = reps),
    tmpm = NA,
    TMPM2 = NA
)

for (i in seq_len(nrow(results))) {

    # make some fake ICD-9 injury diagnosis data
    set.seed(i)
    dat = data.frame(
        replicate(
            n = 10, # 10 columns of diagnoses
            round(
                runif(n = results$n[i], # number of "patients"
                      min = 800, max = 959.9),
                digits = 2
            )
        )
    )

    dat$ID = 1:nrow(dat)
    # put ID column first (required for tmpm)
    dat = dat[, c("ID", paste0("X", 1:10))]

    # time both functions on the same data (elapsed seconds)
    results$tmpm[i] = system.time(
        tmpm(Pdat = dat, ILex = 9, Long = FALSE)
    )["elapsed"]
    results$TMPM2[i] = system.time(
        tmpm2(data = dat, id = ID, lex = "icd9", long = FALSE, legacy = TRUE)
    )["elapsed"]

}
tab = results %>% 
    group_by(n) %>% 
    summarise(
        `tmpm duration` = median(tmpm),
        `TMPM2 duration` = median(TMPM2),
        Ratio = median(tmpm)/median(TMPM2)
    ) %>% 
    knitr::kable(
        digits = 2,
        caption = "Median durations in seconds",
        format = "html"
    )

p = results %>% 
    pivot_longer(
        cols = -n
    ) %>% 
    ggplot(aes(x = as.factor(n), y = value, color = name)) +
    geom_boxplot(size = 0.3) +
    scale_y_continuous(limits = c(0, NA))+
    labs(
        title = "Speed comparison between tmpm and TMPM2",
        caption = paste("Timings based on", reps, "replications per sample size"),
        x = "Sample size",
        y = "Time (seconds)",
        color = "Method"
    )

Median durations in seconds

   n   tmpm duration   TMPM2 duration   Ratio
  50            0.07             0.08    0.87
 100            0.12             0.08    1.61
 200            0.38             0.08    4.55
 500            0.59             0.09    6.59
1000            3.76             0.21   18.21

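# check that both packages return the same pDeath values
# (dat is the dataset from the last benchmark iteration)
all.equal(
    tmpm(Pdat = dat, ILex = 9, Long = FALSE)[, "pDeath"],
    tmpm2(data = dat, id = ID, lex = "icd9",
          long = FALSE, legacy = TRUE)[, "pDeath"]
)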
#> Initializing ICD-9 Mortality Model Prediction
#> Mortality Model Prediction Complete
#> [1] TRUE

benchmarkme::get_cpu()
#> $vendor_id
#> [1] "GenuineIntel"
#> 
#> $model_name
#> [1] "Intel(R) Core(TM) i7-7920HQ CPU @ 3.10GHz"
#> 
#> $no_of_cores
#> [1] 8
benchmarkme::get_ram()
#> 17.2 GB
sessionInfo()
#> R version 4.2.1 (2022-06-23)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Big Sur ... 10.16
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#>  [1] forcats_0.5.1    stringr_1.4.0    dplyr_1.0.9      purrr_0.3.4     
#>  [5] readr_2.1.2      tidyr_1.2.0      tibble_3.1.8     ggplot2_3.3.6   
#>  [9] tidyverse_1.3.2  tmpm_1.0.3       reshape2_1.4.4   TMPM2_0.0.0.9100
#> 
#> loaded via a namespace (and not attached):
#>  [1] httr_1.4.3            jsonlite_1.8.0        foreach_1.5.2        
#>  [4] modelr_0.1.8          assertthat_0.2.1      highr_0.9            
#>  [7] googlesheets4_1.0.0   cellranger_1.1.0      yaml_2.3.5           
#> [10] pillar_1.8.0          backports_1.4.1       lattice_0.20-45      
#> [13] glue_1.6.2            digest_0.6.29         rvest_1.0.2          
#> [16] colorspace_2.0-3      htmltools_0.5.2       Matrix_1.4-1         
#> [19] plyr_1.8.7            pkgconfig_2.0.3       broom_1.0.0          
#> [22] haven_2.5.0           scales_1.2.0          tzdb_0.3.0           
#> [25] googledrive_2.0.0     generics_0.1.3        farver_2.1.1         
#> [28] tidytable_0.8.0       ellipsis_0.3.2        withr_2.5.0          
#> [31] cli_3.3.0             magrittr_2.0.3        crayon_1.5.1         
#> [34] readxl_1.4.0          evaluate_0.15         fs_1.5.2             
#> [37] fansi_1.0.3           doParallel_1.0.17     xml2_1.3.3           
#> [40] benchmarkme_1.0.8     tools_4.2.1           data.table_1.14.2    
#> [43] hms_1.1.1             gargle_1.2.0          lifecycle_1.0.1      
#> [46] munsell_0.5.0         reprex_2.0.1          compiler_4.2.1       
#> [49] rlang_1.0.4           grid_4.2.1            iterators_1.0.14     
#> [52] rstudioapi_0.13       labeling_0.4.2        rmarkdown_2.14       
#> [55] gtable_0.3.0          codetools_0.2-18      DBI_1.1.3            
#> [58] benchmarkmeData_1.0.4 R6_2.5.1              lubridate_1.8.0      
#> [61] knitr_1.39            fastmap_1.1.0         utf8_1.2.2           
#> [64] stringi_1.7.8         parallel_4.2.1        Rcpp_1.0.9           
#> [67] vctrs_0.4.1           dbplyr_2.2.1          tidyselect_1.1.2     
#> [70] xfun_0.31