leeper/margins

Incorrect number of dimensions in margins::cplot when using data.table

AJFOWLER opened this issue · 1 comment

Please specify whether your issue is about:

  • a possible bug
  • a question about package functionality
  • a suggested code or documentation change, improvement to the code, or feature request

If you are reporting (1) a bug or (2) a question about code, please supply:

  • a fully reproducible example using a publicly available dataset (or provide your data)
  • if an error is occurring, include the output of traceback() run immediately after the error occurs
  • the output of sessionInfo()

I've found an odd bug when using data.table:


## load package
library("margins")
library("data.table")

set.seed(100)

dfer = data.frame('age' = sample(0:100, 500, replace = TRUE),
                  'disease' = rbinom(500, 1, 0.3),
                  'dead' = rbinom(500, 1, 0.14), stringsAsFactors = FALSE)

setDT(dfer)
# if data.table is removed as a class, then the function runs and returns as expected.

# create model
moder = glm(dead~disease*age, family = binomial, data = dfer)

# cplot
margins::cplot(moder, dx="age")


traceback()
3: lapply(dat[, names(dat) != xvar, drop = FALSE], mean_or_mode)
2: cplot.glm(moder, dx = "age")
1: margins::cplot(moder, dx = "age")

## session info for your system
sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252 
[2] LC_CTYPE=English_United Kingdom.1252   
[3] LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] stats     graphics  grDevices
[4] utils     datasets  methods  
[7] base     

other attached packages:
[1] data.table_1.13.0   
[2] margins_0.3.23      
[3] comorbidgroupr_0.0.1
[4] testthat_2.3.2      

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5       
 [2] prettyunits_1.1.1
 [3] ps_1.3.3         
 [4] assertthat_0.2.1 
 [5] rprojroot_1.3-2  
 [6] digest_0.6.25    
 [7] R6_2.4.1         
 [8] backports_1.1.7  
 [9] ggplot2_3.3.2    
[10] pillar_1.4.6     
[11] rlang_0.4.7      
[12] rstudioapi_0.11  
[13] callr_3.4.3      
[14] desc_1.2.0       
[15] devtools_2.3.1   
[16] stringr_1.4.0    
[17] munsell_0.5.0    
[18] compiler_4.0.2   
[19] xfun_0.16        
[20] pkgconfig_2.0.3  
[21] pkgbuild_1.1.0   
[22] tidyselect_1.1.0 
[23] tibble_3.0.3     
[24] roxygen2_7.1.1   
[25] fansi_0.4.1      
[26] crayon_1.3.4     
[27] dplyr_1.0.1      
[28] withr_2.2.0      
[29] MASS_7.3-51.6    
[30] grid_4.0.2       
[31] gtable_0.3.0     
[32] lifecycle_0.2.0  
[33] magrittr_1.5     
[34] scales_1.1.1     
[35] cli_2.0.2        
[36] stringi_1.4.6    
[37] fs_1.4.2         
[38] remotes_2.2.0    
[39] xml2_1.3.2       
[40] ellipsis_0.3.1   
[41] generics_0.0.2   
[42] vctrs_0.3.2      
[43] prediction_0.3.14
[44] tools_4.0.2      
[45] rcmdcheck_1.3.3  
[46] glue_1.4.1       
[47] purrr_0.3.4      
[48] processx_3.4.3   
[49] pkgload_1.1.0    
[50] colorspace_1.4-1 
[51] xopen_1.0.0      
[52] sessioninfo_1.1.1
[53] memoise_1.1.0    
[54] knitr_1.29       
[55] usethis_1.6.1  

I think this should be fixable, either by modifying this lapply() call or by calling class(data) <- 'data.frame' near the top of cplot.glm(), which seems to resolve the issue. Happy to open a PR to help.
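A rough sketch of the second option, assuming the fix is simply to coerce the data to a plain data.frame before the `dat[, names(dat) != xvar, drop = FALSE]` subset that appears in the traceback. It uses as.data.frame() rather than assigning the class directly. The helpers typical_value() and typical_values_except() are hypothetical stand-ins for the internal code in cplot.glm() (which calls mean_or_mode()); they are not part of the margins API.

```r
# Hypothetical helper (not margins' API): a "typical" value for one column,
# the mean for numeric columns and the most frequent value otherwise.
typical_value <- function(x) {
  if (is.numeric(x)) {
    mean(x, na.rm = TRUE)
  } else {
    counts <- table(x)
    names(counts)[which.max(counts)]
  }
}

# Hypothetical sketch of the suggested fix: coerce to a plain data.frame first,
# so the base-R subset dat[, <logical>, drop = FALSE] is always handled by
# `[.data.frame`, whether the caller passed a data.frame or a data.table.
typical_values_except <- function(dat, xvar) {
  dat <- as.data.frame(dat)  # returns a plain data.frame (drops e.g. the data.table class)
  lapply(dat[, names(dat) != xvar, drop = FALSE], typical_value)
}

# Usage: typical values of every column except the one being plotted
d <- data.frame(age = c(10, 20, 30), sex = c("m", "f", "m"),
                stringsAsFactors = FALSE)
str(typical_values_except(d, "age"))
```

Coercing once at the top keeps the rest of the function written against plain data.frame semantics, instead of having to make each subsetting call work for every data.frame subclass.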

I had the same issue with my own dataset, which I can't share here, but it was fixed by wrapping the data.table object in as.data.frame(...) in the glm() call.

> class(df)
[1] "data.table" "data.frame"

# fails with cplot(x, "x1")
x <- glm(y ~ x1 + x2 + x3, data = df, family = binomial)

# works with cplot(x, "x1")
x <- glm(y ~ x1 + x2 + x3, data = as.data.frame(df), family = binomial)

sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] margins_0.3.26 ggplot2_3.3.3  dplyr_1.0.5        

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.6         pillar_1.6.0       compiler_4.0.2     tools_4.0.2        digest_0.6.27     
 [6] bit_4.0.4          gtable_0.3.0       evaluate_0.14      lifecycle_1.0.0    tibble_3.1.1      
[11] pkgconfig_2.0.3    rlang_0.4.10       cli_2.5.0          DBI_1.1.0          rstudioapi_0.13   
[16] curl_4.3           yaml_2.2.1         xfun_0.22          withr_2.4.2        httr_1.4.2        
[21] knitr_1.33         generics_0.0.2     vctrs_0.3.8        geohashTools_0.3.1 hms_0.5.3         
[26] grid_4.0.2         bit64_4.0.5        tidyselect_1.1.0   fasttime_1.0-2     glue_1.4.2        
[31] data.table_1.13.0  R6_2.5.0           fansi_0.4.2        prediction_0.3.14  rmarkdown_2.7     
[36] farver_2.1.0       readr_1.3.1        purrr_0.3.4        magrittr_2.0.1     MASS_7.3-51.6     
[41] scales_1.1.1       ellipsis_0.3.2     htmltools_0.5.1.1  assertthat_0.2.1   colorspace_2.0-0  
[46] labeling_0.4.2     utf8_1.2.1         tinytex_0.31       munsell_0.5.0      RcppSimdJson_0.1.1
[51] crayon_1.4.1