waldronlab/cBioPortalData

Working example error

Hi, I'm working through the cBioPortalData vignette and am having trouble getting this query to work:

cbio <- cBioPortal()
acc <- cBioPortalData(
    api = cbio,
    by = "hugoGeneSymbol",
    studyId = "acc_tcga",
    genePanelId = "IMPACT341",
    molecularProfileIds = c("acc_tcga_rppa", "acc_tcga_linear_CNA")
)

Here's the backtrace of the stuck R process:

Backtrace:
     █
  1. └─cBioPortalData::cBioPortalData(...)
  2.   ├─base::do.call(.portalExperiments, exargs)
  3.   └─(function (api, by, genePanelId, studyId, molecularProfileIds, ...
  4.     └─base::lapply(...)
  5.       └─cBioPortalData:::FUN(X[[i]], ...)
  6.         └─cBioPortalData::getDataByGenePanel(...)
  7.           └─cBioPortalData::molecularData(...)
  8.             └─cBioPortalData:::.invoke_bind(...)
  9.               └─cBioPortalData:::.bind_content(...)
 10.                 └─dplyr::bind_rows(httr::content(x))
 11.                   └─dplyr:::map(dots, function(.x) if (is.data.frame(.x)) .x else tibble(!!!.x))
 12.                     └─base::lapply(.x, .f, ...)
 13.                       └─dplyr:::FUN(X[[i]], ...)
 14.                         └─tibble::tibble(!!!.x)
 15.                           └─tibble:::tibble_quos(xs[!is.null], .rows, .name_repair)
 16.                             └─tibble:::splice_dfs(output)
 17.                               └─vctrs::vec_c(!!!x, .name_spec = "{inner}")

Here's the session info:

sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value                       
 version  R version 4.0.2 (2020-06-22)
 os       macOS Catalina 10.15.6      
 system   x86_64, darwin17.0          
 ui       RStudio                     
 language (EN)                        
 collate  en_US.UTF-8                 
 ctype    en_US.UTF-8                 
 tz       America/New_York            
 date     2020-07-27                  

─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────
 package              * version  date       lib source        
 AnnotationDbi          1.50.3   2020-07-25 [1] Bioconductor  
 AnVIL                * 1.0.3    2020-05-04 [1] Bioconductor  
 askpass                1.1      2019-01-13 [1] CRAN (R 4.0.0)
 assertthat             0.2.1    2019-03-21 [1] CRAN (R 4.0.0)
 bb8                  * 0.2.17   2020-07-25 [1] local         
 Biobase              * 2.48.0   2020-04-27 [1] Bioconductor  
 BiocFileCache          1.12.0   2020-04-27 [1] Bioconductor  
 BiocGenerics         * 0.34.0   2020-04-27 [1] Bioconductor  
 BiocParallel           1.22.0   2020-04-27 [1] Bioconductor  
 biomaRt                2.44.1   2020-06-17 [1] Bioconductor  
 Biostrings             2.56.0   2020-04-27 [1] Bioconductor  
 bit                    1.1-15.2 2020-02-10 [1] CRAN (R 4.0.0)
 bit64                  0.9-7.1  2020-07-15 [1] CRAN (R 4.0.2)
 bitops                 1.0-6    2013-08-17 [1] CRAN (R 4.0.0)
 blob                   1.2.1    2020-01-20 [1] CRAN (R 4.0.0)
 cBioPortalData       * 2.0.7    2020-07-03 [1] Bioconductor  
 cli                    2.0.2    2020-02-28 [1] CRAN (R 4.0.0)
 crayon                 1.3.4    2017-09-16 [1] CRAN (R 4.0.0)
 curl                   4.3      2019-12-02 [1] CRAN (R 4.0.0)
 data.table             1.13.0   2020-07-24 [1] CRAN (R 4.0.2)
 DBI                    1.1.0    2019-12-15 [1] CRAN (R 4.0.0)
 dbplyr                 1.4.4    2020-05-27 [1] CRAN (R 4.0.0)
 DelayedArray         * 0.14.1   2020-07-14 [1] Bioconductor  
 digest                 0.6.25   2020-02-23 [1] CRAN (R 4.0.0)
 dplyr                * 1.0.0    2020-05-29 [1] CRAN (R 4.0.0)
 ellipsis               0.3.1    2020-05-15 [1] CRAN (R 4.0.0)
 fansi                  0.4.1    2020-01-08 [1] CRAN (R 4.0.0)
 formatR                1.7      2019-06-11 [1] CRAN (R 4.0.0)
 futile.logger          1.4.3    2016-07-10 [1] CRAN (R 4.0.0)
 futile.options         1.0.1    2018-04-20 [1] CRAN (R 4.0.0)
 generics               0.0.2    2018-11-29 [1] CRAN (R 4.0.0)
 GenomeInfoDb         * 1.24.2   2020-06-15 [1] Bioconductor  
 GenomeInfoDbData       1.2.3    2020-06-29 [1] Bioconductor  
 GenomicAlignments      1.24.0   2020-04-27 [1] Bioconductor  
 GenomicDataCommons     1.12.0   2020-04-27 [1] Bioconductor  
 GenomicFeatures        1.40.1   2020-07-08 [1] Bioconductor  
 GenomicRanges        * 1.40.0   2020-04-27 [1] Bioconductor  
 glue                   1.4.1    2020-05-13 [1] CRAN (R 4.0.0)
 hms                    0.5.3    2020-01-08 [1] CRAN (R 4.0.0)
 httr                   1.4.2    2020-07-20 [1] CRAN (R 4.0.2)
 IRanges              * 2.22.2   2020-05-21 [1] Bioconductor  
 jsonlite               1.7.0    2020-06-25 [1] CRAN (R 4.0.0)
 lambda.r               1.2.4    2019-09-18 [1] CRAN (R 4.0.0)
 lattice                0.20-41  2020-04-02 [2] CRAN (R 4.0.2)
 lifecycle              0.2.0    2020-03-06 [1] CRAN (R 4.0.0)
 limma                  3.44.3   2020-06-12 [1] Bioconductor  
 magrittr             * 1.5      2014-11-22 [1] CRAN (R 4.0.0)
 Matrix                 1.2-18   2019-11-27 [2] CRAN (R 4.0.2)
 matrixStats          * 0.56.0   2020-03-13 [1] CRAN (R 4.0.0)
 memoise                1.1.0    2017-04-21 [1] CRAN (R 4.0.0)
 MultiAssayExperiment * 1.14.0   2020-04-27 [1] Bioconductor  
 openssl                1.4.2    2020-06-27 [1] CRAN (R 4.0.2)
 packrat                0.5.0    2018-11-14 [1] CRAN (R 4.0.0)
 pillar                 1.4.6    2020-07-10 [1] CRAN (R 4.0.2)
 pkgconfig              2.0.3    2019-09-22 [1] CRAN (R 4.0.0)
 prettyunits            1.1.1    2020-01-24 [1] CRAN (R 4.0.0)
 progress               1.2.2    2019-05-16 [1] CRAN (R 4.0.0)
 purrr                  0.3.4    2020-04-17 [1] CRAN (R 4.0.0)
 R6                     2.4.1    2019-11-12 [1] CRAN (R 4.0.0)
 RaggedExperiment       1.12.0   2020-04-27 [1] Bioconductor  
 rapiclient             0.1.3    2020-01-17 [1] CRAN (R 4.0.2)
 rappdirs               0.3.1    2016-03-28 [1] CRAN (R 4.0.0)
 RCircos                1.2.1    2019-03-12 [1] CRAN (R 4.0.2)
 Rcpp                   1.0.5    2020-07-06 [1] CRAN (R 4.0.2)
 RCurl                  1.98-1.2 2020-04-18 [1] CRAN (R 4.0.0)
 readr                  1.3.1    2018-12-21 [1] CRAN (R 4.0.0)
 RJSONIO                1.3-1.4  2020-01-15 [1] CRAN (R 4.0.2)
 rlang                  0.4.7    2020-07-09 [1] CRAN (R 4.0.2)
 Rsamtools              2.4.0    2020-04-27 [1] Bioconductor  
 RSQLite                2.2.0    2020-01-07 [1] CRAN (R 4.0.0)
 rstudioapi             0.11     2020-02-07 [1] CRAN (R 4.0.0)
 RTCGAToolbox           2.18.0   2020-04-27 [1] Bioconductor  
 rtracklayer            1.48.0   2020-04-27 [1] Bioconductor  
 rvest                  0.3.6    2020-07-25 [1] CRAN (R 4.0.2)
 S4Vectors            * 0.26.1   2020-05-16 [1] Bioconductor  
 sessioninfo            1.1.1    2018-11-05 [1] CRAN (R 4.0.0)
 stringi                1.4.6    2020-02-17 [1] CRAN (R 4.0.0)
 stringr                1.4.0    2019-02-10 [1] CRAN (R 4.0.0)
 SummarizedExperiment * 1.18.2   2020-07-09 [1] Bioconductor  
 survival               3.2-3    2020-06-13 [2] CRAN (R 4.0.0)
 TCGAutils              1.8.0    2020-04-27 [1] Bioconductor  
 tibble                 3.0.3    2020-07-10 [1] CRAN (R 4.0.2)
 tidyselect             1.1.0    2020-05-11 [1] CRAN (R 4.0.0)
 vctrs                  0.3.2    2020-07-15 [1] CRAN (R 4.0.2)
 withr                  2.2.0    2020-04-20 [1] CRAN (R 4.0.0)
 XML                    3.99-0.5 2020-07-23 [1] CRAN (R 4.0.2)
 xml2                   1.3.2    2020-04-23 [1] CRAN (R 4.0.0)
 XVector                0.28.0   2020-04-27 [1] Bioconductor  
 yaml                   2.2.1    2020-02-01 [1] CRAN (R 4.0.0)
 zlibbioc               1.34.0   2020-04-27 [1] Bioconductor  

[1] /usr/local/koopa/opt/r/4.0/site-library
[2] /Library/Frameworks/R.framework/Versions/4.0/Resources/library

Rerunning this step, I was able to generate this error:

Error: Can't subset columns that don't exist.
✖ Column `clinicalAttributeId` doesn't exist.
Backtrace:
     █
  1. └─cBioPortalData::cBioPortalData(...)
  2.   ├─base::do.call(clinicalData, clinargs)
  3.   └─(function (api, studyId = NA_character_) ...
  4.     ├─tidyr::pivot_wider(...)
  5.     └─tidyr:::pivot_wider.data.frame(...)
  6.       └─tidyr::build_wider_spec(...)
  7.         └─tidyselect::eval_select(enquo(names_from), data)
  8.           └─tidyselect:::eval_select_impl(...)
  9.             ├─tidyselect:::with_subscript_errors(...)
 10.             │ ├─base::tryCatch(...)
 11.             │ │ └─base:::tryCatchList(expr, classes, parentenv, handlers)
 12.             │ │   └─base:::tryCatchOne(expr, names, parentenv, handlers[[1L]])
 13.             │ │     └─base:::doTryCatch(return(expr), name, parentenv, handler)
 14.             │ └─tidyselect:::instrument_base_errors(expr)
 15.             │   └─base::withCallingHandlers(...)
 16.             └─tidyselect:::vars_select_eval(...)
 17.               └─tidyselect:::as_indices_sel_impl(...)
 18.                 └─tidyselect:::as_indices_impl(x, vars, strict = strict)
 19.                   └─tidyselect:::chr_as_locations(x, vars)
 20.                     └─vctrs::vec_as_location(x, n = length(vars), names = vars)
 21.                       └─(function () ...
 22.                         └─vctrs:::stop_subscript_oob(...)
 23.                           └─vctrs:::stop_subscript(...)

@mjsteinbaugh
Hi Michael,
What does BiocManager::valid() give you?
Make sure you have the latest release versions of the packages installed.

> BiocManager::valid()
[1] TRUE

Also, some genes that I expect to see are not being returned by this:

library(cBioPortalData)
gbm_tcga_pub2013 <- cBioDataPack("gbm_tcga_pub2013")
mat <- assay(gbm_tcga_pub2013, "RNA_Seq_v2_mRNA_median_Zscores")
# hugo_genes is a character vector of gene symbols of interest (not shown)
hugo_genes %in% rownames(mat)
## [1]  TRUE  TRUE FALSE FALSE

I know the 2 FALSE values should be TRUE because these genes are on the cbioportal.org website and are returned by the cgdsr package methods. Is there a potential gene symbol mapping issue here? I'm happy to help debug.
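
One way to narrow down a suspected symbol mismatch is to separate formatting differences (case, stray whitespace) from genuinely absent or renamed symbols. A rough sketch with toy data. The `mat` and `hugo_genes` values below are made up for illustration and are not the real assay or gene list, and `normalize` is a hypothetical helper:

```r
# Toy stand-ins: `mat` mimics an assay matrix with gene symbols as rownames;
# `hugo_genes` mimics a vector of genes of interest (values are made up).
mat <- matrix(
    0,
    nrow = 3, ncol = 2,
    dimnames = list(c("TP53", "EGFR", "MARCH1"), c("S1", "S2"))
)
hugo_genes <- c("TP53", "egfr ", "MARCHF1", "PTEN")

# Exact matching, as in `hugo_genes %in% rownames(mat)`.
exact <- hugo_genes %in% rownames(mat)

# Matching after trimming whitespace and ignoring case.
normalize <- function(x) toupper(trimws(x))
relaxed <- normalize(hugo_genes) %in% normalize(rownames(mat))

# Symbols still unmatched are candidates for a renamed symbol/alias or for
# a gene that is simply absent from this particular data file.
unmatched <- hugo_genes[!relaxed]
```

Symbols that only match in the relaxed comparison point to formatting; symbols left in `unmatched` would need to be checked against aliases or against the contents of the downloaded study files.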

Hi Michael, @mjsteinbaugh
You may have an old cache in your cache location.
Try clearing your cache using the unlink function call below.

unlink("~/.cache/cBioPortalData", recursive = TRUE)
suppressPackageStartupMessages(library(cBioPortalData))
cbio <- cBioPortal()
acc <- cBioPortalData(
    api = cbio,
    by = "hugoGeneSymbol",
    studyId = "acc_tcga",
    genePanelId = "IMPACT341",
    molecularProfileIds = c("acc_tcga_rppa", "acc_tcga_linear_CNA")
)
#> harmonizing input:
#>   removing 1 colData rownames not in sampleMap 'primary'
acc
#> A MultiAssayExperiment object of 2 listed
#>  experiments with user-defined names and respective classes.
#>  Containing an ExperimentList class object of length 2:
#>  [1] acc_tcga_rppa: SummarizedExperiment with 57 rows and 46 columns
#>  [2] acc_tcga_linear_CNA: SummarizedExperiment with 339 rows and 90 columns
#> Features:
#>  experiments() - obtain the ExperimentList instance
#>  colData() - the primary/phenotype DFrame
#>  sampleMap() - the sample availability DFrame
#>  `$`, `[`, `[[` - extract colData columns, subset, or experiment
#>  *Format() - convert into a long or wide DFrame
#>  assays() - convert ExperimentList to a SimpleList of matrices
sessionInfo()
#> R version 4.0.2 Patched (2020-07-19 r78887)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 20.04 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
#> [8] methods   base     
#> 
#> other attached packages:
#>  [1] cBioPortalData_2.0.7        MultiAssayExperiment_1.14.0
#>  [3] SummarizedExperiment_1.18.2 DelayedArray_0.14.1        
#>  [5] matrixStats_0.56.0          Biobase_2.48.0             
#>  [7] GenomicRanges_1.40.0        GenomeInfoDb_1.24.2        
#>  [9] IRanges_2.22.2              S4Vectors_0.26.1           
#> [11] BiocGenerics_0.34.0         AnVIL_1.0.3                
#> [13] dplyr_1.0.0                
#> 
#> loaded via a namespace (and not attached):
#>  [1] httr_1.4.2                tidyr_1.1.0              
#>  [3] bit64_0.9-7.1             jsonlite_1.7.0           
#>  [5] splines_4.0.2             assertthat_0.2.1         
#>  [7] askpass_1.1               TCGAutils_1.8.0          
#>  [9] highr_0.8                 BiocFileCache_1.12.0     
#> [11] blob_1.2.1                Rsamtools_2.4.0          
#> [13] GenomeInfoDbData_1.2.3    RTCGAToolbox_2.18.0      
#> [15] progress_1.2.2            yaml_2.2.1               
#> [17] pillar_1.4.6              RSQLite_2.2.0            
#> [19] lattice_0.20-41           glue_1.4.1               
#> [21] limma_3.44.3              digest_0.6.25            
#> [23] XVector_0.28.0            rvest_0.3.6              
#> [25] htmltools_0.5.0           Matrix_1.2-18            
#> [27] XML_3.99-0.5              pkgconfig_2.0.3          
#> [29] biomaRt_2.44.1            zlibbioc_1.34.0          
#> [31] purrr_0.3.4               RCircos_1.2.1            
#> [33] rapiclient_0.1.3          BiocParallel_1.22.0      
#> [35] openssl_1.4.2             tibble_3.0.3             
#> [37] generics_0.0.2            ellipsis_0.3.1           
#> [39] GenomicFeatures_1.40.1    survival_3.2-3           
#> [41] RJSONIO_1.3-1.4           magrittr_1.5             
#> [43] crayon_1.3.4              memoise_1.1.0            
#> [45] evaluate_0.14             xml2_1.3.2               
#> [47] prettyunits_1.1.1         tools_4.0.2              
#> [49] data.table_1.13.0         hms_0.5.3                
#> [51] formatR_1.7               lifecycle_0.2.0          
#> [53] stringr_1.4.0             Biostrings_2.56.0        
#> [55] AnnotationDbi_1.50.3      lambda.r_1.2.4           
#> [57] compiler_4.0.2            rlang_0.4.7              
#> [59] futile.logger_1.4.3       grid_4.0.2               
#> [61] GenomicDataCommons_1.12.0 RCurl_1.98-1.2           
#> [63] rappdirs_0.3.1            bitops_1.0-6             
#> [65] rmarkdown_2.3             DBI_1.1.0                
#> [67] curl_4.3                  R6_2.4.1                 
#> [69] GenomicAlignments_1.24.0  rtracklayer_1.48.0       
#> [71] knitr_1.29                bit_1.1-15.2             
#> [73] futile.options_1.0.1      readr_1.3.1              
#> [75] stringi_1.4.6             RaggedExperiment_1.12.0  
#> [77] Rcpp_1.0.5                vctrs_0.3.2              
#> [79] dbplyr_1.4.4              tidyselect_1.1.0         
#> [81] xfun_0.16

Created on 2020-07-27 by the reprex package (v0.3.0)
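
The path in the unlink call above is a Linux-style location; on other platforms, such as macOS, the cache may live elsewhere, so it can help to confirm that the directory exists and to look at its contents before deleting it. A small sketch using a temporary directory as a stand-in. The `cache_dir` path and the file name are hypothetical:

```r
# Stand-in for the real cache location; swap in the actual cache path only
# after confirming where the cache lives on your system.
cache_dir <- file.path(tempdir(), "cBioPortalData-demo-cache")
dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
file.create(file.path(cache_dir, "stale_file.rds"))

# Inspect what would be deleted before deleting it.
before <- list.files(cache_dir, recursive = TRUE)

# unlink() returns 0 on success and 1 on failure.
status <- unlink(cache_dir, recursive = TRUE)
removed <- !dir.exists(cache_dir)
```

Note that unlink() also returns 0 when the path does not exist, so an apparently successful call on the wrong path leaves the real cache untouched. Checking with dir.exists() beforehand avoids that.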

@mjsteinbaugh I am not sure where you are getting hugo_genes from.
If you encounter any issues with the actual data provided from the cBioPortal tarballs,
go to https://github.com/cbioportal/datahub and open an issue there.

This is not a cache issue as far as I can tell. hugo_genes is a vector of genes of interest that I'd prefer not to post publicly at the moment.

I'll try running the vignette inside my Docker images and see if I can reprex. The error above may be a macOS-specific issue.

Hi Michael, @mjsteinbaugh
If the issue turns out to be specific to macOS, feel free to open another issue.
Best,
Marcel