hi_loadings() out-of-bounds error

Question

fruce-ki opened this issue 5 years ago · 4 comments

I get the following when trying to extract the top N genes from a principal components object prepared with prcomp():

Error in exprTable[names(geneloadings_extreme), ] : 
   subscript out of bounds

The error happens only for a subset of the principal components and can be resolved by lowering the number N passed to the topN parameter (I'm asking for just 25 genes out of the 500 used in the PCA, but in some cases that is apparently already too many).

I suspect that hi_loadings() does some internal thresholding, and that in some cases where a component's contribution is low, the genes that pass the criteria are fewer than topN, resulting in the error. If that is the case, a more useful behaviour would be to return the genes that do pass the criteria, with a warning that fewer than N genes were returned.

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS  10.14.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] htmltools_0.3.6             ggdendro_0.1-20             plotly_4.9.0                ggrepel_0.8.1               gridExtra_2.3              
 [6] forcats_0.4.0               stringr_1.4.0               dplyr_0.8.3                 purrr_0.3.2                 readr_1.3.1                
[11] tidyr_0.8.3                 tibble_2.1.3                ggplot2_3.2.0               tidyverse_1.2.1             pcaExplorer_2.8.1          
[16] topGO_2.34.0                SparseM_1.77                GO.db_3.7.0                 AnnotationDbi_1.44.0        graph_1.60.0               
[21] DESeq2_1.22.2               SummarizedExperiment_1.12.0 DelayedArray_0.8.0          BiocParallel_1.16.6         Biobase_2.42.0             
[26] GenomicRanges_1.34.0        GenomeInfoDb_1.18.2         IRanges_2.16.0              S4Vectors_0.20.1            BiocGenerics_0.28.0        
[31] matrixStats_0.54.0          data.table_1.12.2          

loaded via a namespace (and not attached):
  [1] readxl_1.3.1           backports_1.1.4        GOstats_2.48.0         Hmisc_4.2-0            NMF_0.21.0             plyr_1.8.4            
  [7] igraph_1.2.4.1         lazyeval_0.2.2         GSEABase_1.44.0        shinydashboard_0.7.1   splines_3.5.1          crosstalk_1.0.0       
 [13] gridBase_0.4-7         digest_0.6.20          foreach_1.4.7          magrittr_1.5           checkmate_1.9.4        memoise_1.1.0         
 [19] cluster_2.0.7-1        doParallel_1.0.14      limma_3.38.3           annotate_1.60.1        modelr_0.1.4           prettyunits_1.0.2     
 [25] colorspace_1.4-1       apeglm_1.4.2           rvest_0.3.4            blob_1.2.0             haven_2.1.1            xfun_0.8              
 [31] jsonlite_1.6           crayon_1.3.4           RCurl_1.95-4.12        genefilter_1.64.0      zeallot_0.1.0          survival_2.42-3       
 [37] iterators_1.0.12       glue_1.3.1             registry_0.5-1         gtable_0.3.0           zlibbioc_1.28.0        XVector_0.22.0        
 [43] Rgraphviz_2.26.0       scales_1.0.0           pheatmap_1.0.12        DBI_1.0.0              rngtools_1.4           bibtex_0.4.2          
 [49] Rcpp_1.0.2             emdbook_1.3.11         viridisLite_0.3.0      xtable_1.8-4           progress_1.2.2         htmlTable_1.13.1      
 [55] foreign_0.8-70         bit_1.1-14             Formula_1.2-3          DT_0.7                 AnnotationForge_1.24.0 htmlwidgets_1.3       
 [61] httr_1.4.0             threejs_0.3.1          RColorBrewer_1.1-2     shinyAce_0.4.0         acepack_1.4.1          pkgconfig_2.0.2       
 [67] XML_3.98-1.20          nnet_7.3-12            locfit_1.5-9.1         labeling_0.3           tidyselect_0.2.5       rlang_0.4.0           
 [73] reshape2_1.4.3         later_0.8.0            cellranger_1.1.0       munsell_0.5.0          tools_3.5.1            cli_1.1.0             
 [79] generics_0.0.2         RSQLite_2.1.2          broom_0.5.2            evaluate_0.14          shinyBS_0.61           yaml_2.2.0            
 [85] knitr_1.23             bit64_0.9-7            RBGL_1.58.2            nlme_3.1-137           mime_0.7               xml2_1.2.0            
 [91] biomaRt_2.38.0         compiler_3.5.1         rstudioapi_0.10        png_0.1-7              geneplotter_1.60.0     stringi_1.4.3         
 [97] lattice_0.20-35        Matrix_1.2-14          vctrs_0.2.0            pillar_1.4.2           d3heatmap_0.6.1.2      bitops_1.0-6          
[103] httpuv_1.5.1           R6_2.4.0               latticeExtra_0.6-28    promises_1.0.1         codetools_0.2-15       MASS_7.3-50           
[109] assertthat_0.2.1       pkgmaker_0.27          Category_2.48.1        withr_2.1.2            GenomeInfoDbData_1.2.0 hms_0.5.0             
[115] grid_3.5.1             rpart_4.1-13           coda_0.19-3            rmarkdown_1.14         bbmle_1.0.20           numDeriv_2016.8-1.1   
[121] lubridate_1.7.4        shiny_1.3.2            base64enc_0.1-3

Answer 1 · 2019-08-09T12:29:21.000Z

I tried to replicate this with the airway demo dataset but could not encounter the error.

As you said, it can be something dataset-specific.

Since I do not do any thresholding (head and tail would handle it gently), I am thinking it can be because of non-matching names. Are you uploading a gene annotation table as well?
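To illustrate the two points above, here is a minimal sketch on toy data (the gene names and the `loadings` and `expr` objects are made up for illustration; this is not the actual hi_loadings() code). It shows that head() tolerates a request for more elements than exist, while subsetting a matrix by a name that is absent from its row names raises the reported error.

```r
# Toy loadings vector with only 3 genes (hypothetical names).
loadings <- sort(c(geneA = 0.9, geneB = -0.7, geneC = 0.1))

# Asking for 25 elements from a vector of 3 does not fail:
# head() simply returns everything that is there.
top <- head(loadings, 25)
length(top)

# Toy expression matrix that has no row called "geneC".
expr <- matrix(1:4, nrow = 2,
               dimnames = list(c("geneA", "geneB"), c("s1", "s2")))

# Subsetting by a name that is missing from the row names raises an error;
# tryCatch() captures its message instead of stopping the script.
res <- tryCatch(expr[names(top), ], error = function(e) conditionMessage(e))
res
```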

Answer 2 · 2019-08-09T15:11:37.000Z

Thanks for looking into it so quickly. It is definitely dataset-specific. I don't supply an annotation to hi_loadings(), but I do use one to manually rename genes earlier in my analysis, so that is indeed a plausible cause. After posting here, I ran into more errors in other parts of my analysis code with other datasets, and they turned out to be caused by IDs that didn't resolve during lookup and ended up as NAs. I've since implemented some checks and workarounds to prevent NAs from propagating downstream, and can confirm this also resolved my issue with hi_loadings().
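The checks themselves were not posted. One possible shape for such a guard, as a sketch only, is to fall back to the original ID wherever the lookup returns NA and to verify the result before it is used as row names. The `id2symbol` lookup table and the IDs below are hypothetical.

```r
# Hypothetical ID-to-symbol lookup; "ENSG3" has no entry.
id2symbol <- c(ENSG1 = "TP53", ENSG2 = "MYC")
ids <- c("ENSG1", "ENSG2", "ENSG3")

# Looking up a missing name yields NA.
symbols <- unname(id2symbol[ids])

# Keep the original ID wherever the lookup failed.
safe_names <- ifelse(is.na(symbols), ids, symbols)

# Fail early if any name is still missing or duplicated.
stopifnot(!anyNA(safe_names), !anyDuplicated(safe_names))
safe_names
```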

Answer 3 · 2019-08-09T21:15:29.000Z

OK, your explanation makes sense.
It sounds like a case where an ID could not be mapped to a gene symbol (say), so the name assigned to it is an NA, which R does not accept when subsetting.
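A small sketch of that situation, using a made-up matrix: an NA index never matches a row name, not even a row name that is itself NA. The subset therefore fails only when the NA-named gene is among those requested, which would be consistent with the error appearing for some components and disappearing at lower topN.

```r
# Toy expression matrix in which one gene ended up with an NA row name.
expr <- matrix(1:6, nrow = 3,
               dimnames = list(c("TP53", NA, "MYC"), c("s1", "s2")))

# Selecting only properly named genes works.
ok <- expr[c("TP53", "MYC"), ]

# Including the NA name fails, even though a row is "named" NA.
bad <- tryCatch(expr[c("TP53", NA), ], error = function(e) conditionMessage(e))
bad
```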

Answer 4 · 2019-08-12T07:32:08.000Z

Closing this. If you feel it needs to be reopened, let me know 😉