federicomarini/pcaExplorer

hi_loadings() out-of-bounds error

fruce-ki opened this issue · 4 comments

I get the following when trying to extract the top N genes from a primary components object prepared with pcomp():

Error in exprTable[names(geneloadings_extreme), ] : 
   subscript out of bounds

The error happens only for a subset of the primary components and can be resolved by lowering the number N passed to the topN parameter (I'm asking for just 25 genes out of 500 used in the PCA, but is some case that's already too many apparently).

I suspect that hi_loadings() does some internal thresholding, and that in some cases where a component's contribution is low, the genes that pass the criteria are fewer than topN, resulting in the error. If that is the case, a more useful behaviour would be to return the genes that do pass the criteria, with a warning that fewer than N genes were returned.

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS  10.14.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] htmltools_0.3.6             ggdendro_0.1-20             plotly_4.9.0                ggrepel_0.8.1               gridExtra_2.3              
 [6] forcats_0.4.0               stringr_1.4.0               dplyr_0.8.3                 purrr_0.3.2                 readr_1.3.1                
[11] tidyr_0.8.3                 tibble_2.1.3                ggplot2_3.2.0               tidyverse_1.2.1             pcaExplorer_2.8.1          
[16] topGO_2.34.0                SparseM_1.77                GO.db_3.7.0                 AnnotationDbi_1.44.0        graph_1.60.0               
[21] DESeq2_1.22.2               SummarizedExperiment_1.12.0 DelayedArray_0.8.0          BiocParallel_1.16.6         Biobase_2.42.0             
[26] GenomicRanges_1.34.0        GenomeInfoDb_1.18.2         IRanges_2.16.0              S4Vectors_0.20.1            BiocGenerics_0.28.0        
[31] matrixStats_0.54.0          data.table_1.12.2          

loaded via a namespace (and not attached):
  [1] readxl_1.3.1           backports_1.1.4        GOstats_2.48.0         Hmisc_4.2-0            NMF_0.21.0             plyr_1.8.4            
  [7] igraph_1.2.4.1         lazyeval_0.2.2         GSEABase_1.44.0        shinydashboard_0.7.1   splines_3.5.1          crosstalk_1.0.0       
 [13] gridBase_0.4-7         digest_0.6.20          foreach_1.4.7          magrittr_1.5           checkmate_1.9.4        memoise_1.1.0         
 [19] cluster_2.0.7-1        doParallel_1.0.14      limma_3.38.3           annotate_1.60.1        modelr_0.1.4           prettyunits_1.0.2     
 [25] colorspace_1.4-1       apeglm_1.4.2           rvest_0.3.4            blob_1.2.0             haven_2.1.1            xfun_0.8              
 [31] jsonlite_1.6           crayon_1.3.4           RCurl_1.95-4.12        genefilter_1.64.0      zeallot_0.1.0          survival_2.42-3       
 [37] iterators_1.0.12       glue_1.3.1             registry_0.5-1         gtable_0.3.0           zlibbioc_1.28.0        XVector_0.22.0        
 [43] Rgraphviz_2.26.0       scales_1.0.0           pheatmap_1.0.12        DBI_1.0.0              rngtools_1.4           bibtex_0.4.2          
 [49] Rcpp_1.0.2             emdbook_1.3.11         viridisLite_0.3.0      xtable_1.8-4           progress_1.2.2         htmlTable_1.13.1      
 [55] foreign_0.8-70         bit_1.1-14             Formula_1.2-3          DT_0.7                 AnnotationForge_1.24.0 htmlwidgets_1.3       
 [61] httr_1.4.0             threejs_0.3.1          RColorBrewer_1.1-2     shinyAce_0.4.0         acepack_1.4.1          pkgconfig_2.0.2       
 [67] XML_3.98-1.20          nnet_7.3-12            locfit_1.5-9.1         labeling_0.3           tidyselect_0.2.5       rlang_0.4.0           
 [73] reshape2_1.4.3         later_0.8.0            cellranger_1.1.0       munsell_0.5.0          tools_3.5.1            cli_1.1.0             
 [79] generics_0.0.2         RSQLite_2.1.2          broom_0.5.2            evaluate_0.14          shinyBS_0.61           yaml_2.2.0            
 [85] knitr_1.23             bit64_0.9-7            RBGL_1.58.2            nlme_3.1-137           mime_0.7               xml2_1.2.0            
 [91] biomaRt_2.38.0         compiler_3.5.1         rstudioapi_0.10        png_0.1-7              geneplotter_1.60.0     stringi_1.4.3         
 [97] lattice_0.20-35        Matrix_1.2-14          vctrs_0.2.0            pillar_1.4.2           d3heatmap_0.6.1.2      bitops_1.0-6          
[103] httpuv_1.5.1           R6_2.4.0               latticeExtra_0.6-28    promises_1.0.1         codetools_0.2-15       MASS_7.3-50           
[109] assertthat_0.2.1       pkgmaker_0.27          Category_2.48.1        withr_2.1.2            GenomeInfoDbData_1.2.0 hms_0.5.0             
[115] grid_3.5.1             rpart_4.1-13           coda_0.19-3            rmarkdown_1.14         bbmle_1.0.20           numDeriv_2016.8-1.1   
[121] lubridate_1.7.4        shiny_1.3.2            base64enc_0.1-3       

I tried to replicate this with the airway demo dataset but could not encounter the error.
image

As you said, it can be something dataset-specific.

Since I do not do any thresholding (head and tail would handle it gently), I am thinking it can be because of non-matching names. Are you uploading a gene annotation table as well?

Ok, your explanation makes sense.
It sounds like a case when an ID could not get mapped to GeneSymbol (say), and the name assigned to it is then a NA that R does not like when subsetting

Closing this, in case you feel it needs to be reopened, let me know 😉