neurogenomics/MAGMA_Celltyping

WARNING: 228 columns (cell-types) have less than the expected number of quantile bins (10).

AMCalejandro opened this issue · 1 comment

1. Bug description

I am trying to understand why I cannot properly generate quantile groups (numberOfBins = 4) from the Zeisel data available through the EWCE package.

To make sure I understand correctly: is it that the Zeisel dataset does not have enough variation in gene expression to generate quantiles ranging from higher to lower percentages of genes expressed?

I would appreciate some help understanding what is going on in the Zeisel data.

Console output

Standardising CellTypeDataset
Found 3 matrix types across 5 CTD levels.
Processing level: 1
Converting to sparse matrix.
Processing level: 2
Converting to sparse matrix.
Processing level: 3
Converting to sparse matrix.
Processing level: 4
Converting to sparse matrix.
Processing level: 5
Converting to sparse matrix.
Converting to sparse matrix.
Converting to sparse matrix.
Converting to sparse matrix.
+ <2 non-zero quantile bins detected in column. Assigning these values to default quantile  ( 5 )
Converting to sparse matrix.
+ <2 non-zero quantile bins detected in column. Assigning these values to default quantile  ( 5 )
(the message above is printed 40 times in a row)
Converting to sparse matrix.
Checking CTD: level 1
WARNING: 4 columns (cell-types) have less than the expected number of quantile bins (4).
This may be due to an excessive sparsity or insufficient variation in your CellTypeDataset.

Checking CTD: level 2
WARNING: 3 columns (cell-types) have less than the expected number of quantile bins (4).
This may be due to an excessive sparsity or insufficient variation in your CellTypeDataset.

Checking CTD: level 3
WARNING: 6 columns (cell-types) have less than the expected number of quantile bins (4).
This may be due to an excessive sparsity or insufficient variation in your CellTypeDataset.

Checking CTD: level 4
WARNING: 26 columns (cell-types) have less than the expected number of quantile bins (4).
This may be due to an excessive sparsity or insufficient variation in your CellTypeDataset.

Checking CTD: level 5
WARNING: 205 columns (cell-types) have less than the expected number of quantile bins (4).
This may be due to an excessive sparsity or insufficient variation in your CellTypeDataset.

Checking CTD: level 1
WARNING: 1 columns (cell-types) have less than the expected number of quantile bins (10).
This may be due to an excessive sparsity or insufficient variation in your CellTypeDataset.

Checking CTD: level 2
WARNING: 1 columns (cell-types) have less than the expected number of quantile bins (10).
This may be due to an excessive sparsity or insufficient variation in your CellTypeDataset.

Checking CTD: level 3
WARNING: 6 columns (cell-types) have less than the expected number of quantile bins (10).
This may be due to an excessive sparsity or insufficient variation in your CellTypeDataset.

Checking CTD: level 4
WARNING: 30 columns (cell-types) have less than the expected number of quantile bins (10).
This may be due to an excessive sparsity or insufficient variation in your CellTypeDataset.

Checking CTD: level 5
WARNING: 228 columns (cell-types) have less than the expected number of quantile bins (10).
This may be due to an excessive sparsity or insufficient variation in your CellTypeDataset.
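The warning's suggested cause, excessive sparsity or insufficient variation, can be illustrated with a minimal base-R sketch. It assumes quantile bins are delimited by evenly spaced quantile breakpoints of one cell type's specificity values; count_quantile_bins is a hypothetical helper for illustration, not MAGMA.Celltyping's internal code. When most genes in a column have a specificity of exactly zero, several breakpoints coincide and fewer than numberOfBins distinct bins remain.

```r
# Hypothetical helper (illustration only, not MAGMA.Celltyping's code):
# counts how many distinct quantile bins a column of specificity values
# can be split into.
count_quantile_bins <- function(x, numberOfBins = 4) {
  # Breakpoints at evenly spaced probabilities (0, 0.25, ..., 1 for 4 bins)
  breaks <- quantile(x, probs = seq(0, 1, length.out = numberOfBins + 1),
                     names = FALSE)
  # Tied breakpoints collapse, so only unique ones can delimit bins
  length(unique(breaks)) - 1
}

# A dense column: 100 distinct values, so all 4 bins are available
dense <- seq(0.01, 1, by = 0.01)
count_quantile_bins(dense)   # 4

# A sparse column: 90 of 100 genes have zero specificity, so the
# 0%, 25%, 50% and 75% breakpoints are all 0 and only 1 bin remains
sparse <- c(rep(0, 90), seq(0.1, 1, by = 0.1))
count_quantile_bins(sparse)  # 1
```

Under this assumption, any cell type in which most genes score zero will trigger the warning, and raising numberOfBins (as in the second set of checks, with 10 bins) can only make the shortfall more frequent.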

2. Reproducible example

Code

ctd <- get_ctd("ctd_Zeisel2018")
ctd_quant <- MAGMA.Celltyping::prepare_quantile_groups(ctd = ctd,
                                                  standardise = TRUE,
                                                  non121_strategy = "dbs",
                                                  input_species = "mouse",
                                                  output_species = "human",
                                                  numberOfBins = 4)

3. Session info

> sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] data.table_1.14.2      MAGMA.Celltyping_2.0.7 forcats_0.5.2          stringr_1.4.1          dplyr_1.0.10           purrr_0.3.4           
 [7] readr_2.1.2            tidyr_1.2.0            tibble_3.1.8           ggplot2_3.3.6          tidyverse_1.3.2        here_1.0.1            

loaded via a namespace (and not attached):
  [1] utf8_1.2.2                    R.utils_2.12.0                tidyselect_1.1.2              lme4_1.1-30                   RSQLite_2.2.16               
  [6] AnnotationDbi_1.59.1          htmlwidgets_1.5.4             grid_4.2.0                    BiocParallel_1.31.12          munsell_0.5.0                
 [11] codetools_0.2-18              withr_2.5.0                   colorspace_2.0-3              Biobase_2.57.1                filelock_1.0.2               
 [16] knitr_1.40                    rstudioapi_0.14               orthogene_1.3.2               stats4_4.2.0                  SingleCellExperiment_1.19.0  
 [21] ggsignif_0.6.3                gitcreds_0.1.1                labeling_0.4.2                MatrixGenerics_1.9.1          GenomeInfoDbData_1.2.8       
 [26] farver_2.1.1                  bit64_4.0.5                   rprojroot_2.0.3               vctrs_0.4.1                   treeio_1.21.2                
 [31] generics_0.1.3                xfun_0.32                     BiocFileCache_2.5.0           R6_2.5.1                      GenomeInfoDb_1.33.5          
 [36] bitops_1.0-7                  cachem_1.0.6                  gridGraphics_0.5-1            DelayedArray_0.23.1           assertthat_0.2.1             
 [41] promises_1.2.0.1              BiocIO_1.7.1                  scales_1.2.1                  googlesheets4_1.0.1           gtable_0.3.1                 
 [46] rlang_1.0.5                   MungeSumstats_1.5.13          splines_4.2.0                 rtracklayer_1.57.0            rstatix_0.7.0                
 [51] lazyeval_0.2.2                gargle_1.2.0                  broom_1.0.1                   BiocManager_1.30.18           yaml_2.3.5                   
 [56] reshape2_1.4.4                abind_1.4-5                   modelr_0.1.9                  GenomicFeatures_1.49.6        backports_1.4.1              
 [61] httpuv_1.6.5                  tools_4.2.0                   ggplotify_0.1.0               ellipsis_0.3.2                ggdendro_0.1.23              
 [66] BiocGenerics_0.43.1           Rcpp_1.0.9                    plyr_1.8.7                    progress_1.2.2                zlibbioc_1.43.0              
 [71] RCurl_1.98-1.8                prettyunits_1.1.1             ggpubr_0.4.0                  S4Vectors_0.35.3              SummarizedExperiment_1.27.2  
 [76] haven_2.5.1                   fs_1.5.2                      magrittr_2.0.3                gh_1.3.0                      reprex_2.0.2                 
 [81] googledrive_2.0.0             matrixStats_0.62.0            hms_1.1.2                     patchwork_1.1.2               mime_0.12                    
 [86] xtable_1.8-4                  XML_3.99-0.10                 EWCE_1.5.7                    readxl_1.4.1                  IRanges_2.31.2               
 [91] gridExtra_2.3                 compiler_4.2.0                biomaRt_2.53.2                crayon_1.5.1                  minqa_1.2.4                  
 [96] R.oo_1.25.0                   htmltools_0.5.3               ggfun_0.0.7                   later_1.3.0                   tzdb_0.3.0                   
[101] aplot_0.1.6                   lubridate_1.8.0               DBI_1.1.3                     ExperimentHub_2.5.0           gprofiler2_0.2.1             
[106] dbplyr_2.2.1                  MASS_7.3-58.1                 rappdirs_0.3.3                boot_1.3-28                   babelgene_22.3               
[111] Matrix_1.4-1                  car_3.1-0                     piggyback_0.1.3               cli_3.3.0                     R.methodsS3_1.8.2            
[116] parallel_4.2.0                GenomicRanges_1.49.1          pkgconfig_2.0.3               GenomicAlignments_1.33.1      plotly_4.10.0                
[121] xml2_1.3.3                    ggtree_3.5.3                  XVector_0.37.1                rvest_1.0.3                   yulab.utils_0.0.5            
[126] VariantAnnotation_1.43.3      digest_0.6.29                 Biostrings_2.65.3             cellranger_1.1.0              HGNChelper_0.8.1             
[131] tidytree_0.4.0                restfulr_0.0.15               curl_4.3.2                    shiny_1.7.2                   Rsamtools_2.13.4             
[136] rjson_0.2.21                  nloptr_2.0.3                  lifecycle_1.0.1               nlme_3.1-159                  jsonlite_1.8.0               
[141] carData_3.0-5                 viridisLite_0.4.1             limma_3.53.6                  BSgenome_1.65.2               fansi_1.0.3                  
[146] pillar_1.8.1                  lattice_0.20-45               homologene_1.4.68.19.3.27     KEGGREST_1.37.3               fastmap_1.1.0                
[151] httr_1.4.4                    googleAuthR_2.0.0             interactiveDisplayBase_1.35.0 glue_1.6.2                    RNOmni_1.0.1                 
[156] png_0.1-7                     ewceData_1.5.0                BiocVersion_3.16.0            bit_4.0.4                     stringi_1.7.8                
[161] blob_1.2.3                    AnnotationHub_3.5.0           memoise_2.0.1                 ape_5.6-2  

This is actually the same issue described in #130.

The CTD has already been converted to human orthologs, so set input_species = "human", or leave it at the default and the correct species will be inferred automatically.
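Applied to the reproducible example, the suggested fix might look like the following sketch. It changes only input_species to "human" and keeps the reporter's other arguments unchanged. It has not been re-run here, so the resulting warning counts are not shown.

```r
library(MAGMA.Celltyping)

# Zeisel 2018 CTD; per the reply above, its genes are already human orthologs
ctd <- get_ctd("ctd_Zeisel2018")

ctd_quant <- prepare_quantile_groups(ctd = ctd,
                                     standardise = TRUE,
                                     non121_strategy = "dbs",
                                     input_species = "human",  # was "mouse"
                                     output_species = "human",
                                     numberOfBins = 4)
```

Alternatively, omit input_species altogether so the species is inferred, as the reply suggests.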