neurogenomics/MAGMA_Celltyping

Error running after using new version of .rds files

Captain-Pam opened this issue · 4 comments

1. Bug description

I noticed the upgrade of the R package and downloaded the rds file, which has a change in file type (dgCMatrix) compared to the rda file.

(A clear and concise description of what the bug is.)
Running celltype_associations_pipeline gives the error "Error in apply(ctd_1lvl[[.matrix_name]], 2, function(x) { :
dim(X) must have a positive length". The problem may lie in the check_quantiles.R function

Console output

# Paste console output here (e.g. from R/python/command line)
Standardising CellTypeDataset.
Converting to sparse matrix.
Checking CTD: level 1
Error in apply(ctd_1lvl[[.matrix_name]], 2, function(x) { : 
  dim(X) must have a positive length

Expected behaviour

(A clear and concise description of what you expected to happen.)

2. Reproducible example

Code

(Please add the steps to reproduce the bug here. See here for an intro to making a reproducible example (i.e. reprex) and why they're important! This will help us to help you much faster.)

# Paste example here
storage_dir <- ./MAGMA.Celltyping"
#example
#ref genome
genome_ref_path <- "./EUR/g1000_eur"
#MAGMA files
magma_dirs <- "./sampledata/MAGMA_Files/ieu-a-298.tsv.gz.35UP.10DOWN"
#cell type data path
ctd_path <- "./Storage_data"
#Specific cell type, if not in dir, it will download from website
ctd <- MAGMA.Celltyping::get_ctd("ctd_AIBS",storage_dir =ctd_path)
#cell type associations
MAGMA_results <- MAGMA.Celltyping::celltype_associations_pipeline(
  magma_dirs = magma_dirs,
  ctd = ctd,
  ctd_species = "mouse", 
  ctd_name = "ctd_DRONC_mouse_test", 
  run_linear = TRUE, 
  run_top10 = TRUE,
  run_conditional=TRUE,
  save_dir =storage_dir)

Data

(If possible, upload a small sample of your data so that we can reproduce the bug on our end. If that's not possible, please at least include a screenshot of your data and other relevant details.)

3. Session info

(Add output of the R function utils::sessionInfo() below. This helps us assess version/OS conflicts which could be causing bugs.)

# Paste utils::sessionInfo() output 
R version 4.1.3 (2022-03-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /app/python/python3.9/envs/R4.1.3/lib/libopenblasp-r0.3.20.so

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_1.0.9            ggplot2_3.3.6          MAGMA.Celltyping_2.0.4

loaded via a namespace (and not attached):
  [1] backports_1.4.1               AnnotationHub_3.2.2          
  [3] BiocFileCache_2.2.1           plyr_1.8.7                   
  [5] googleAuthR_2.0.0             lazyeval_0.2.2               
  [7] splines_4.1.3                 orthogene_1.0.2              
  [9] ewceData_1.2.0                BiocParallel_1.28.3          
 [11] GenomeInfoDb_1.30.1           digest_0.6.29                
 [13] htmltools_0.5.2               RNOmni_1.0.0                 
 [15] fansi_1.0.3                   magrittr_2.0.3               
 [17] memoise_2.0.1                 BSgenome_1.62.0              
 [19] limma_3.50.3                  Biostrings_2.62.0            
 [21] matrixStats_0.62.0            R.utils_2.11.0               
 [23] prettyunits_1.1.1             colorspace_2.0-3             
 [25] blob_1.2.3                    rappdirs_0.3.3               
 [27] gitcreds_0.1.1                crayon_1.5.1                 
 [29] RCurl_1.98-1.6                jsonlite_1.8.0               
 [31] lme4_1.1-29                   VariantAnnotation_1.40.0     
 [33] glue_1.6.2                    gtable_0.3.0                 
 [35] gargle_1.2.0                  zlibbioc_1.40.0              
 [37] XVector_0.34.0                HGNChelper_0.8.1             
 [39] DelayedArray_0.20.0           car_3.0-13                   
 [41] SingleCellExperiment_1.16.0   BiocGenerics_0.40.0          
 [43] abind_1.4-5                   scales_1.2.0                 
 [45] DBI_1.1.2                     rstatix_0.7.0                
 [47] Rcpp_1.0.8.3                  viridisLite_0.4.0            
 [49] xtable_1.8-4                  progress_1.2.2               
 [51] bit_4.0.4                     stats4_4.1.3                 
 [53] htmlwidgets_1.5.4             httr_1.4.3                   
 [55] ellipsis_0.3.2                pkgconfig_2.0.3              
 [57] XML_3.99-0.9                  R.methodsS3_1.8.1            
 [59] dbplyr_2.1.1                  utf8_1.2.2                   
 [61] tidyselect_1.1.2              rlang_1.0.2                  
 [63] reshape2_1.4.4                later_1.3.0                  
 [65] AnnotationDbi_1.56.2          munsell_0.5.0                
 [67] BiocVersion_3.14.0            tools_4.1.3                  
 [69] cachem_1.0.6                  cli_3.3.0                    
 [71] generics_0.1.2                RSQLite_2.2.14               
 [73] ExperimentHub_2.2.1           MungeSumstats_1.4.0          
 [75] broom_0.8.0                   stringr_1.4.0                
 [77] fastmap_1.1.0                 ggdendro_0.1.23              
 [79] yaml_2.3.5                    babelgene_22.3               
 [81] bit64_4.0.5                   fs_1.5.2                     
 [83] purrr_0.3.4                   gh_1.3.0                     
 [85] KEGGREST_1.34.0               gprofiler2_0.2.1             
 [87] nlme_3.1-157                  mime_0.12                    
 [89] R.oo_1.24.0                   xml2_1.3.3                   
 [91] biomaRt_2.50.3                compiler_4.1.3               
 [93] plotly_4.10.0                 filelock_1.0.2               
 [95] curl_4.3.2                    png_0.1-7                    
 [97] interactiveDisplayBase_1.32.0 ggsignif_0.6.3               
 [99] tibble_3.1.7                  EWCE_1.5.1                   
[101] homologene_1.4.68.19.3.27     stringi_1.7.6                
[103] GenomicFeatures_1.46.5        lattice_0.20-45              
[105] Matrix_1.4-1                  nloptr_2.0.1                 
[107] vctrs_0.4.1                   pillar_1.7.0                 
[109] lifecycle_1.0.1               BiocManager_1.30.17          
[111] data.table_1.14.2             bitops_1.0-7                 
[113] httpuv_1.6.5                  patchwork_1.1.1              
[115] rtracklayer_1.54.0            GenomicRanges_1.46.1         
[117] R6_2.5.1                      BiocIO_1.4.0                 
[119] promises_1.2.0.1              gridExtra_2.3                
[121] IRanges_2.28.0                boot_1.3-28                  
[123] MASS_7.3-57                   assertthat_0.2.1             
[125] SummarizedExperiment_1.24.0   rjson_0.2.21                 
[127] withr_2.5.0                   GenomicAlignments_1.30.0     
[129] Rsamtools_2.10.0              S4Vectors_0.32.4             
[131] GenomeInfoDbData_1.2.7        parallel_4.1.3               
[133] hms_1.1.1                     grid_4.1.3                   
[135] tidyr_1.2.0                   minqa_1.2.4                  
[137] MatrixGenerics_1.6.0          carData_3.0-5                
[139] ggpubr_0.4.0                  piggyback_0.1.3              
[141] lubridate_1.8.0               Biobase_2.54.0               
[143] shiny_1.7.1                   restfulr_0.0.13 

Hi @Captain-Pam, thanks for flagging this. Working on a fix now.

Ok, so it seems there's a couple of things happening:

1. CTDs missing "specificity_quantiles"

I believe this is a bug I recently introduced into EWCE::standardise_ctd. I'll fix this and make a push.

In the meantime, I'll add an extra safety feature to MAGMA.Celltyping that ensures the "specificity_quantiles" matrices are computed when they're missing the from the CTD.

Could you try updating to MAGMA.Celltyping v2.0.6 and try again?

2. Incongruent ctd_species

It looks like you're specifying ctd_species = "mouse" when in fact the genes are already in human format (for "ctd_AIBS"). Since you have standardise=TRUE (the default), the CTD gets passed to EWCE::standardise_ctd and tries to convert "mouse" genes to human genes. Thus, there's no genes left in the matrix and you get an error.

Specifying the correct species seems to work:

ctd <- MAGMA.Celltyping::get_ctd("ctd_AIBS")
ctd <- EWCE::standardise_ctd(ctd, input_species = "human", force_standardise = TRUE)

And using the full example:

ctd <- MAGMA.Celltyping::get_ctd("ctd_AIBS")
magma_dirs <- MAGMA.Celltyping::import_magma_files()

MAGMA_results <- MAGMA.Celltyping::celltype_associations_pipeline(
  magma_dirs = magma_dirs,
  ctd = ctd,
  ctd_species = "human", 
  ctd_name = "ctd_AIBS_test", 
  run_linear = TRUE, 
  run_top10 = TRUE,
  run_conditional=TRUE)

In the latest CTD files supplied in get_ctd, I've already converted them all into 1:1 human orthologs so the user doesn't have to do that extra step (though note that ewceData::ctd() still provides the CTD as mouse genes).

That said, if you're ever unsure which species a CTD is in, you can use this nifty function i added to orthogene, which guesses the correct species pretty consistently.

res <- orthogene::infer_species(gene_df = ctd[[1]]$mean_exp)

See #117 for info on how I implemented this within MAGMA.Celltyping.

1)I found that some datasets (ctd) may not download and where can I download the data that has been collated?

Which ctd were you looking for specifically? It looks like I did indeed miss uploading several, so I just added these now and they should be available via get_ctd. Only one I'm still working on is "ctd_TabulaMurisSenis", which I've added a note to in the docs and will try to upload soon.

  1. Is MAGMA.Celltyping the same analysis method as MAGMA Gene Property Analysis on FUMA platform?

Actually no, they are totally independent pipelines that just happen to use a lot of the same underlying software (eg MAGMA).

  1. Why transfer cell type data from mice to humans? Is it because mice and humans have different genetic nomenclature?

Currently all comprehensive single-cell atlases for human are single-nucleus (not single-cell), meaning a lot of the RNA transcripts are lost, making the data lower quality and more difficult to distinguish cell subtypes from. Mouse atlases like Zeisel2018 or TabulaMuris, by contrast, are single-cell and therefore higher quality (despite being different species). There's definitely tradeoffs, but you can read more in the original paper:
https://www.nature.com/articles/s41588-018-0129-5

1:1 gene orthologs are converted between species using the package orthogene (also developed by our lab), which leaves ~16k comparable genes between mouse and human.