neurogenomics/MAGMA_Celltyping

Problem with zeisel 2018 ctd

KittyMurphy opened this issue · 1 comments

celltype_associations_pipeline() runs into errors when using the Zeisel 2018 ctd loaded using get_ctd(). I have run this with the Zeisel 2015 ctd with no issues.

ctd <- get_ctd("ctd_Zeisel2018")

MAGMA_results <- MAGMA.Celltyping::celltype_associations_pipeline(
    magma_dirs = magma_dirs,
    ctd = ctd,
    ctd_species = "mouse",
    ctd_name = "Zeisel2018",
    run_linear = TRUE,
    run_top10 = TRUE
)

Standardising CellTypeDataset.
Converting to sparse matrix.
Converting to sparse matrix.
Converting to sparse matrix.
Converting to sparse matrix.
Converting to sparse matrix.
Converting to sparse matrix.
Converting to sparse matrix.
Converting to sparse matrix.

  • <2 non-zero quantile bins detected in column. Assigning these values to default quantile ( 5 )
Converting to sparse matrix.
  • <2 non-zero quantile bins detected in column. Assigning these values to default quantile ( 5 )
  • <2 non-zero quantile bins detected in column. Assigning these values to default quantile ( 5 )
  • <2 non-zero quantile bins detected in column. Assigning these values to default quantile ( 5 )
  • <2 non-zero quantile bins detected in column. Assigning these values to default quantile ( 5 )
  • <2 non-zero quantile bins detected in column. Assigning these values to default quantile ( 5 )
  • <2 non-zero quantile bins detected in column. Assigning these values to default quantile ( 5 )
  • <2 non-zero quantile bins detected in column. Assigning these values to default quantile ( 5 )
  • <2 non-zero quantile bins detected in column. Assigning these values to default quantile ( 5 )
  • <2 non-zero quantile bins detected in column. Assigning these values to default quantile ( 5 )
  • <2 non-zero quantile bins detected in column. Assigning these values to default quantile ( 5 )
  • <2 non-zero quantile bins detected in column. Assigning these values to default quantile ( 5 )
  • <2 non-zero quantile bins detected in column. Assigning these values to default quantile ( 5 )
  • <2 non-zero quantile bins detected in column. Assigning these values to default quantile ( 5 )
  • <2 non-zero quantile bins detected in column. Assigning these values to default quantile ( 5 )
  • <2 non-zero quantile bins detected in column. Assigning these values to default quantile ( 5 )
  • <2 non-zero quantile bins detected in column. Assigning these values to default quantile ( 5 )
  • <2 non-zero quantile bins detected in column. Assigning these values to default quantile ( 5 )
  • <2 non-zero quantile bins detected in column. Assigning these values to default quantile ( 5 )
  • <2 non-zero quantile bins detected in column. Assigning these values to default quantile ( 5 )
  • <2 non-zero quantile bins detected in column. Assigning these values to default quantile ( 5 )
  • <2 non-zero quantile bins detected in column. Assigning these values to default quantile ( 5 )
  • <2 non-zero quantile bins detected in column. Assigning these values to default quantile ( 5 )
  • <2 non-zero quantile bins detected in column. Assigning these values to default quantile ( 5 )
  • <2 non-zero quantile bins detected in column. Assigning these values to default quantile ( 5 )
  • <2 non-zero quantile bins detected in column. Assigning these values to default quantile ( 5 )
  • <2 non-zero quantile bins detected in column. Assigning these values to default quantile ( 5 )
  • <2 non-zero quantile bins detected in column. Assigning these values to default quantile ( 5 )
  • <2 non-zero quantile bins detected in column. Assigning these values to default quantile ( 5 )
  • <2 non-zero quantile bins detected in column. Assigning these values to default quantile ( 5 )
  • <2 non-zero quantile bins detected in column. Assigning these values to default quantile ( 5 )
  • <2 non-zero quantile bins detected in column. Assigning these values to default quantile ( 5 )
  • <2 non-zero quantile bins detected in column. Assigning these values to default quantile ( 5 )
  • <2 non-zero quantile bins detected in column. Assigning these values to default quantile ( 5 )
  • <2 non-zero quantile bins detected in column. Assigning these values to default quantile ( 5 )
  • <2 non-zero quantile bins detected in column. Assigning these values to default quantile ( 5 )
  • <2 non-zero quantile bins detected in column. Assigning these values to default quantile ( 5 )
  • <2 non-zero quantile bins detected in column. Assigning these values to default quantile ( 5 )
  • <2 non-zero quantile bins detected in column. Assigning these values to default quantile ( 5 )
  • <2 non-zero quantile bins detected in column. Assigning these values to default quantile ( 5 )
  • <2 non-zero quantile bins detected in column. Assigning these values to default quantile ( 5 )
Converting to sparse matrix.
Checking CTD: level 1
WARNING: 4 columns (cell-types) have less than the expected number of quantile bins (40).
This may be due to an excessive sparsity or insufficient variation in your CellTypeDataset.

Checking CTD: level 2
WARNING: 6 columns (cell-types) have less than the expected number of quantile bins (40).
This may be due to an excessive sparsity or insufficient variation in your CellTypeDataset.

Checking CTD: level 3
WARNING: 16 columns (cell-types) have less than the expected number of quantile bins (40).
This may be due to an excessive sparsity or insufficient variation in your CellTypeDataset.

Checking CTD: level 4
WARNING: 37 columns (cell-types) have less than the expected number of quantile bins (40).
This may be due to an excessive sparsity or insufficient variation in your CellTypeDataset.

Checking CTD: level 5
WARNING: 231 columns (cell-types) have less than the expected number of quantile bins (40).
This may be due to an excessive sparsity or insufficient variation in your CellTypeDataset.

Checking CTD: level 1
WARNING: 1 columns (cell-types) have less than the expected number of quantile bins (10).
This may be due to an excessive sparsity or insufficient variation in your CellTypeDataset.

Checking CTD: level 2
WARNING: 1 columns (cell-types) have less than the expected number of quantile bins (10).
This may be due to an excessive sparsity or insufficient variation in your CellTypeDataset.

Checking CTD: level 3
WARNING: 6 columns (cell-types) have less than the expected number of quantile bins (10).
This may be due to an excessive sparsity or insufficient variation in your CellTypeDataset.

Checking CTD: level 4
WARNING: 30 columns (cell-types) have less than the expected number of quantile bins (10).
This may be due to an excessive sparsity or insufficient variation in your CellTypeDataset.

Checking CTD: level 5
WARNING: 228 columns (cell-types) have less than the expected number of quantile bins (10).
This may be due to an excessive sparsity or insufficient variation in your CellTypeDataset.

PD_23andME_dbSNP155_no_non_bi.tsv.gz.35UP.10DOWN
======= Calculating celltype associations: linear mode =======
Installed MAGMA version: v1.10
Skipping MAGMA installation.
The desired_version of MAGMA is currently installed: v1.10
Using: magma_v1.10_mac
Importing precomputed MAGMA results.
4 celltypes in ctd but 7 in results file.
Some cell-types may have been dropped due to
'variance is too low (set contains only one
gene used in analysis)'.
<50% of celltypes missing. Attemping to fix by removing missing cell-types:

  • Glia
  • Immune_cells
  • Neurons
  • Vascular_cells
======= Calculating celltype associations: top10% mode =======
Installed MAGMA version: v1.10
Skipping MAGMA installation.
The desired_version of MAGMA is currently installed: v1.10
Using: magma_v1.10_mac
Importing precomputed MAGMA results.
4 celltypes in ctd but 7 in results file.
Some cell-types may have been dropped due to
'variance is too low (set contains only one
gene used in analysis)'.
<50% of celltypes missing. Attemping to fix by removing missing cell-types:
  • Glia
  • Immune_cells
  • Neurons
  • Vascular_cells
Saving results ==> /var/folders/bg/hds6dv8d4390lj02yr82cf_w0000gn/T//RtmpC2zZ7b/Zeisel2018/MAGMA_celltyping.Zeisel2018.rds

Looking back at this, I think this was because you were specifying ctd_species = "mouse", but all of the CTDs provided via get_ctd() are already converted to human orthologs. So it makes sense that you wouldn't have enough genes (and thus quantiles) if you tried converting orthologs twice.

The default behaviour now is to infer the CTD species automatically, so you don't need to specify this.

This works:

magma_dirs <- MAGMA.Celltyping::import_magma_files(ids = c("ieu-a-298"))
ctd <- get_ctd("ctd_Zeisel2018")

level_1res <- MAGMA.Celltyping::celltype_associations_pipeline(
    ctd = ctd,
    ctd_levels = 1,
    ctd_name = "Zeisel2018", 
    magma_dirs = magma_dirs
)
Standardising CellTypeDataset.
Converting to sparse matrix.
Converting to sparse matrix.
Converting to sparse matrix.
Converting to sparse matrix.
Converting to sparse matrix.
Checking CTD: level 1
Checking CTD: level 2
Checking CTD: level 3
Checking CTD: level 4
Checking CTD: level 5
Checking CTD: level 1
Checking CTD: level 2
Checking CTD: level 3
Checking CTD: level 4
Checking CTD: level 5
ieu-a-298.tsv.gz.35UP.10DOWN
======= Calculating celltype associations: linear mode =======
Installed MAGMA version: v1.10
Skipping MAGMA installation.
The desired_version of MAGMA is currently installed: v1.10
Using: magma_v1.10_mac
Running MAGMA: Linear mode
Mapping gene symbols in specificity_quantiles matrix to entrez IDs.
Welcome to MAGMA v1.10 (custom)
Using flags:
	--gene-results /var/folders/zq/h7mtybc533b1qzkys_ttgpth0000gn/T//RtmpNdZvD2/MAGMA_Files/ieu-a-298.tsv.gz.35UP.10DOWN/ieu-a-298.tsv.gz.35UP.10DOWN.genes.raw
	--gene-covar /var/folders/zq/h7mtybc533b1qzkys_ttgpth0000gn/T//RtmpNdZvD2/file1353877f72f2e
	--model direction=pos
	--out /var/folders/zq/h7mtybc533b1qzkys_ttgpth0000gn/T//RtmpNdZvD2/MAGMA_Files/ieu-a-298.tsv.gz.35UP.10DOWN/ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2018_linear.Linear

Start time is 20:57:55, Wednesday 12 Apr 2023

Reading file /var/folders/zq/h7mtybc533b1qzkys_ttgpth0000gn/T//RtmpNdZvD2/MAGMA_Files/ieu-a-298.tsv.gz.35UP.10DOWN/ieu-a-298.tsv.gz.35UP.10DOWN.genes.raw... 
	1355 genes read from file
Loading gene-level covariates...
Reading file /var/folders/zq/h7mtybc533b1qzkys_ttgpth0000gn/T//RtmpNdZvD2/file1353877f72f2e... 
	detected 4 variables in file (using all)
	found 4 valid gene covariates, for 1126 genes defined in genotype data
Processing missing values...
	found 229 genes not present in all input files: removing these from analysis
	1126 genes remaining in analysis
Preparing variables for analysis...
	truncating Z-scores 3 points below zero or 6 standard deviations above the mean
	truncating covariate values more than 5 standard deviations from the mean
	total variables available for analysis: 4 gene covariates

Parsing model specifications...
Inverting gene-gene correlation matrix...
Performing regression analysis...                                                                                  
	testing direction: one-sided, positive (sets), one-sided, positive (covar)
	conditioning on internal variables:
		gene size, log(gene size)
		gene density, log(gene density)
		inverse mac, log(inverse mac)
	analysing individual variables

	analysing single-variable models (number of models: 4)
	writing results to file /var/folders/zq/h7mtybc533b1qzkys_ttgpth0000gn/T//RtmpNdZvD2/MAGMA_Files/ieu-a-298.tsv.gz.35UP.10DOWN/ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2018_linear.Linear.gsa.out

End time is 20:57:55, Wednesday 12 Apr 2023 (elapsed: 00:00:00)
Reading enrichment results file into R.
======= Calculating celltype associations: top10% mode =======
Installed MAGMA version: v1.10
Skipping MAGMA installation.
The desired_version of MAGMA is currently installed: v1.10
Using: magma_v1.10_mac
Running MAGMA: Top 10% mode
Mapping gene symbols in specificity_deciles matrix to entrez IDs.
Constructing top10% gene marker sets for 5 cell-types.
Welcome to MAGMA v1.10 (custom)
Using flags:
	--gene-results /var/folders/zq/h7mtybc533b1qzkys_ttgpth0000gn/T//RtmpNdZvD2/MAGMA_Files/ieu-a-298.tsv.gz.35UP.10DOWN/ieu-a-298.tsv.gz.35UP.10DOWN.genes.raw
	--set-annot /var/folders/zq/h7mtybc533b1qzkys_ttgpth0000gn/T//RtmpNdZvD2/file1353853ab28a
	--out /var/folders/zq/h7mtybc533b1qzkys_ttgpth0000gn/T//RtmpNdZvD2/MAGMA_Files/ieu-a-298.tsv.gz.35UP.10DOWN/ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2018_top10.Top10pct

Start time is 20:57:55, Wednesday 12 Apr 2023

Reading file /var/folders/zq/h7mtybc533b1qzkys_ttgpth0000gn/T//RtmpNdZvD2/MAGMA_Files/ieu-a-298.tsv.gz.35UP.10DOWN/ieu-a-298.tsv.gz.35UP.10DOWN.genes.raw... 
	1355 genes read from file
Loading gene-set annotation...
Reading file /var/folders/zq/h7mtybc533b1qzkys_ttgpth0000gn/T//RtmpNdZvD2/file1353853ab28a... 
	4 gene-set definitions read from file
	found 4 gene sets containing genes defined in genotype data (containing a total of 444 unique genes)
Preparing variables for analysis...
	truncating Z-scores 3 points below zero or 6 standard deviations above the mean
	truncating covariate values more than 5 standard deviations from the mean
	total variables available for analysis: 4 gene sets

Parsing model specifications...
Inverting gene-gene correlation matrix...
Performing regression analysis...                                                                                  
	testing direction: one-sided, positive (sets), two-sided (covar)
	conditioning on internal variables:
		gene size, log(gene size)
		gene density, log(gene density)
		inverse mac, log(inverse mac)
	analysing individual variables

	analysing single-variable models (number of models: 4)
	writing results to file /var/folders/zq/h7mtybc533b1qzkys_ttgpth0000gn/T//RtmpNdZvD2/MAGMA_Files/ieu-a-298.tsv.gz.35UP.10DOWN/ieu-a-298.tsv.gz.35UP.10DOWN.level1.Zeisel2018_top10.Top10pct.gsa.out

End time is 20:57:55, Wednesday 12 Apr 2023 (elapsed: 00:00:00)
Reading enrichment results file into R.
Merging linear and top10% results
Saving results ==> /var/folders/zq/h7mtybc533b1qzkys_ttgpth0000gn/T//RtmpNdZvD2/Zeisel2018/MAGMA_celltyping.Zeisel2018.rds
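
For anyone who wants to sanity-check a CTD before running the pipeline, here is a rough sketch of two checks related to the explanation above: whether the gene symbols look human or mouse, and how many distinct non-zero quantile bins each cell type has. The helpers `frac_upper()` and `n_bins()` and the toy data are hypothetical and not part of MAGMA.Celltyping. The sketch assumes each CTD level stores a genes-by-cell-types matrix called `specificity_quantiles`, as the log messages suggest. The upper-case test is only a heuristic, since some human symbols contain lower-case letters.

```r
# Toy stand-in for one CTD level: genes in rows, cell types in columns.
# (Hypothetical data; a real CTD is assumed to be a list of such levels.)
make_level <- function(genes) {
  m <- matrix(
    c(1, 2, 3, 4,
      4, 3, 2, 1),
    nrow = length(genes), ncol = 2,
    dimnames = list(genes, c("Neurons", "Glia"))
  )
  list(specificity_quantiles = m)
}

# Hypothetical helper: fraction of gene symbols that are entirely upper case.
# Human symbols are mostly upper case (e.g. "SNCA"), mouse symbols are mostly
# capitalised (e.g. "Snca"), so a value near 1 hints at human symbols.
frac_upper <- function(level) {
  genes <- rownames(level$specificity_quantiles)
  mean(genes == toupper(genes))
}

# Hypothetical helper: number of distinct non-zero quantile bins per cell type.
n_bins <- function(level) {
  apply(level$specificity_quantiles, 2, function(x) length(unique(x[x > 0])))
}

human_like <- make_level(c("SNCA", "GFAP", "AIF1", "CLDN5"))
mouse_like <- make_level(c("Snca", "Gfap", "Aif1", "Cldn5"))

frac_upper(human_like)  # 1
frac_upper(mouse_like)  # 0
n_bins(human_like)      # 4 bins in each of the two columns
```

On a real CTD this would be called as something like `frac_upper(ctd[[1]])`, assuming the first level has that layout. If the fraction is close to 1, the symbols are probably already human and setting ctd_species = "mouse" would likely trigger a second ortholog conversion.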

Session info

R version 4.2.1 (2022-06-23)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Ventura 13.2.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] MAGMA.Celltyping_2.0.9

loaded via a namespace (and not attached):
[1] utf8_1.2.3 R.utils_2.12.2 tidyselect_1.2.0
[4] lme4_1.1-32 RSQLite_2.3.1 AnnotationDbi_1.60.2
[7] htmlwidgets_1.6.2 grid_4.2.1 BiocParallel_1.32.6
[10] devtools_2.4.5 munsell_0.5.0 codetools_0.2-19
[13] DT_0.27 miniUI_0.1.1.1 withr_2.5.0
[16] colorspace_2.1-0 Biobase_2.58.0 filelock_1.0.2
[19] knitr_1.42 rstudioapi_0.14 orthogene_1.5.3
[22] stats4_4.2.1 SingleCellExperiment_1.20.1 ggsignif_0.6.4
[25] gitcreds_0.1.2 MatrixGenerics_1.10.0 httr2_0.2.2
[28] GenomeInfoDbData_1.2.9 bit64_4.0.5 rprojroot_2.0.3
[31] vctrs_0.6.1 treeio_1.23.1 generics_0.1.3
[34] xfun_0.38 timechange_0.2.0 BiocFileCache_2.6.1
[37] R6_2.5.1 GenomeInfoDb_1.34.9 bitops_1.0-7
[40] cachem_1.0.7 gridGraphics_0.5-1 DelayedArray_0.24.0
[43] assertthat_0.2.1 promises_1.2.0.1 BiocIO_1.8.0
[46] scales_1.2.1 gtable_0.3.3 processx_3.8.0
[49] rlang_1.1.0 MungeSumstats_1.6.0 splines_4.2.1
[52] rtracklayer_1.58.0 rstatix_0.7.2 lazyeval_0.2.2
[55] gargle_1.3.0 broom_1.0.4 BiocManager_1.30.20
[58] yaml_2.3.7 reshape2_1.4.4 abind_1.4-5
[61] GenomicFeatures_1.50.4 backports_1.4.1 rsconnect_0.8.29
[64] httpuv_1.6.9 tools_4.2.1 usethis_2.1.6
[67] ggplotify_0.1.0 ggplot2_3.4.2 ellipsis_0.3.2
[70] ggdendro_0.1.23 BiocGenerics_0.44.0 sessioninfo_1.2.2
[73] Rcpp_1.0.10 plyr_1.8.8 progress_1.2.2
[76] zlibbioc_1.44.0 purrr_1.0.1 RCurl_1.98-1.12
[79] ps_1.7.4 prettyunits_1.1.1 ggpubr_0.6.0
[82] urlchecker_1.0.1 S4Vectors_0.36.2 SummarizedExperiment_1.28.0
[85] grr_0.9.5 fs_1.6.1 magrittr_2.0.3
[88] data.table_1.14.8 gh_1.4.0 matrixStats_0.63.0
[91] pkgload_1.3.2 evaluate_0.20 hms_1.1.3
[94] patchwork_1.1.2 mime_0.12 xtable_1.8-4
[97] XML_3.99-0.14 EWCE_1.7.4 IRanges_2.32.0
[100] testthat_3.1.7 compiler_4.2.1 biomaRt_2.54.1
[103] tibble_3.2.1 crayon_1.5.2 minqa_1.2.5
[106] R.oo_1.25.0 htmltools_0.5.5 ggfun_0.0.9
[109] later_1.3.0 tidyr_1.3.0 aplot_0.1.10
[112] lubridate_1.9.2 DBI_1.1.3 ExperimentHub_2.6.0
[115] gprofiler2_0.2.1 dbplyr_2.3.2 MASS_7.3-58.3
[118] rappdirs_0.3.3 boot_1.3-28.1 babelgene_22.9
[121] Matrix_1.5-4 car_3.1-2 brio_1.1.3
[124] piggyback_0.1.4 cli_3.6.1 R.methodsS3_1.8.2
[127] parallel_4.2.1 GenomicRanges_1.50.2 pkgconfig_2.0.3
[130] GenomicAlignments_1.34.1 plotly_4.10.1 xml2_1.3.3
[133] roxygen2_7.2.3 ggtree_3.6.2 XVector_0.38.0
[136] yulab.utils_0.0.6 stringr_1.5.0 VariantAnnotation_1.44.1
[139] callr_3.7.3 digest_0.6.31 Biostrings_2.66.0
[142] rmarkdown_2.21 HGNChelper_0.8.1 tidytree_0.4.2
[145] restfulr_0.0.15 curl_5.0.0 shiny_1.7.4
[148] Rsamtools_2.14.0 rjson_0.2.21 nloptr_2.0.3
[151] lifecycle_1.0.3 nlme_3.1-162 jsonlite_1.8.4
[154] carData_3.0-5 desc_1.4.2 viridisLite_0.4.1
[157] limma_3.54.2 BSgenome_1.66.3 fansi_1.0.4
[160] pillar_1.9.0 lattice_0.21-8 homologene_1.4.68.19.3.27
[163] KEGGREST_1.38.0 fastmap_1.1.1 httr_1.4.5
[166] pkgbuild_1.4.0 googleAuthR_2.0.0 interactiveDisplayBase_1.36.0
[169] glue_1.6.2 remotes_2.4.2 RNOmni_1.0.1
[172] png_0.1-8 ewceData_1.7.1 BiocVersion_3.16.0
[175] bit_4.0.5 stringi_1.7.1
