neurogenomics/MAGMA_Celltyping

`map_snps_to_genes`: Handle bgzipped files


Currently throws an error when the GWAS summary statistics are bgzip-compressed (e.g. tabix-indexed files).

# Example summary statistics file bundled with MungeSumstats
eduAttainOkbayPth <- system.file("extdata", "eduAttainOkbay.txt",
                                 package = "MungeSumstats"
)
# tabix_index = TRUE writes bgzip-compressed, tabix-indexed output (.tsv.bgz)
reformatted <- MungeSumstats::format_sumstats(
    path = eduAttainOkbayPth,
    ref_genome = "GRCh37",
    dbSNP = 144,
    bi_allelic_filter = TRUE,
    tabix_index = TRUE,
    log_folder_ind = TRUE,
    log_mungesumstats_msgs = TRUE
)
# Fails on the .bgz file produced above
magma_files <- MAGMA.Celltyping::map_snps_to_genes(
    path_formatted = reformatted$sumstats,
    genome_build = "GRCH37",
    population = "EUR",
    upstream_kb = 35,
    downstream_kb = 10,
    force_new = FALSE
)
******::NOTE::******
 - Formatted results will be saved to `tempdir()` by default.
 - This means all formatted summary stats will be deleted upon ending the R session.
 - To keep formatted summary stats, change `save_path`  ( e.g. `save_path=file.path('./formatted',basename(path))` ),   or make sure to copy files elsewhere after processing  ( e.g. `file.copy(save_path, './formatted/' )`.
 ******************** 

******::NOTE::******
 - Log results will be saved to `tempdir()` by default.
 - This means all log data from the run will be  deleted upon ending the R session.
 - To keep it, change `log_folder` to an actual directory  (e.g. log_folder='./').
 ******************** 

save_path suggests .gz output but tabix_index=TRUE Switching output to tabix-indexed format (.bgz).
Formatted summary statistics will be saved to ==>  /var/folders/zq/h7mtybc533b1qzkys_ttgpth0000gn/T//RtmpqUHJNw/file826265f6aeca.tsv.bgz
Log data to be saved to ==>  /var/folders/zq/h7mtybc533b1qzkys_ttgpth0000gn/T//RtmpqUHJNw
Saving output messages to:
/var/folders/zq/h7mtybc533b1qzkys_ttgpth0000gn/T//RtmpqUHJNw/MungeSumstats_log_msg.txt
Any runtime errors will be saved to:
/var/folders/zq/h7mtybc533b1qzkys_ttgpth0000gn/T//RtmpqUHJNw/MungeSumstats_log_output.txt
Messages will not be printed to terminal.
Returning path to saved data.
Warning messages:
1: package ‘S4Vectors’ was built under R version 4.2.2 
2: package ‘GenomeInfoDb’ was built under R version 4.2.2 



===================== 🦠🌋🦠 Welcome to MAGMA.Celltyping 🦠🌋🦠 =====================
This package uses MAGMA:
https://ctg.cncr.nl/software/magma

To cite MAGMA.Celltyping, please use:
* Skene, N.G., Bryois, J., Bakken, T.E. et al. Genetic identification of
     brain cell types underlying schizophrenia. Nat Genet 50, 825-833 (2018).
     https://doi.org/10.1038/s41588-018-0129-5
* de Leeuw CA, Mooij JM, Heskes T, Posthuma D (2015) MAGMA: Generalized
     Gene-Set Analysis of GWAS Data. PLOS Computational Biology 11(4): e1004219.
     https://doi.org/10.1371/journal.pcbi.1004219

Please report any bugs or feature requests by filling out an Issues template:
     https://github.com/neurogenomics/MAGMA_Celltyping/issues
===================== 🦠🌋🦠 =========================== 🦠🌋🦠 =====================

Installed MAGMA version: v1.10
Skipping MAGMA installation.
The desired_version of MAGMA is currently installed: v1.10
Using: magma_v1.10_mac
Using existing genome_ref found in storage_dir.
Saving decompressed copy of path_formatted ==>  /var/folders/zq/h7mtybc533b1qzkys_ttgpth0000gn/T//RtmpqUHJNw/file826265f6aeca.tsv
Error in strsplit(first_line, "\t")[[1]] : subscript out of bounds

session info

```
R version 4.2.1 (2022-06-23)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Ventura 13.2.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base

other attached packages:
[1] GenomeInfoDb_1.34.9 IRanges_2.32.0 S4Vectors_0.36.2
[4] BiocGenerics_0.44.0 MungeSumstats_1.6.0 phenomix_0.99.4

loaded via a namespace (and not attached):
[1] rappdirs_0.3.3
[2] rtracklayer_1.58.0
[3] scattermore_1.1
[4] R.methodsS3_1.8.2
[5] SeuratObject_4.1.3
[6] tidyr_1.3.0
[7] ggplot2_3.4.2
[8] clusterGeneration_1.3.7
[9] bit64_4.0.5
[10] irlba_2.3.5.1
[11] DelayedArray_0.24.0
[12] R.utils_2.12.2
[13] data.table_1.14.8
[14] KEGGREST_1.38.0
[15] RCurl_1.98-1.12
[16] doParallel_1.0.17
[17] generics_0.1.3
[18] GenomicFeatures_1.50.4
[19] RhpcBLASctl_0.23-42
[20] cowplot_1.1.1
[21] RSQLite_2.3.1
[22] RANN_2.6.1
[23] future_1.32.0
[24] bit_4.0.5
[25] spatstat.data_3.0-1
[26] webshot_0.5.4
[27] xml2_1.3.4
[28] httpuv_1.6.11
[29] SummarizedExperiment_1.28.0
[30] assertthat_0.2.1
[31] orthogene_1.5.3
[32] viridis_0.6.3
[33] gargle_1.4.0
[34] hms_1.1.3
[35] babelgene_22.9
[36] promises_1.2.0.1
[37] TSP_1.2-4
[38] fansi_1.0.4
[39] restfulr_0.0.15
[40] progress_1.2.2
[41] caTools_1.18.2
[42] dendextend_1.17.1
[43] dbplyr_2.3.2
[44] igraph_1.4.3
[45] DBI_1.1.3
[46] htmlwidgets_1.6.2
[47] sparsesvd_0.2-2
[48] spatstat.geom_3.2-1
[49] purrr_1.0.1
[50] ellipsis_0.3.2
[51] ggpubr_0.6.0
[52] dplyr_1.1.2
[53] backports_1.4.1
[54] gprofiler2_0.2.1
[55] aod_1.3.2
[56] biomaRt_2.54.1
[57] deldir_1.0-9
[58] MatrixGenerics_1.10.0
[59] SingleCellExperiment_1.20.1
[60] vctrs_0.6.2
[61] Biobase_2.58.0
[62] ROCR_1.0-11
[63] abind_1.4-5
[64] cachem_1.0.8
[65] grr_0.9.5
[66] BSgenome_1.66.3
[67] progressr_0.13.0
[68] sctransform_0.3.5
[69] treeio_1.23.1
[70] GenomicAlignments_1.34.1
[71] prettyunits_1.1.1
[72] goftest_1.2-3
[73] cluster_2.1.4
[74] ExperimentHub_2.6.0
[75] ape_5.7-1
[76] ontologyIndex_2.11
[77] lazyeval_0.2.2
[78] crayon_1.5.2
[79] spatstat.explore_3.2-1
[80] pkgconfig_2.0.3
[81] nlme_3.1-162
[82] pkgload_1.3.2
[83] seriation_1.4.2
[84] ewceData_1.7.1
[85] rlang_1.1.1
[86] globals_0.16.2
[87] lifecycle_1.0.3
[88] miniUI_0.1.1.1
[89] registry_0.5-1
[90] SNPlocs.Hsapiens.dbSNP144.GRCh37_0.99.20
[91] filelock_1.0.2
[92] BiocFileCache_2.6.1
[93] AnnotationHub_3.6.0
[94] polyclip_1.10-4
[95] matrixStats_1.0.0
[96] lmtest_0.9-40
[97] aplot_0.1.10
[98] Matrix_1.5-4.1
[99] carData_3.0-5
[100] boot_1.3-28.1
[101] zoo_1.8-12
[102] ggridges_0.5.4
[103] png_0.1-8
[104] viridisLite_0.4.2
[105] rjson_0.2.21
[106] ca_0.71.1
[107] bitops_1.0-7
[108] R.oo_1.25.0
[109] KernSmooth_2.23-21
[110] Biostrings_2.66.0
[111] blob_1.2.4
[112] stringr_1.5.0
[113] parallelly_1.36.0
[114] spatstat.random_3.1-5
[115] gridGraphics_0.5-1
[116] rstatix_0.7.2
[117] remaCor_0.0.11
[118] MAGMA.Celltyping_2.0.10
[119] ggsignif_0.6.4
[120] BSgenome.Hsapiens.1000genomes.hs37d5_0.99.1
[121] scales_1.2.1
[122] memoise_2.0.1
[123] magrittr_2.0.3
[124] plyr_1.8.8
[125] ica_1.0-3
[126] gplots_3.1.3
[127] zlibbioc_1.44.0
[128] compiler_4.2.1
[129] BiocIO_1.8.0
[130] RColorBrewer_1.1-3
[131] lme4_1.1-33
[132] fitdistrplus_1.1-11
[133] homologene_1.4.68.19.3.27
[134] Rsamtools_2.14.0
[135] cli_3.6.1
[136] XVector_0.38.0
[137] listenv_0.9.0
[138] patchwork_1.1.2
[139] pbapply_1.7-0
[140] MASS_7.3-60
[141] tidyselect_1.2.0
[142] stringi_1.7.12
[143] yaml_2.3.7
[144] ggrepel_0.9.3
[145] GeneOverlap_1.34.0
[146] grid_4.2.1
[147] VariantAnnotation_1.44.1
[148] tools_4.2.1
[149] future.apply_1.11.0
[150] parallel_4.2.1
[151] rstudioapi_0.14
[152] RNOmni_1.0.1
[153] foreach_1.5.2
[154] piggyback_0.1.4
[155] gridExtra_2.3
[156] Rtsne_0.16
[157] HGNChelper_0.8.1
[158] BiocManager_1.30.20
[159] digest_0.6.31
[160] shiny_1.7.4
[161] Rcpp_1.0.10
[162] car_3.1-2
[163] GenomicRanges_1.50.2
[164] broom_1.0.4
[165] BiocVersion_3.16.0
[166] later_1.3.1
[167] RcppAnnoy_0.0.20
[168] ggdendro_0.1.23
[169] httr_1.4.6
[170] AnnotationDbi_1.60.2
[171] Rdpack_2.4
[172] colorspace_2.1-0
[173] XML_3.99-0.14
[174] fs_1.6.2
[175] tensor_1.5
[176] reticulate_1.28
[177] splines_4.2.1
[178] yulab.utils_0.0.6
[179] uwot_0.1.14
[180] tidytree_0.4.2
[181] spatstat.utils_3.0-3
[182] gh_1.4.0
[183] sp_1.6-1
[184] ggplotify_0.1.0
[185] plotly_4.10.2
[186] xtable_1.8-4
[187] ggtree_3.6.2
[188] jsonlite_1.8.4
[189] nloptr_2.0.3
[190] heatmaply_1.4.2
[191] ggfun_0.0.9
[192] R6_2.5.1
[193] RUnit_0.4.32
[194] EWCE_1.9.0
[195] pillar_1.9.0
[196] htmltools_0.5.5
[197] mime_0.12
[198] glue_1.6.2
[199] fastmap_1.1.1
[200] minqa_1.2.5
[201] BiocParallel_1.32.6
[202] interactiveDisplayBase_1.36.0
[203] codetools_0.2-19
[204] mvtnorm_1.2-1
[205] utf8_1.2.3
[206] lattice_0.21-8
[207] spatstat.sparse_3.0-1
[208] tibble_3.2.1
[209] pbkrtest_0.5.2
[210] curl_5.0.0
[211] leiden_0.4.3
[212] gtools_3.9.4
[213] survival_3.5-5
[214] limma_3.54.2
[215] googleAuthR_2.0.1
[216] munsell_0.5.0
[217] GenomeInfoDbData_1.2.9
[218] iterators_1.0.14
[219] variancePartition_1.28.9
[220] reshape2_1.4.4
[221] gtable_0.3.3
[222] rbibutils_2.2.13
[223] Seurat_4.3.0

```

Turns out this was already implemented, but it wasn't working because of a bug: the check only recognised files with the ".gz" suffix, not those with the ".bgz" suffix.

Fixed now.
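
For anyone hitting something similar, here is a minimal sketch of the kind of suffix check described above. It is not the package's actual code: `is_gz_like()` and `strip_gz_suffix()` are hypothetical helper names, and the only assumption is that compressed inputs are identified by their file extension.

```r
# Hypothetical helper (not MAGMA.Celltyping's real implementation):
# TRUE when a path ends in ".gz" or the bgzip variant ".bgz".
is_gz_like <- function(path) {
    grepl("\\.(gz|bgz)$", path, ignore.case = TRUE)
}

# Hypothetical helper: drop the compression suffix to get the name
# a decompressed copy would be saved under.
strip_gz_suffix <- function(path) {
    sub("\\.(gz|bgz)$", "", path, ignore.case = TRUE)
}

is_gz_like("sumstats.tsv.gz")
is_gz_like("sumstats.tsv.bgz")
is_gz_like("sumstats.tsv")
strip_gz_suffix("sumstats.tsv.bgz")
```

A check that matches only ".gz" would treat a ".tsv.bgz" file as uncompressed, which is consistent with the header line failing to parse in the log above.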