Error in chromInfo file download when running seqlevelsStyle()
Hi,
I have been getting this error since this afternoon. It originates from `GenomeInfoDb:::fetch_table_from_url()` (which I tried to override). It would be good to cache the file instead of downloading each time. I don't know whether it's the server's fault, but the same error popped up on both the HPC and my local machine. I could download `chromInfo.txt.gz` by manually navigating the FTP site, but sometimes I got a "forbidden" error.
```r
library(Seurat)
library(Signac)
library(EnsDb.Mmusculus.v79)
library(BSgenome.Mmusculus.UCSC.mm10)
annotations <- GetGRangesFromEnsDb(ensdb = EnsDb.Mmusculus.v79)
seqlevelsStyle(annotations) <- "UCSC"
```
Error in download.file(url, destfile, quiet = TRUE) (mm10.R#62): cannot open URL 'https://hgdownload.cse.ucsc.edu/goldenPath/mm10/database/chromInfo.txt.gz'
sessionInfo()
R version 4.3.2 (2023-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.3
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: America/New_York
tzcode source: internal
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] BSgenome.Mmusculus.UCSC.mm10_1.4.3 BSgenome_1.70.1 rtracklayer_1.62.0
[4] BiocIO_1.12.0 Biostrings_2.70.1 XVector_0.42.0
[7] EnsDb.Mmusculus.v79_2.99.0 ensembldb_2.26.0 AnnotationFilter_1.26.0
[10] GenomicFeatures_1.54.1 AnnotationDbi_1.64.1 Biobase_2.62.0
[13] GenomicRanges_1.54.1 GenomeInfoDb_1.38.0 IRanges_2.36.0
[16] S4Vectors_0.40.1 BiocGenerics_0.48.1 Signac_1.12.0
[19] Seurat_5.0.1 SeuratObject_5.0.1 sp_2.1-2
loaded via a namespace (and not attached):
[1] RcppAnnoy_0.0.21 splines_4.3.2 later_1.3.2
[4] bitops_1.0-7 filelock_1.0.3 tibble_3.2.1
[7] polyclip_1.10-6 rpart_4.1.23 XML_3.99-0.16
[10] fastDummies_1.7.3 lifecycle_1.0.4 globals_0.16.2
[13] lattice_0.22-5 MASS_7.3-60.0.1 backports_1.4.1
[16] magrittr_2.0.3 rmarkdown_2.25 Hmisc_5.1-1
[19] plotly_4.10.4 yaml_2.3.8 httpuv_1.6.13
[22] sctransform_0.4.1 spam_2.10-0 spatstat.sparse_3.0-3
[25] reticulate_1.34.0 cowplot_1.1.2 pbapply_1.7-2
[28] DBI_1.2.1 RColorBrewer_1.1-3 abind_1.4-5
[31] zlibbioc_1.48.0 Rtsne_0.17 purrr_1.0.2
[34] biovizBase_1.48.0 RCurl_1.98-1.14 nnet_7.3-19
[37] VariantAnnotation_1.46.0 rappdirs_0.3.3 GenomeInfoDbData_1.2.11
[40] ggrepel_0.9.5 irlba_2.3.5.1 listenv_0.9.0
[43] spatstat.utils_3.0-4 goftest_1.2-3 RSpectra_0.16-1
[46] spatstat.random_3.2-2 fitdistrplus_1.1-11 parallelly_1.36.0
[49] DelayedArray_0.28.0 leiden_0.4.3.1 codetools_0.2-19
[52] RcppRoll_0.3.0 xml2_1.3.6 tidyselect_1.2.0
[55] base64enc_0.1-3 matrixStats_1.2.0 BiocFileCache_2.10.1
[58] spatstat.explore_3.2-5 GenomicAlignments_1.38.0 jsonlite_1.8.8
[61] Formula_1.2-5 ellipsis_0.3.2 progressr_0.14.0
[64] ggridges_0.5.5 survival_3.5-7 tools_4.3.2
[67] progress_1.2.3 ica_1.0-3 Rcpp_1.0.12
[70] glue_1.7.0 SparseArray_1.2.2 gridExtra_2.3
[73] xfun_0.41 MatrixGenerics_1.14.0 dplyr_1.1.4
[76] fastmap_1.1.1 fansi_1.0.6 digest_0.6.34
[79] R6_2.5.1 mime_0.12 colorspace_2.1-0
[82] scattermore_1.2 tensor_1.5 dichromat_2.0-0.1
[85] spatstat.data_3.0-4 biomaRt_2.58.0 RSQLite_2.3.4
[88] utf8_1.2.4 tidyr_1.3.0 generics_0.1.3
[91] data.table_1.14.10 S4Arrays_1.2.0 prettyunits_1.2.0
[94] httr_1.4.7 htmlwidgets_1.6.4 uwot_0.1.16
[97] pkgconfig_2.0.3 gtable_0.3.4 blob_1.2.4
[100] lmtest_0.9-40 htmltools_0.5.7 dotCall64_1.1-1
[103] ProtGenerics_1.34.0 scales_1.3.0 png_0.1-8
[106] rstudioapi_0.15.0 knitr_1.45 reshape2_1.4.4
[109] rjson_0.2.21 checkmate_2.3.1 nlme_3.1-164
[112] curl_5.2.0 zoo_1.8-12 cachem_1.0.8
[115] stringr_1.5.1 KernSmooth_2.23-22 parallel_4.3.2
[118] miniUI_0.1.1.1 foreign_0.8-86 restfulr_0.0.15
[121] pillar_1.9.0 grid_4.3.2 vctrs_0.6.5
[124] RANN_2.6.1 promises_1.2.1 dbplyr_2.4.0
[127] xtable_1.8-4 cluster_2.1.6 htmlTable_2.4.2
[130] evaluate_0.23 cli_3.6.2 compiler_4.3.2
[133] Rsamtools_2.18.0 rlang_1.1.3 crayon_1.5.2
[136] future.apply_1.11.1 plyr_1.8.9 stringi_1.8.3
[139] viridisLite_0.4.2 deldir_2.0-2 BiocParallel_1.36.0
[142] munsell_0.5.0 lazyeval_0.2.2 spatstat.geom_3.2-7
[145] Matrix_1.6-5 RcppHNSW_0.5.0 hms_1.1.3
[148] patchwork_1.2.0 bit64_4.0.5 future_1.33.1
[151] ggplot2_3.4.4 KEGGREST_1.42.0 shiny_1.8.0
[154] SummarizedExperiment_1.32.0 ROCR_1.0-11 igraph_1.6.0
[157] memoise_2.0.1 fastmatch_1.1-4 bit_4.0.5
See https://groups.google.com/a/soe.ucsc.edu/g/genome/c/zxS5jah4eZo/m/Jxuprb9BAQAJ
> It would be good to cache the file instead of downloading each time.
Caching happens, but not when you think it does. `seqlevelsStyle(gr) <- "UCSC"` uses `getChromInfoFromUCSC()` internally, which downloads the file only once and caches the result of parsing it (a data.frame).
Note that the caching lasts only for the current session. Caching permanently would not be a good idea because, believe it or not, the content of these `chromInfo.txt.gz` files sometimes changes (a rare event, but it has happened a few times in the past).
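To make the session-level caching concrete, here is a minimal sketch. It assumes GenomeInfoDb is installed and the UCSC server is reachable; the `recache` argument is, to my knowledge, part of the `getChromInfoFromUCSC()` signature, but please check `?getChromInfoFromUCSC` for your installed version.

```r
library(GenomeInfoDb)

# First call in a session: downloads chromInfo for mm10 and parses it
# into a data.frame.
chrom_info <- getChromInfoFromUCSC("mm10")

# Later calls in the same session should reuse the cached data.frame
# instead of hitting the server again.
chrom_info_again <- getChromInfoFromUCSC("mm10")

# To deliberately refresh the cached copy (e.g. if the upstream file may
# have changed), request a new download. Assumes 'recache' is available
# in your GenomeInfoDb version.
chrom_info_fresh <- getChromInfoFromUCSC("mm10", recache = TRUE)
```

Restarting R clears the cache, so the first call in a new session needs network access again.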
Oh, I forgot about this feature, but in the meantime you should be able to set the global option `UCSC.goldenPath.url` to `"https://hgdownload.soe.ucsc.edu/goldenPath"` with:
options(UCSC.goldenPath.url="https://hgdownload.soe.ucsc.edu/goldenPath")
Then:
> seqlevelsStyle(annotations) <- "UCSC" # works!
> seqinfo(annotations)
Seqinfo object with 22 sequences (1 circular) from mm10 genome:
seqnames seqlengths isCircular genome
chr3 160039680 FALSE mm10
chrX 171031299 FALSE mm10
chr16 98207768 FALSE mm10
chr7 145441459 FALSE mm10
chr11 122082543 FALSE mm10
... ... ... ...
chr18 90702639 FALSE mm10
chr1 195471971 FALSE mm10
chr12 120129022 FALSE mm10
chr19 61431566 FALSE mm10
chrM 16299 TRUE mm10
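If you would rather not change the option for the whole session, the workaround can be scoped to a single call. The sketch below uses only base R; `with_goldenpath_url()` is a hypothetical helper name made up for illustration, not something GenomeInfoDb provides.

```r
# Hypothetical helper (not part of GenomeInfoDb): temporarily point
# UCSC.goldenPath.url at another host, evaluate 'expr', then restore
# the previous value even if 'expr' fails.
with_goldenpath_url <- function(url, expr) {
  old <- options(UCSC.goldenPath.url = url)  # returns the previous setting
  on.exit(options(old), add = TRUE)          # restore on exit
  force(expr)                                # evaluated lazily, i.e. after the option is set
}

# Check that the option is visible inside the call and restored afterwards.
seen <- with_goldenpath_url(
  "https://hgdownload.soe.ucsc.edu/goldenPath",
  getOption("UCSC.goldenPath.url")
)
```

In the situation from this issue, the second argument would be something like `getChromInfoFromUCSC("mm10")`. Because results are cached for the session, a later `seqlevelsStyle(annotations) <- "UCSC"` should then not need to download the file again.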
It works, thank you very much!