neurogenomics/MAGMA_Celltyping

get_genomeLocFile() issue

shivani-raja opened this issue · 8 comments

Hi Brian,

I'm having an issue with downloading the NCBI37.3.gene.loc file when running map_snps_to_genes. Alan mentioned there's an error with the get_genomeLocFile() function and suspects it's to do with piggyback and Windows.

The line d <- data.table::fread(tmp, nrows = 10) returned the error:

Error in data.table::fread(tmp, nrows = 10) : 
  Single column input contains invalid quotes. Self healing only effective when ncol>1

When Alan sent me the NCBI37.3.gene.loc file directly, data.table::fread read it without error. I'm using MAGMA 2.0.1 and MungeSumstats 1.3.4 on Windows.

Thanks in advance!
Shivani

Could you include your sessionInfo?

Sure!

R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252   
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.1252    
system code page: 65001

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] MatrixGenerics_1.6.0        Biobase_2.54.0             
 [3] httr_1.4.2                  jsonlite_1.7.3             
 [5] bit64_4.0.5                 R.utils_2.11.0             
 [7] assertthat_0.2.1            stats4_4.1.2               
 [9] BiocFileCache_2.2.1         blob_1.2.2                 
[11] BSgenome_1.62.0             GenomeInfoDbData_1.2.7     
[13] Rsamtools_2.10.0            yaml_2.2.2                 
[15] progress_1.2.2              pillar_1.7.0               
[17] RSQLite_2.2.10              lattice_0.20-45            
[19] glue_1.4.2                  digest_0.6.29              
[21] googleAuthR_2.0.0           GenomicRanges_1.46.1       
[23] XVector_0.34.0              colorspace_2.0-2           
[25] Matrix_1.4-0                R.oo_1.24.0                
[27] XML_3.99-0.8                pkgconfig_2.0.3            
[29] biomaRt_2.50.3              zlibbioc_1.40.0            
[31] purrr_0.3.4                 scales_1.1.1               
[33] BiocParallel_1.28.3         tibble_3.1.3               
[35] KEGGREST_1.34.0             generics_0.1.2             
[37] IRanges_2.28.0              ggplot2_3.3.5              
[39] ellipsis_0.3.2              cachem_1.0.6               
[41] SummarizedExperiment_1.24.0 GenomicFeatures_1.46.4     
[43] BiocGenerics_0.40.0         cli_3.1.1                  
[45] magrittr_2.0.1              crayon_1.5.0               
[47] memoise_2.0.1               R.methodsS3_1.8.1          
[49] fs_1.5.2                    fansi_0.5.0                
[51] xml2_1.3.3                  data.table_1.14.2          
[53] tools_4.1.2                 prettyunits_1.1.1          
[55] hms_1.1.1                   gargle_1.2.0               
[57] BiocIO_1.4.0                lifecycle_1.0.1            
[59] matrixStats_0.61.0          stringr_1.4.0              
[61] S4Vectors_0.32.3            munsell_0.5.0              
[63] DelayedArray_0.20.0         AnnotationDbi_1.56.2       
[65] Biostrings_2.62.0           compiler_4.1.2             
[67] GenomeInfoDb_1.30.1         rlang_0.4.11               
[69] grid_4.1.2                  RCurl_1.98-1.6             
[71] rstudioapi_0.13             rappdirs_0.3.3             
[73] VariantAnnotation_1.40.0    rjson_0.2.21               
[75] bitops_1.0-7                restfulr_0.0.13            
[77] gtable_0.3.0                curl_4.3.2                 
[79] DBI_1.1.2                   R6_2.5.1                   
[81] GenomicAlignments_1.30.0    dplyr_1.0.7                
[83] rtracklayer_1.54.0          fastmap_1.1.0              
[85] bit_4.0.4                   utf8_1.2.2                 
[87] filelock_1.0.2              MungeSumstats_1.3.4        
[89] stringi_1.7.6               parallel_4.1.2             
[91] Rcpp_1.0.8                  vctrs_0.3.8                
[93] png_0.1-7                   dbplyr_2.1.1               
[95] tidyselect_1.1.1           

Hey @shivani-raja, check out #100. If this is persisting, I think you have several options:

  1. After reinstalling MAGMA.Celltyping from GitHub (I just pushed a couple of changes), move the manually downloaded NCBI reference file to the folder path returned by this function:
tools::R_user_dir("MAGMA.Celltyping", which="cache")
## e.g. "/Users/schilder/Library/Caches/org.R-project.R/R/MAGMA.Celltyping"
  2. Create a MAGMA.Celltyping Docker container and run analyses within that. It shouldn't have any of these problems since it uses a Linux environment.
  3. Reach out to the piggyback authors and ask them to fix this issue.
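
For option 1, here is a minimal sketch of copying a manually obtained file into the cache folder. The helper name `place_in_cache` and the example path are hypothetical (not part of MAGMA.Celltyping), and it assumes the package looks for the file under its original name in that folder.

```r
# Hypothetical helper (not part of MAGMA.Celltyping): copy a manually
# downloaded reference file into the package's user cache directory.
place_in_cache <- function(src,
                           cache_dir = tools::R_user_dir("MAGMA.Celltyping",
                                                         which = "cache")) {
  stopifnot(file.exists(src))
  # Create the cache directory if it does not exist yet.
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
  dest <- file.path(cache_dir, basename(src))
  ok <- file.copy(src, dest, overwrite = TRUE)
  if (!ok) stop("Could not copy ", src, " to ", cache_dir)
  invisible(dest)
}

# Example call (the source path is an assumption; adjust to where your copy lives):
# place_in_cache("~/Downloads/NCBI37.3.gene.loc")
```

The `cache_dir` argument is exposed only so the helper can be tried against a temporary folder before touching the real cache.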

@shivani-raja did this resolve itself, or was the piggyback team able to fix it?

Not sure, but just saw this and was wondering if it's related. I assume it's since been integrated into the master branch but notice the Issue is still open:
ropensci/piggyback#49

Hi Brian,

Apologies - am working on a couple of other projects at the moment so didn't get around to this. Will update here when I do :)

OK, so I tested this in a Docker container and was able to replicate the issue. The root cause is indeed this: ropensci/piggyback#49

When I view the file, I can see it's not the NCBI resource at all, but this error message stored as a text file!

[Screenshot, 2022-04-07: contents of the downloaded file]

Will work with the piggyback developers to try and get this fixed ASAP.
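
For anyone wanting to check their own copy, a rough sketch of how to tell a real gene.loc file from an error message saved as text. The function name is hypothetical, and the heuristics are assumptions: that a valid file is tab-delimited with more than one column, and that a failed download contains the text "Bad credentials".

```r
# Hypothetical check (not part of MAGMA.Celltyping or piggyback): returns TRUE
# when the file looks like a failed download rather than a gene location table.
looks_like_failed_download <- function(path, n = 10) {
  first_lines <- readLines(path, n = n, warn = FALSE)
  # An empty file cannot be a valid reference file.
  if (length(first_lines) == 0) return(TRUE)
  # Assumption: the failed download contains this text somewhere near the top.
  has_error_text <- any(grepl("Bad credentials", first_lines, fixed = TRUE))
  # Assumption: a valid gene.loc file has several tab-separated fields per line.
  n_fields <- lengths(strsplit(first_lines, "\t", fixed = TRUE))
  single_column <- all(n_fields <= 1)
  has_error_text || single_column
}
```

A single-column file is also what triggers the "Single column input contains invalid quotes" error from data.table::fread reported at the top of this issue, so this check can be run before calling fread.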

I've added a check to the get_data step that get_genomeLocFile relies on. If it detects that the download failed due to the "Bad credentials" issue, it gives instructions on how to set up GitHub Personal Access Tokens (which piggyback::pb_download is currently failing without).

Until the issue with piggyback is resolved, the instructions in this error message should serve as a temporary fix.

Error: piggyback::pb_download() failed due to bad GitHub credentials. Please add a GitHub Personal Access Token (PAT) to your ~/.Renviron file and restart R before retrying this function.
e.g.:

    GITHUB_TOKEN=<your_PAT_here> 

If you do not yet have a GitHub PAT, please follow instructions here:

    https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token

Implemented here:
https://github.com/neurogenomics/MAGMA_Celltyping/blob/master/R/get_data.R
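
To confirm the token is visible to R after restarting, something like the following can help. This is a sketch only: the helper name is hypothetical, and it checks just that the GITHUB_TOKEN environment variable is non-empty, not that the token is valid.

```r
# Hypothetical helper: TRUE if the given environment variable is set and
# non-empty in the current R session. It does not validate the token itself.
has_github_token <- function(var = "GITHUB_TOKEN") {
  nzchar(Sys.getenv(var, unset = ""))
}

if (!has_github_token()) {
  message("GITHUB_TOKEN is not set; add it to ~/.Renviron and restart R.")
}
```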

According to the piggyback authors, the GitHub token issue was fixed in the latest CRAN version. So should be all good now.

Let me know if you notice something like this again tho!