anhtr/HPAanalyze

`hpaDownload()` is failing unless `version="example/built-in"`


Hey! I have tried many different values for `downloadList`, but every call to `hpaDownload()` fails, except when using the example/built-in datasets. Any ideas? Thanks a lot!

example:

> library(HPAanalyze)
> 
> dat <- hpaDownload("Normal tissue", "example")
Only the followings are example/built-in datasets: 
 - Normal tissue 
 - Pathology 
 - Subcellular location 
Other datasets will not be loaded
> summary(dat)
              Length Class  Mode
normal_tissue 6      tbl_df list
> 
> dat <- hpaDownload("Normal tissue")
trying URL 'https://www.proteinatlas.org/download/normal_tissue.tsv.zip'
Error in download.file(url = downloadDatasets$urls[[i]], destfile = temp) : 
  cannot open URL 'https://www.proteinatlas.org/download/normal_tissue.tsv.zip'
In addition: Warning message:
In download.file(url = downloadDatasets$urls[[i]], destfile = temp) :
  cannot open URL 'https://www.proteinatlas.org/download/normal_tissue.tsv.zip': HTTP status was '404 Not Found'

session info:

> sessionInfo()
R version 4.4.1 Patched (2024-07-08 r86893)
Platform: aarch64-apple-darwin20
Running under: macOS Sonoma 14.2.1

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Madrid
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] HPAanalyze_1.24.0 dplyr_1.1.4      

loaded via a namespace (and not attached):
 [1] vctrs_0.6.5       zip_2.3.1         cli_3.6.3         rlang_1.1.4       stringi_1.8.4     generics_0.1.3   
 [7] glue_1.8.0        colorspace_2.1-1  scales_1.3.0      fansi_1.0.6       grid_4.4.1        munsell_0.5.1    
[13] tibble_3.2.1      openxlsx_4.2.7.1  lifecycle_1.0.4   compiler_4.4.1    Rcpp_1.0.13-1     pkgconfig_2.0.3  
[19] rstudioapi_0.17.1 R6_2.5.1          tidyselect_1.2.1  utf8_1.2.4        pillar_1.9.0      magrittr_2.0.3   
[25] tools_4.4.1       gtable_0.3.6      xml2_1.3.6        ggplot2_3.5.1    

I have run into the same issue. It looks like the previous link, "https://www.proteinatlas.org/download/subcellular_location.tsv.zip", no longer hosts the data. On the website I found this other link, "https://www.proteinatlas.org/download/tsv/subcellular_locations.tsv.zip", which seems to be working. If hpaDownload could be pointed at it, that might fix the problem.
If you find another solution, I would appreciate hearing about it.

If you'd like, you can manually download the latest version of any dataset from the website "https://www.proteinatlas.org/about/download", unzip it, and load it into R. It worked for me.
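
A minimal sketch of that manual route in base R, assuming the downloaded archive contains a single tab-separated file. `read_hpa_zip()` and the example path are hypothetical names used for illustration; they are not part of HPAanalyze.

```r
# Hypothetical helper (not part of HPAanalyze): read the TSV inside a
# manually downloaded HPA zip archive without unpacking it to disk.
read_hpa_zip <- function(zip_path) {
  # List the archive contents without extracting anything
  members <- utils::unzip(zip_path, list = TRUE)$Name
  # Pick the first member whose name ends in .tsv
  tsv_name <- members[grepl("\\.tsv$", members)][1]
  if (is.na(tsv_name)) stop("No .tsv file found in ", zip_path)
  # unz() opens a connection to one file inside the archive;
  # read.delim() opens and closes that connection itself
  utils::read.delim(unz(zip_path, tsv_name),
                    stringsAsFactors = FALSE, check.names = FALSE)
}

# Example call; the path is an assumption about where the file was saved
# dat <- read_hpa_zip("~/Downloads/subcellular_locations.tsv.zip")
```

The result is a plain data frame, so it may need further reshaping before it matches what `hpaDownload()` returns.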

Sure, though I was hoping for a programmatic way of doing this. I now have a solution where I read the data directly into R from the web, without downloading any files locally. I assume this is what the package was designed to do. Maybe it'll be solved eventually...
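
One possible programmatic workaround, sketched in base R. This is not necessarily the approach described in the comment above: it does write a temporary file, which is removed once the data is read. `read_hpa_url()` is a hypothetical helper, the URL is the one quoted earlier in this thread, and the names of other datasets would need to be checked on the HPA download page.

```r
# Hypothetical helper (not part of HPAanalyze): fetch a zipped TSV from a
# URL into a temporary file, read it, and clean up afterwards.
read_hpa_url <- function(url) {
  tmp <- tempfile(fileext = ".zip")
  on.exit(unlink(tmp), add = TRUE)  # delete the temp file when done
  # mode = "wb" keeps the binary archive intact on Windows
  utils::download.file(url, destfile = tmp, mode = "wb", quiet = TRUE)
  # Find the first .tsv member of the archive
  members <- utils::unzip(tmp, list = TRUE)$Name
  tsv_name <- members[grepl("\\.tsv$", members)][1]
  if (is.na(tsv_name)) stop("No .tsv file found at ", url)
  # Read that member straight from the archive
  utils::read.delim(unz(tmp, tsv_name),
                    stringsAsFactors = FALSE, check.names = FALSE)
}

# URL as quoted in this thread; whether it stays valid is not guaranteed
url <- "https://www.proteinatlas.org/download/tsv/subcellular_locations.tsv.zip"
# dat <- read_hpa_url(url)
```

Keeping the URL as an argument means a changed download path on the website only needs a different string, not a new package release.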