Bioconductor/AnnotationHub

Non-descriptive error


myhub = AnnotationHub()
snapshotDate(): 2021-05-18
getInfoOnIds(myhub, "AH72154")
myhub_id fetch_id title rdataclass status biocversion rdatadateadded rdatadateremoved
288111 AH72154 78900 org.Salmo_salar.eg.sqlite OrgDb Public 3.9 2019-05-02 NA
file_size
288111 161341440
myhub[["AH72154"]]
Error: Public

Hiya, the database is present, as the output above shows, but I'm not sure what this error message means.

lshep commented

Sorry. Yes, I need to improve the error messages for org packages. OrgDb packages are updated with every release, so the OrgDb you are trying to access is likely too old for your version of R/Bioconductor. If we query for your species, there are indeed more recent versions with more accurate information:

> query(myhub, "org.Salmo")
AnnotationHub with 9 records
# snapshotDate(): 2021-09-23
# $dataprovider: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
# $species: Salmo tshawytscha, Salmo trutta, Salmo salar, Salmo nerka, Salmo...
# $rdataclass: OrgDb
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype 
# retrieve records with, e.g., 'object[["AH93861"]]' 

            title                          
  AH93861 | org.Salmo_mykiss.eg.sqlite     
  AH93874 | org.Salmo_kisatch.eg.sqlite    
  AH93875 | org.Salmo_trutta.eg.sqlite     
  AH93881 | org.Salmo_salar.eg.sqlite      
  AH93888 | org.Salmo_tshawytscha.eg.sqlite
  AH93896 | org.Salmo_namaycush.eg.sqlite  
  AH93905 | org.Salmo_alpinus.eg.sqlite    
  AH93910 | org.Salmo_nerka.eg.sqlite      
  AH93913 | org.Salmo_keta.eg.sqlite       
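
To make the version mismatch described above concrete, here is a minimal sketch in plain R that compares the `biocversion` reported for a record with the Bioconductor version in use. The helper `record_matches_bioc()` is hypothetical and not part of AnnotationHub, the data frame is typed in by hand from the `getInfoOnIds()` output in the original report, and the "current" version `3.13` is an assumed placeholder; in a live session it could be taken from `as.character(BiocManager::version())`.

```r
# Hypothetical helper (not an AnnotationHub function): TRUE when the record's
# biocversion equals the Bioconductor version currently in use.
record_matches_bioc <- function(info, current_version) {
  # package_version() compares versions component by component, so "3.9" is
  # correctly treated as older than "3.13" (a numeric comparison would not).
  package_version(as.character(info$biocversion)) ==
    package_version(as.character(current_version))
}

# Values copied by hand from the getInfoOnIds() output in the report above
info <- data.frame(
  title = "org.Salmo_salar.eg.sqlite",
  biocversion = "3.9",
  stringsAsFactors = FALSE
)

# Assumed placeholder for the running Bioconductor version
current <- "3.13"

record_matches_bioc(info, current)
```

A result of `FALSE` here would suggest looking for a newer record for the same species with `query()`, as in the output above.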

Can I piggyback off of this issue?
I am currently working with Atlantic salmon, and I did some functional analysis last year in February based off of the OrgDb record that was available at the time.
I am repeating the analysis now with a different record (AH111638), and I am getting very different results in terms of number of GO terms picked up in Over Representation Analysis.

Am I able to see if this more recent record has replaced the old one? I do not remember the ID of the old record, nor did I write it down anywhere, since I used to create the object by doing sasa <- query(ah, c('OrgDb', 'Salmo salar'))[[1]].

lshep commented

We replace OrgDbs every release so that they contain updated information. OrgDbs are closely tied to the Bioconductor release version and the R version. You can tell when a resource was added from the rdatadateadded field in the query output:

> query(ah, c('OrgDb', 'Salmo salar'))
AnnotationHub with 1 record
# snapshotDate(): 2023-10-05
# names(): AH111638
# $dataprovider: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
# $species: Salmo salar
# $rdataclass: OrgDb
# $rdatadateadded: 2023-04-24
# $title: org.Salmo_salar.eg.sqlite
# $description: NCBI gene ID based annotations about Salmo salar
# $taxonomyid: 8030
# $genome: NCBI genomes
# $sourcetype: NCBI/UniProt
# $sourceurl: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/, ftp://ftp.uniprot.org/p...
# $sourcesize: NA
# $tags: c("NCBI", "Gene", "Annotation") 
# retrieve record with 'object[["AH111638"]]' 

To replicate the analysis, you would have to use the same versions of R and Bioconductor that were used at the time, likely Bioconductor 3.16. For example, requesting the older record AH107424 from a newer release fails:

> temp = ah[["AH107424"]]
Error: AH107424 is an OrgDb resource.
  orgDb resources are generated for specific biocversions.
  Requested resource works with biocversion: 3.16
  To find a resource appropriate for your biocversion try the following query:
      query(ah,'org.Salmo_salar.eg.sqlite')

As you can see, the error message for OrgDb resources has also been updated to be more descriptive: it now reports which Bioconductor version the resource works with, which should help when trying to replicate earlier findings.
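
Since `query(...)[[1]]` returns whichever record is current for the release, one way to make a later comparison possible is to write down the record ID, its `rdatadateadded` value, and the Bioconductor version at analysis time. The sketch below is only an illustration under stated assumptions: `provenance_note()` and the file name `orgdb_provenance.txt` are hypothetical, and the live part runs only in an interactive session with AnnotationHub and BiocManager installed and network access available.

```r
# Hypothetical helper: format a one-line provenance note for an analysis log.
provenance_note <- function(id, date_added, bioc_version) {
  sprintf("OrgDb record %s (added %s), Bioconductor %s",
          id, as.character(date_added), bioc_version)
}

# Static example using the ID and date shown in the query output above;
# the version string is a placeholder, not a value taken from the thread.
provenance_note("AH111638", "2023-04-24", "<bioc version>")

# Live use: guarded so that nothing is downloaded in non-interactive runs.
if (interactive() &&
    requireNamespace("AnnotationHub", quietly = TRUE) &&
    requireNamespace("BiocManager", quietly = TRUE)) {
  library(AnnotationHub)
  ah <- AnnotationHub()
  hits <- query(ah, c("OrgDb", "Salmo salar"))
  id <- names(hits)[1]                      # record ID, e.g. "AH111638"
  note <- provenance_note(id, hits$rdatadateadded[1],
                          as.character(BiocManager::version()))
  writeLines(note, "orgdb_provenance.txt")  # hypothetical log file name
  sasa <- ah[[id]]                          # load by explicit ID, not [[1]]
}
```

Keeping the note next to the analysis output means the exact record can be named later, even after the hub has replaced it with a newer one.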