Investigate Scov2 polyprotein listed as a species in Noctua
vanaukenk opened this issue · 13 comments
During the QC checks for bringing Noctua up after the 2022-05-26 outage, I noticed a suspicious entry, pp1ab Scov2, in the list of species:
I thought pp1ab was a polyprotein and that's how it looks in noctua-amigo:
@balhoff @tmushayahama - can you take a look to see why this entry is included as a species? Thanks.
Also tagging @kltm
@tmushayahama how is that list created? (What service does it call to get it?) 'pp1ab Scov2' does not look like a taxon, at least not in the latest NEO file.
@balhoff @tmushayahama uses the taxon API from minerva (/taxa)
As a hint, noting that the /taxa API is returning:
{ id: "http://identifiers.org/uniprot/P0DTD1", label: "pp1ab Scov2" }
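For anyone wanting to scan the rest of the list, here is a minimal sketch of a check that flags entries whose id is not an NCBITaxon IRI. It assumes the /taxa response can be treated as a list of `{id, label}` objects shaped like the one quoted above, and that legitimate taxa use the standard OBO PURL prefix; the function name and the first sample entry are hypothetical.

```python
# Assumption: real taxa in the list use the standard OBO PURL form.
NCBITAXON_PREFIX = "http://purl.obolibrary.org/obo/NCBITaxon_"


def suspicious_taxa(entries):
    """Return the entries whose id does not start with the NCBITaxon prefix."""
    return [e for e in entries if not e.get("id", "").startswith(NCBITAXON_PREFIX)]


sample = [
    # Hypothetical well-formed entry, included only for contrast.
    {"id": "http://purl.obolibrary.org/obo/NCBITaxon_2697049", "label": "hypothetical taxon entry"},
    # The entry reported in this issue.
    {"id": "http://identifiers.org/uniprot/P0DTD1", "label": "pp1ab Scov2"},
]

for entry in suspicious_taxa(sample):
    print(entry["id"], entry["label"])
```

This only inspects data already in hand; fetching from minerva is left out because the exact request shape is not shown in this thread.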
Noting that this is found in neo.obo:
[Term]
id: UniProtKB:P0DTD1-PRO_0000449619
name: nsp1 Scov2
synonym: "nsp1" BROAD []
synonym: "P0DTD1-PRO_0000449619" RELATED []
synonym: "protein" RELATED []
is_a: CHEBI:33695
relationship: has_gene_template PR:000050270%7CUniProtKB%3AP0DTD1-PRO_0000449635%7CPRO_0000449635
relationship: in_taxon UniProtKB:P0DTD1 ! pp1ab Scov2
property_value: https://w3id.org/biolink/vocab/category https://w3id.org/biolink/vocab/GeneProduct
property_value: https://w3id.org/biolink/vocab/category https://w3id.org/biolink/vocab/MacromolecularMachine
I don't believe in_taxon is supposed to work like that.
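A rough sketch of how such stanzas could be spotted, assuming `in_taxon` targets are expected to be `NCBITaxon:` CURIEs. The stanza is trimmed from the one quoted above, and the helper name is hypothetical; this is plain string handling, not a full OBO parser.

```python
# Trimmed copy of the stanza quoted above.
STANZA = """\
[Term]
id: UniProtKB:P0DTD1-PRO_0000449619
name: nsp1 Scov2
relationship: in_taxon UniProtKB:P0DTD1 ! pp1ab Scov2
"""

IN_TAXON_TAG = "relationship: in_taxon "


def in_taxon_targets(stanza):
    """Collect the target IDs of in_taxon relationships, dropping '!' comments."""
    targets = []
    for line in stanza.splitlines():
        if line.startswith(IN_TAXON_TAG):
            value = line[len(IN_TAXON_TAG):]
            targets.append(value.split("!", 1)[0].strip())
    return targets


# Assumption: a valid target is an NCBITaxon CURIE.
bad = [t for t in in_taxon_targets(STANZA) if not t.startswith("NCBITaxon:")]
print(bad)  # ['UniProtKB:P0DTD1']
```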
It looks like the taxon is off by one for GPI 1.2?
UniProtKB P0DTD1-PRO_0000449619 nsp1 Host translation inhibitor nsp1|P0DTD1(1-180)|rep/Clv:nsp1 (SARS2)|PRO_0000449619|nsp1 (SARS2)|UniProtKB:P0DTD1, 1-180|leader protein (SARS2)|UniProtKB:P0DTC1, 1-180|non-structural protein 1 (SARS2)|nsp-1|ns1|ns-1|host translation inhibitor nsp1|Severe acute respiratory syndrome (SARS) coronavirus nonstructural protein 1 protein taxon:2697049 UniProtKB:P0DTD1 PR:000050270|UniProtKB:P0DTD1-PRO_0000449635|PRO_0000449635
http://geneontology.org/docs/gene-product-information-gpi-format/
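To illustrate the off-by-one idea, here is a sketch that parses a row under the ten-column GPI 1.2 layout as I understand it from the page linked above, then reads one position past the Taxon column. The tabs are an assumption, since the quoted line has lost them; the synonym field is abbreviated, and the split between name and synonyms is a guess that does not affect the taxon check.

```python
# Column order as I read the GPI 1.2 spec; treat as an assumption.
GPI_1_2_COLUMNS = [
    "DB", "DB_Object_ID", "DB_Object_Symbol", "DB_Object_Name",
    "DB_Object_Synonym", "DB_Object_Type", "Taxon",
    "Parent_Object_ID", "DB_Xref", "Properties",
]

fields = [
    "UniProtKB",
    "P0DTD1-PRO_0000449619",
    "nsp1",
    "Host translation inhibitor nsp1",  # name/synonym boundary is a guess
    "P0DTD1(1-180)|nsp-1|ns1",          # synonyms abbreviated from the quoted row
    "protein",
    "taxon:2697049",
    "UniProtKB:P0DTD1",
    "PR:000050270|UniProtKB:P0DTD1-PRO_0000449635|PRO_0000449635",
    "",                                 # no properties in the quoted row
]
line = "\t".join(fields)


def parse_gpi_line(line):
    """Map a tab-separated GPI row onto the assumed GPI 1.2 column names."""
    return dict(zip(GPI_1_2_COLUMNS, line.rstrip("\n").split("\t")))


record = parse_gpi_line(line)
taxon_index = GPI_1_2_COLUMNS.index("Taxon")
shifted = line.split("\t")[taxon_index + 1]  # what an off-by-one read would pick up

print(record["Taxon"])  # taxon:2697049
print(shifted)          # UniProtKB:P0DTD1
```

Under these assumptions the shifted read lands on the parent object ID, which matches the `in_taxon UniProtKB:P0DTD1` seen in neo.obo.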
Related to geneontology/go-site#1431
@kltm it seems like you found the problem. But in the neo.owl I downloaded yesterday I saw in_taxon NCBITaxon:2697049. I wonder why the discrepancy?
@balhoff Yeah, there's some stuff I'm not sure about here, especially as that file has not been touched in years, so I'm not sure why it's a problem now.
I'm tagging upstream contributors @cmungall and @justaddcoffee to confirm format for GPI 1.2.
If we understand this correctly, this should be fixed on next NEO release.
Hm. Apparently not. Still appearing on Noctua landing page.