geneontology/minerva

Investigate Scov2 polyprotein listed as a species in Noctua

vanaukenk opened this issue · 13 comments

During the QC checks for bringing Noctua up after the 2022-05-26 outage, I noticed a suspicious entry, pp1ab Scov2, in the list of species:

[screenshot: Noctua species list including "pp1ab Scov2"]

I thought pp1ab was a polyprotein and that's how it looks in noctua-amigo:

[screenshot: pp1ab entry in noctua-amigo]

@balhoff @tmushayahama - can you take a look to see why this entry is included as a species? Thanks.

Also tagging @kltm

@tmushayahama how is that list created? (What service does it call to get it?) 'pp1ab Scov2' does not look like a taxon, at least in the latest NEO file.

@balhoff @tmushayahama uses the taxon API from minerva (/taxa).

kltm commented

As a hint, noting that the /taxa API is returning:
{ id: "http://identifiers.org/uniprot/P0DTD1", label: "pp1ab Scov2" }
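A rough way to spot entries like this in the /taxa payload is sketched below. It assumes that well-formed taxon ids use the OBO PURL form for NCBITaxon, which is not confirmed anywhere in this thread; the first sample entry and the helper name `suspicious_taxa` are hypothetical and only there for contrast.

```python
# Sketch: flag taxon entries whose id does not look like an NCBITaxon IRI.
# ASSUMPTION: valid ids start with the OBO PURL prefix below.
NCBITAXON_PREFIX = "http://purl.obolibrary.org/obo/NCBITaxon_"


def suspicious_taxa(entries):
    """Return the entries whose id does not start with the expected prefix."""
    return [e for e in entries if not e.get("id", "").startswith(NCBITAXON_PREFIX)]


sample = [
    # Hypothetical well-formed entry, included only for contrast.
    {"id": "http://purl.obolibrary.org/obo/NCBITaxon_2697049", "label": "hypothetical taxon label"},
    # Entry reported in this thread.
    {"id": "http://identifiers.org/uniprot/P0DTD1", "label": "pp1ab Scov2"},
]

for entry in suspicious_taxa(sample):
    print(entry["id"], entry["label"])
```

Run against a saved copy of the real response, this would list every non-taxon id that has leaked into the species list, not only pp1ab.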

kltm commented

Noting this found in neo.obo:

[Term]
id: UniProtKB:P0DTD1-PRO_0000449619
name: nsp1 Scov2
synonym: "nsp1" BROAD []
synonym: "P0DTD1-PRO_0000449619" RELATED []
synonym: "protein" RELATED []
is_a: CHEBI:33695
relationship: has_gene_template PR:000050270%7CUniProtKB%3AP0DTD1-PRO_0000449635%7CPRO_0000449635
relationship: in_taxon UniProtKB:P0DTD1 ! pp1ab Scov2
property_value: https://w3id.org/biolink/vocab/category https://w3id.org/biolink/vocab/GeneProduct
property_value: https://w3id.org/biolink/vocab/category https://w3id.org/biolink/vocab/MacromolecularMachine

I don't believe in_taxon is supposed to work like that.
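To see how widespread this is, something like the following could scan an OBO file for in_taxon targets that are not taxon CURIEs. This is a minimal sketch using plain line matching rather than a real OBO parser; the expected prefix `NCBITaxon:` is an assumption, and the stanza is an abbreviated copy of the one above.

```python
import re

# Abbreviated copy of the stanza quoted above.
STANZA = """[Term]
id: UniProtKB:P0DTD1-PRO_0000449619
name: nsp1 Scov2
is_a: CHEBI:33695
relationship: in_taxon UniProtKB:P0DTD1 ! pp1ab Scov2
"""

IN_TAXON = re.compile(r"^relationship:\s+in_taxon\s+(\S+)")


def bad_in_taxon_targets(obo_text, expected_prefix="NCBITaxon:"):
    """Return (term id, in_taxon target) pairs whose target lacks the prefix."""
    current_id = None
    problems = []
    for line in obo_text.splitlines():
        if line.startswith("["):
            # A new stanza begins; forget the previous term id.
            current_id = None
        elif line.startswith("id: "):
            current_id = line[4:].strip()
        else:
            match = IN_TAXON.match(line)
            if match and not match.group(1).startswith(expected_prefix):
                problems.append((current_id, match.group(1)))
    return problems


print(bad_in_taxon_targets(STANZA))
```

For real use, the text of neo.obo would be passed in instead of `STANZA`.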

kltm commented

It looks like the taxon column is off by one for GPI 1.2?

UniProtKB	P0DTD1-PRO_0000449619	nsp1	Host translation inhibitor nsp1|P0DTD1(1-180)|rep/Clv:nsp1 (SARS2)|PRO_0000449619|nsp1 (SARS2)|UniProtKB:P0DTD1, 1-180|leader protein (SARS2)|UniProtKB:P0DTC1, 1-180|non-structural protein 1 (SARS2)|nsp-1|ns1|ns-1|host translation inhibitor nsp1|Severe acute respiratory syndrome (SARS) coronavirus nonstructural protein 1	protein	taxon:2697049	UniProtKB:P0DTD1	PR:000050270|UniProtKB:P0DTD1-PRO_0000449635|PRO_0000449635

http://geneontology.org/docs/gene-product-information-gpi-format/
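The shift can be illustrated by mapping the line onto the GPI 1.2 columns. This sketch assumes the column order from the GPI 1.2 documentation linked above and that the line is tab-separated with eight fields; the name field is abbreviated here, and the helper names are hypothetical.

```python
# Column order as understood from the GPI 1.2 documentation (assumption).
GPI_1_2_COLUMNS = [
    "DB", "DB_Object_ID", "DB_Object_Symbol", "DB_Object_Name",
    "DB_Object_Synonym", "DB_Object_Type", "Taxon",
    "Parent_Object_ID", "DB_Xref", "Properties",
]


def parse_gpi_line(line):
    """Map tab-separated fields onto GPI 1.2 column names, in order."""
    fields = line.rstrip("\n").split("\t")
    return dict(zip(GPI_1_2_COLUMNS, fields))


def taxon_looks_valid(record):
    """A GPI taxon value is expected to look like 'taxon:<NCBI id>'."""
    return record.get("Taxon", "").startswith("taxon:")


# The line from this thread, with the long name/synonym field abbreviated.
line = "\t".join([
    "UniProtKB", "P0DTD1-PRO_0000449619", "nsp1",
    "Host translation inhibitor nsp1|P0DTD1(1-180)",
    "protein", "taxon:2697049", "UniProtKB:P0DTD1",
    "PR:000050270|UniProtKB:P0DTD1-PRO_0000449635|PRO_0000449635",
])

record = parse_gpi_line(line)
print(record["DB_Object_Synonym"], record["DB_Object_Type"], record["Taxon"])
print(taxon_looks_valid(record))
```

Read this way, "protein" lands in the synonym column and the parent object lands in the taxon column, which would match the `synonym: "protein"` and `in_taxon UniProtKB:P0DTD1` lines in the neo.obo stanza.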

@kltm it seems like you found the problem. But in the neo.owl I downloaded yesterday I saw in_taxon NCBITaxon:2697049. I wonder why the discrepancy?

kltm commented

@balhoff Yeah, there's some stuff I'm not sure about here, especially as that file has not been touched in years, so I'm not sure why it's a problem now.
I'm tagging upstream contributors @cmungall and @justaddcoffee to confirm format for GPI 1.2.

kltm commented

Per @cmungall, we can go ahead and manually fix this file ourselves upstream.

kltm commented

If we understand this correctly, this should be fixed in the next NEO release.

kltm commented

Hm. Apparently not. Still appearing on Noctua landing page.