geneontology/gocamgen

Translate has_regulation_target extensions

Opened this issue · 18 comments

Translate whatever has_regulation_target annotation extensions satisfy one of the scenarios described in the google doc.

@vanaukenk @ukemi @thomaspd A question about when has_regulation_target is valid: the extensions2GO-CAM wiki states that has_regulation_target is only valid when the primary term is an MF, which matches my current rule-checking code. Meanwhile, the google doc above lists strategies for handling these extensions when the primary term is one of the listed BP terms or its descendants, e.g.

3.a. GP-A [regulation of molecular function Z] has_regulation_target GP-B

In searching the GPADs for examples I see has_regulation_target used on annotations to both MF and BP terms:

MGI     MGI:107771      enables GO:0005096      MGI:MGI:3698699|PMID:17116687   ECO:0000314                     20101101        MGI     has_regulation_target(MGI:MGI:97846)|has_regulation_target(MGI:MGI:2180784)
WB	WBGene00000222	involved_in	GO:0006990	PMID:16184190|WB_REF:WBPaper00026830	ECO:0000315		20131015	WB	has_regulation_target(WB:WBGene00002783)

Can someone set me straight on this?
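
For reference, a minimal sketch of how the has_regulation_target values in lines like the two above could be pulled out. The function names are hypothetical and this is not gocamgen's parser. It scans the whole line instead of a fixed column index, since the pasted lines differ in whitespace and in the number of empty columns.

```python
import re

# Matches one relation(target) pair, e.g. has_regulation_target(WB:WBGene00002783).
EXTENSION_RE = re.compile(r"(\w+)\(([^()]+)\)")


def parse_extensions(text):
    """Return every (relation, target) pair found in the given text."""
    return EXTENSION_RE.findall(text)


def regulation_targets(gpad_line):
    """Return the targets of has_regulation_target extensions on one line.

    Assumption: no other column contains text shaped like relation(ID),
    so scanning the full line is safe for this illustration.
    """
    return [
        target
        for relation, target in parse_extensions(gpad_line)
        if relation == "has_regulation_target"
    ]


wb_line = (
    "WB\tWBGene00000222\tinvolved_in\tGO:0006990\t"
    "PMID:16184190|WB_REF:WBPaper00026830\tECO:0000315\t\t20131015\tWB\t"
    "has_regulation_target(WB:WBGene00002783)"
)
print(regulation_targets(wb_line))
```

Each target is returned separately, so the MGI line with two pipe-separated extensions yields two IDs.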

ukemi commented

It has definitely been used for processes as well as functions.

OK, for now I made has_regulation_target valid for BP annotations with 80f0364. Can revert easily if we change our minds.
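
Roughly, the rule now behaves like the sketch below. This is an illustration only, not the code in that commit: the table and function names are made up, and the aspect letters ("F", "P", "C") are borrowed from the GAF aspect column.

```python
# Aspects of the primary term for which each extension relation is accepted.
# "P" is included for has_regulation_target per the discussion in this thread;
# dropping it from the set would restore the MF-only behaviour.
VALID_ASPECTS = {
    "has_regulation_target": {"F", "P"},
}


def extension_is_valid(relation, aspect):
    """Return True if `relation` may be used on a term of the given aspect.

    Relations missing from VALID_ASPECTS are rejected in this sketch.
    """
    return aspect in VALID_ASPECTS.get(relation, set())


print(extension_is_valid("has_regulation_target", "F"))  # MF primary term
print(extension_is_valid("has_regulation_target", "P"))  # BP primary term
print(extension_is_valid("has_regulation_target", "C"))  # CC primary term
```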

I've added a note to the top of the extensions2GO-CAM wiki page that the current import documentation is now on Noctua MOD Imports.

Although the original wiki page said that has_regulation_target was only allowed for MF, the examples given were for BP terms.

Either way, the new documentation page is more complete and should be our reference going forward. We can archive the other page when we're all done.

Thanks @vanaukenk ! Looking through the has_regulation_target section of Noctua MOD imports, I see that I have two of the scenarios ready to look at on my USC server. They're both in the model for WB:WBGene00003167.

regulation of gene expression
image
From line:

WB	WBGene00003167	involved_in	GO:0045944	PMID:9735371|WB_REF:WBPaper00003265	ECO:0000315			20140910	WB	has_regulation_target(WB:WBGene00003168)

With GO:0045944 being a descendant of 'regulation of gene expression' (GO:0010468). I'm not sure if I modeled this correctly according to the wiki so let me know if I need to rearrange anything.

regulation of molecular function
image
From line:

WB	WBGene00003167	involved_in	GO:1903026	PMID:26096732|WB_REF:WBPaper00046959	ECO:0000314			20160329	WB	has_regulation_target(WB:WBGene00006818)

With GO:1903026 being a descendant of 'regulation of molecular function' (GO:0065009).

I'll keep working through the other scenarios in the wiki. Tagging @ukemi .

ukemi commented

Hi @dustine32 and @vanaukenk. The bottom one is correct, but a little unsatisfying to me. It would make more sense to have the binding function be the target of the regulatory process. As a chain relation it would have been regulates_o_enabled_by.

ukemi commented

Actually, looking at this again, I think it's the best we can do but it will result in two annotations, one to the process and one to the function. Do we want that? Although correct, it is not a literal translation of the original annotation line.

ukemi commented

Actually, the original annotation is to the process that negatively regulates the function. I think we need to limit the descendants.

The bottom example is actually really interesting. I just went back to the original paper to see if we were capturing the biology correctly with the original annotation. Spoiler alert: we're not.

That said, @ukemi, do you think a better model for BPs that regulate MFs would be:

GP-A <-enabled_by [root MF] part_of [regulation of MF] regulates [MF] enabled_by -> GP-B

(with the corresponding positive/negative versions)?

For the GPAD output, perhaps we could then devise a rule to get the chained extension, regulates_o_enabled_by back out.

@ukemi @vanaukenk So does the top example's translation look correct? That was the one whose written pattern I was most worried about deciphering:

GP-A<-enabled_by-[root MF]-part_of->[regulation of Z]-has_input->GP-B,-causally upstream of (positive/negative effect)->[root MF]-enabled_by->GP-B

If it is wrong, can you create an example model somewhere showing the correct translation? Thanks!

@dustine32 - the top example translation looks correct to me.

ukemi commented

Yup. It looks like the model that was agreed to on one of the calls.

@vanaukenk @ukemi Awesome, I'll stop worrying about that one then, thanks!

@vanaukenk @ukemi Just reread @vanaukenk 's comment above about how to correctly translate the bottom example annotation:

GP-A <-enabled_by [root MF] part_of [regulation of MF] regulates [MF] enabled_by -> GP-B

From what I get out of this, the only change I need to make is to move the regulates triple to have the regulation BP term (GO:1903026) as the subject instead of the root MF, like here?

image

@vanaukenk @ukemi @thomaspd I believe the latest push of models to noctua-dev is the first time has_regulation_target translations have appeared there. Here's the model for WB:WBGene00003167 that we've been working with.

But I still have this question about how to correctly translate "regulation of molecular function" has_regulation_target extensions.

@dustine32
The model above for 'regulation of molecular function' looks correct to me, i.e. the regulatory BP negatively regulates the MF of the 'regulation target'.
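
Written out as data, the agreed pattern GP-A <-enabled_by [root MF] part_of [regulation of MF] regulates [MF] enabled_by -> GP-B might look like the sketch below. The node keys, the function name, and the plain-string relation labels are illustrative assumptions and not gocamgen's internal representation. GO:0003674 is the root molecular_function term.

```python
ROOT_MF = "GO:0003674"  # molecular_function


def regulation_of_mf_model(gp_a, regulation_bp, gp_b,
                           target_mf=ROOT_MF, regulates="regulates"):
    """Build nodes and edges for a 'regulation of molecular function' annotation.

    nodes maps a local individual key to its class or gene product ID;
    edges are (subject key, relation label, object key) triples.
    """
    nodes = {
        "mf_a": ROOT_MF,        # function enabled by the annotated gene product
        "bp": regulation_bp,    # the annotated regulation-of-MF process
        "mf_b": target_mf,      # function of the regulation target
        "gp_a": gp_a,
        "gp_b": gp_b,
    }
    edges = [
        ("mf_a", "enabled_by", "gp_a"),
        ("mf_a", "part_of", "bp"),
        ("bp", regulates, "mf_b"),  # the BP, not the root MF, is the subject
        ("mf_b", "enabled_by", "gp_b"),
    ]
    return nodes, edges


nodes, edges = regulation_of_mf_model(
    "WB:WBGene00003167", "GO:1903026", "WB:WBGene00006818",
    regulates="negatively_regulates",
)
for subject, relation, obj in edges:
    print(nodes[subject], relation, nodes[obj])
```

Keeping the two root MF individuals under separate keys (mf_a, mf_b) avoids merging them just because they share the class GO:0003674.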

@vanaukenk @ukemi Working on translating has_regulation_target on annotations of "molecular function regulator" descendants, I noticed that "molecular function regulator" (GO:0098772) is a direct part_of child of "regulation of molecular function" (GO:0065009), which is the root that encompasses the BP descendants section. So any descendant of molecular function regulator is also a descendant of regulation of molecular function.

When this occurs should I choose to translate according to the rules for the more specific term (e.g. GO:0098772 is more specific than GO:0065009)?

ukemi commented

Hi @dustine32. I would suggest that we always follow the rule for the most specific class for which there is a rule. So if a gene product is annotated to 'molecular function regulator', it should follow the rule for that class. In that case, it should directly regulate either a generic MF or the one that is specified.
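
A minimal sketch of that "most specific class with a rule" selection, using a toy parent map in place of the real ontology. Only the GO:0098772 part_of GO:0065009 edge is taken as-is from this thread; the other two edges collapse the intermediate terms into a single hop for illustration, and all names here are hypothetical.

```python
# Toy child -> parents map standing in for the ontology graph.
PARENTS = {
    "GO:0098772": {"GO:0065009"},  # part_of, as noted above
    "GO:0005096": {"GO:0098772"},  # intermediate terms collapsed (toy edge)
    "GO:1903026": {"GO:0065009"},  # intermediate terms collapsed (toy edge)
}

# Classes that have their own translation rule.
RULE_ROOTS = {
    "GO:0065009": "regulation of molecular function",
    "GO:0098772": "molecular function regulator",
}


def ancestors(term):
    """Return all ancestors of `term` reachable through PARENTS."""
    seen = set()
    stack = [term]
    while stack:
        for parent in PARENTS.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def most_specific_rules(term):
    """Return rule roots matching `term` that no other matching root falls under."""
    candidates = ({term} | ancestors(term)) & set(RULE_ROOTS)
    return sorted(
        c for c in candidates
        if not any(c in ancestors(d) for d in candidates if d != c)
    )


print(most_specific_rules("GO:0005096"))
print(most_specific_rules("GO:1903026"))
```

The function returns a list because two matching rule roots that are unrelated to each other would both count as most specific, a case that would need its own decision.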

Is this done?