geneontology/noctua-models

Load remaining ZFIN Models into noctua

Closed this issue · 14 comments

From @kltm: It looks like on the next iteration, the ZFIN group, species, and a different contributor might be good additions; they are a little hard to find right now.

kltm commented

Also tagging @dustine32 for this location.
Please feel free to move this to any convenient repo.

@dustine32 - I tested the ZFIN models, and here are the issues I found (note that some of these might have already been reported)

  1. The contributor is incorrect for all models: it should be “ZFIN” or something like “GOC:zfin_curators”.

  2. “NOT” annotations are not reported as “not”; they are reported as regular (positive) annotations.
    Note: I have an example in which there are both “yes” and “not” annotations to the same term: ZFIN:ZDB-GENE-000616-1

  3. ZFIN-PUB-### IDs that do not have a corresponding PMID are not reported in the model reference.

  4. “binding” term with IPI: the gene in the “with” field is displayed in the model as “has input” (which is correct), but the “with” field is missing from the evidence in the model.

  5. “binding” term with IPI: when the ID in the “with” field is not in the GPI, this ID is not reported as “has input”.
    Could we show this ID in the “has input” even though this ID might not be in Neo yet?

  6. Same as 5, but the ID in the “with” field refers to a non-ZFIN gene.

  7. Some of the “boxes” show 2 dates when their evidences have different dates.

examples can be found here: https://docs.google.com/spreadsheets/d/1o5Wa0T16RR2WF-bI_JqTsczjiszHki4vnp49fM1_mpg/edit?usp=sharing

Note: I checked the term-GP, the date, model name,... and everything looks ok.

  1. the contributor is incorrect for all models: it should be “ZFIN” or something like “GOC:zfin_curators”

This can actually be handled in the input ZFIN GPAD file by using the annotation properties column (e.g. contributor-id=GOC:zfin_curators). If this is populated in the GPAD, the conversion code will set it in the model.
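As a rough illustration of the annotation properties idea above, here is a minimal Python sketch that splits a properties field into key/value pairs. It assumes the field is a pipe-separated list of `key=value` entries (as in GPAD 1.x); the function name and the second property in the example are hypothetical, and this is not the actual conversion code.

```python
def parse_annotation_properties(field: str) -> dict:
    """Split a pipe-separated 'key=value' properties field into {key: [values]}.

    Assumption: properties are separated by '|' and each one looks like
    'key=value'. Entries without '=' are skipped.
    """
    props = {}
    for pair in field.split("|"):
        if "=" not in pair:
            continue
        key, value = pair.split("=", 1)
        props.setdefault(key.strip(), []).append(value.strip())
    return props


# 'some-other-property' is a made-up placeholder, not a real GPAD property.
example_field = "contributor-id=GOC:zfin_curators|some-other-property=value"
contributors = parse_annotation_properties(example_field).get("contributor-id", [])
```

With the example above, `contributors` holds the single value `GOC:zfin_curators`, which is what a converter would then copy onto the model.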

  2. “NOT” annotations are not reported as “not”; they are reported as regular (positive) annotations.
    Note: I have an example in which there are both “yes” and “not” annotations to the same term: ZFIN:ZDB-GENE-000616-1

There's an open issue here: geneontology/gocamgen#10. I have some code for this that I still need to test. I'll use your example model here.
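For readers unfamiliar with how negation shows up in the input, here is a small sketch of detecting a “NOT” annotation from a qualifier field. It assumes the qualifier is pipe-separated with `NOT` as one of the tokens (e.g. `NOT|enables`); the function name is hypothetical and this is not the gocamgen fix itself.

```python
def split_qualifier(qualifier: str):
    """Return (negated, relations) for a pipe-separated qualifier string.

    Assumption: a 'NOT' token marks a negated annotation and every other
    token is a relation such as 'enables' or 'involved_in'.
    """
    parts = [p.strip() for p in qualifier.split("|") if p.strip()]
    negated = any(p.upper() == "NOT" for p in parts)
    relations = [p for p in parts if p.upper() != "NOT"]
    return negated, relations
```

Keeping the negation flag separate from the relation makes it possible to carry both the “yes” and the “not” annotation to the same term without one overwriting the other.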

  3. ZFIN-PUB-### IDs that do not have a corresponding PMID are not reported in the model reference.

I think this is a consequence of geneontology/gocamgen#31 but let me know if you found an evidence that's completely missing any pub reference.

  4. “binding” term with IPI: the gene in the “with” field is displayed in the model as “has input” (which is correct), but the “with” field is missing from the evidence in the model.
  5. “binding” term with IPI: when the ID in the “with” field is not in the GPI, this ID is not reported as “has input”.
    Could we show this ID in the “has input” even though this ID might not be in Neo yet?
  6. Same as 5, but the ID in the “with” field refers to a non-ZFIN gene.

It'd probably be easiest to just discuss these on the next call with @ukemi and @vanaukenk.

  7. Some of the “boxes” show 2 dates when their evidences have different dates.

Soon to be fixed! I have some new code (wasn't ready for this load) that will only use the max date here.
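The “max date” idea can be sketched as follows. This assumes the evidence dates arrive as `YYYYMMDD` strings; the function name and the sample dates are hypothetical, and the real code may handle this differently.

```python
from datetime import date, datetime


def latest_date(evidence_dates):
    """Return the most recent date from a list of 'YYYYMMDD' strings.

    Assumption: every entry is a valid 'YYYYMMDD' string and the list is
    non-empty.
    """
    parsed = [datetime.strptime(d, "%Y%m%d").date() for d in evidence_dates]
    return max(parsed)


# Made-up dates for two evidences attached to the same "box".
box_date = latest_date(["20190314", "20201102"])
```

Here `box_date` is 2020-11-02, so the box would show a single date even though its evidences differ.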

Thanks again for the feedback @sabrinatoro!

Thanks @dustine32, this is great! :) And thanks for pointing me to the code to run these models through ShEx as well! I've attached the summary output to this case. For number 1 above: the spec says contributor_id should be an ORCID, but it's OK for it to be a CURIE as per above, right?

activity_report.txt
explanations.txt
main_report.txt

@sierra-moxon Yep! A GOC:id CURIE is fine per some of our recent calls and also this ticket. The value in the GPAD's contributor-id property just needs to match a uri value in users.yaml in order for the Noctua landing page search and display to work.
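A quick way to picture that matching requirement is the sketch below. It assumes the users file has already been loaded into a list of mappings that each carry a `uri` key; the `nickname` key, the sample entries, and the function name are hypothetical stand-ins, not the real users.yaml contents.

```python
def is_known_contributor(contributor_id, users):
    """Return True if contributor_id equals the 'uri' of some user entry."""
    return any(user.get("uri") == contributor_id for user in users)


# Stand-in for parsed users.yaml entries; values are illustrative only.
users = [
    {"uri": "GOC:zfin_curators", "nickname": "ZFIN curators"},
    {"uri": "https://orcid.org/0000-0000-0000-0000", "nickname": "Example Person"},
]

found = is_known_contributor("GOC:zfin_curators", users)
missing = is_known_contributor("ZFIN", users)
```

A check like this, run before a load, would flag contributor values that would not resolve on the landing page.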

  3. ZFIN-PUB-### IDs that do not have a corresponding PMID are not reported in the model reference.

I just found an example of this (ZFIN:ZDB-TRNAG-011205-6) where it's completely dropping the reference due to some dumb bug that I can fix with just one line.

activity_report.txt
explanations.txt
main_report.txt
gorules_report.json.gz

These are the ShEx check outputs from the latest round of models for your review @sabrinatoro. These should be available for review on noctua-dev.

@dustine32 Here is my report of the issues I found after the latest round. All of them (except for the first one) are the same as previously; you are probably still working on these, but I added them here for completeness.
Since there is nothing new, the same examples can be used (however, let me know if you want new examples).

  • The contributor field is not displayed properly:
    image

  • (same as previously reported) The “NOT” annotations are not reported as “not” (maybe this is still in the process of being fixed).
    Note: I have an example in which there are both “yes” and “not” annotations to the same term: ZFIN:ZDB-GENE-000616-1

  • (same as previously reported) The references that have only a ZFIN-PUB-### ID (and do not have a corresponding PMID) are not shown [source : none]
    image

  • The multiple date issue is still there. I don't remember if we decided that it is an issue or not.

@sabrinatoro Thanks for re-testing! Sorry the results haven't changed. I'm wondering if some of these were just due to the ontobio code being out of date. That weird contributor bug fix should have been in the ontobio master branch since 2021-02-25 (specifically, this commit: biolink/ontobio@678ed6d). @sierra-moxon Could you check that this commit is in your ontobio git log? I was at least able to get the right contributor format when regenerating model ZFIN:ZDB-TRNAG-011205-38 with the current ontobio master code.

For the NOTs, the missing references, and the multiple date issue, those fixes are in but not yet merged to master. @sierra-moxon If you want, you can pull this latest code from gocamgen branch and regen to see if these get fixed as well. As a good practice, I've been trying to wait till gocamgen changes get merged to master before generating the full MOD loads, but I often get too excited (as in this PR for WB/MGI).

Sorry for all the "branch" confusion! It keeps things spicy I guess.

@dustine32
I re-tested the models, and I couldn't find any issues.
Thank you !

@sabrinatoro and I talked to ZFIN today - they are ready to start testing the round trip and want to be ready for ZFIN models release with the June GO release. @kltm kindly volunteered to set up a branch of the pipeline with ZFIN models that we could get started on. :)