geneontology/gocamgen

Missing data in models


Finish up pulling in these fields from GPAD:

  • Contributor (will be in annotation properties eventually)
  • Assigned by (e.g. WB, MGI, UniProt)
  • with/from (requires a model definition first)
  • Date

Contributor should be an ORCID. In some GPAD lines (e.g. from protein2go), it is in the curator_uri property of the annotation properties column. If curator_uri is not present, look up the xref name in users.yaml to try to get an ORCID. If no ORCID can be found, leave the field blank; one will need to be created.
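The lookup order above could be sketched roughly like this. It is only an illustration, not gocamgen code: the `resolve_contributor` helper is hypothetical. The properties are assumed to be already parsed into a dict of key → list of values. users.yaml is assumed to be already loaded into a list of dicts with hypothetical `xref` and `uri` keys; the real users.yaml schema may differ.

```python
from typing import Dict, List, Optional


def resolve_contributor(
    props: Dict[str, List[str]],
    users: List[Dict[str, str]],
    xref_name: Optional[str] = None,
) -> Optional[str]:
    """Return an ORCID URI for the contributor, or None if none is found.

    Assumptions (hypothetical): `users` mimics entries loaded from users.yaml,
    each with 'xref' and 'uri' keys; `xref_name` is the curator name/xref
    taken from the GPAD line.
    """
    # 1. Prefer an ORCID already present in the curator_uri property.
    curator_uris = props.get("curator_uri", [])
    if curator_uris:
        return curator_uris[0]

    # 2. Otherwise, match the xref name against the users.yaml entries.
    if xref_name is not None:
        for user in users:
            if user.get("xref") == xref_name and user.get("uri"):
                return user["uri"]

    # 3. Nothing found: leave blank (None); an ORCID will need to be created.
    return None
```

Returning `None` rather than a placeholder string keeps "no ORCID yet" easy to detect later, when one has to be created.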

@ukemi sent an example GPAD showing how the contributor will be represented once we get the exported GPAD for the import. Example line:

MGI	MGI:98956	enables	GO:0005102	MGI:MGI:4834177|GO_REF:0000096	ECO:0000266	UniProtKB:P56704		20160118	MGI		contributor=http://orcid.org/0000-0001-5501-853X|contributor=http://orcid.org/0000-0001-7476-6306|comment=blah blah blah
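One way to pull the fields listed at the top of this issue out of a line like the one above is sketched below. The column names are an assumption based on the 12-column GPAD 1.1 layout. `parse_gpad_line` and `parse_properties` are hypothetical helpers, not ontobio's GpadParser API. Because keys such as `contributor` can repeat, the properties column is collected into lists.

```python
from typing import Dict, List

# Assumed 12-column GPAD 1.1 layout (names are illustrative).
GPAD_COLUMNS = [
    "db", "db_object_id", "qualifier", "go_id", "references",
    "evidence", "with_from", "interacting_taxon", "date",
    "assigned_by", "annotation_extension", "annotation_properties",
]


def parse_gpad_line(line: str) -> Dict[str, str]:
    """Split one tab-separated GPAD line into named columns."""
    values = line.rstrip("\n").split("\t")
    # Pad short lines so trailing empty columns still get a key.
    values += [""] * (len(GPAD_COLUMNS) - len(values))
    return dict(zip(GPAD_COLUMNS, values))


def parse_properties(column: str) -> Dict[str, List[str]]:
    """Parse 'key=value|key=value' into key -> list of values."""
    props: Dict[str, List[str]] = {}
    for pair in column.split("|"):
        if "=" not in pair:
            continue  # skip empty or malformed entries
        key, value = pair.split("=", 1)
        props.setdefault(key, []).append(value)
    return props


# Usage with the example line from this issue:
example = (
    "MGI\tMGI:98956\tenables\tGO:0005102\tMGI:MGI:4834177|GO_REF:0000096\t"
    "ECO:0000266\tUniProtKB:P56704\t\t20160118\tMGI\t\t"
    "contributor=http://orcid.org/0000-0001-5501-853X|"
    "contributor=http://orcid.org/0000-0001-7476-6306|comment=blah blah blah"
)
row = parse_gpad_line(example)
props = parse_properties(row["annotation_properties"])
print(row["date"], row["assigned_by"], row["with_from"])
# prints: 20160118 MGI UniProtKB:P56704
print(props["contributor"])
```

Splitting on `|` assumes property values never contain a literal pipe. That holds for this example but is not guaranteed for free-text `comment` values.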

The contributor could arrive on a GPAD line through any of three annotation properties: contributor (ORCID), curator_uri (ORCID), or curator_name (plain-text name). Additional ORCID and date fields will be added to the GPAD spec when it is updated.
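Since any of those three properties may be populated, a first step could be to gather their values and separate ORCIDs from plain names. This minimal sketch assumes the annotation properties have already been parsed into a dict of key → list of values. `collect_contributors` is a hypothetical helper. The names it returns would still need the users.yaml lookup before they could be used as ORCIDs.

```python
from typing import Dict, List, Tuple

# Properties named in this issue that may carry the contributor.
ORCID_KEYS = ("contributor", "curator_uri")  # values expected to be ORCID URIs
NAME_KEYS = ("curator_name",)                # values are plain-text names


def _unique_values(props: Dict[str, List[str]], keys: Tuple[str, ...]) -> List[str]:
    """Collect values for the given keys, dropping duplicates in first-seen order."""
    seen: List[str] = []
    for key in keys:
        for value in props.get(key, []):
            if value not in seen:
                seen.append(value)
    return seen


def collect_contributors(props: Dict[str, List[str]]) -> Tuple[List[str], List[str]]:
    """Return (orcids, names) gathered from a parsed properties dict."""
    return _unique_values(props, ORCID_KEYS), _unique_values(props, NAME_KEYS)
```

Keeping ORCIDs and names apart avoids treating a bare name as an identifier.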

The annotation_properties column will need to be available in the data parsed by ontobio's GpadParser: biolink/ontobio#288