callahantiff/PheKnowLator

Enhancement: Improve Networkx MultiDiGraph Metadata

Closed this issue · 4 comments

TASK

Task Type: CODEBASE

Improve the node and edge metadata when outputting the NetworkX MultiDiGraph versions of each build. Thanks to @rkboyce, who suggested that we could make very small changes to the current NetworkX graph and drastically improve the usability of the output structure.

TODO

Impacted Scripts:

  • knowledge_graph.py
  • converts_rdflib_to_networkx() in utils/kg_utils.py

Needed Functionality:

  • Add a helper function to utils/kg_utils.py that can be called by converts_rdflib_to_networkx(). The helper function will set graph attributes for edges:
    • key: a unique value for each predicate with respect to the triple it appears in, for example a hash of the triple. The only requirement is that it is unique.
    • weight: default to 0

@rkboyce, can you please verify that I have covered the needed changes that we discussed this week correctly above?

I will also be implementing a few changes to the OWL-NETS architecture (issue #56) and will be storing the collapsed semantic information from the full graph as attributes of the transformed OWL-NETS graph, likely in the form of edge and node dictionary entries.

Hi @callahantiff - I agree with the summary for the most part. My suggestion is to make the key some identifier unique across the knowledge graph. Could be just an incremented integer unique to each relation with respect to the triple that it occurs in. I like to use 'predicate' for the URIRef that represents the edge relationship (which will likely be from an ontology e.g. RO and not unique), and weight should be 0.0 as you indicated.
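For reference, a minimal sketch of the edge layout suggested above: an incrementing integer as the edge key, the relation URI stored under 'predicate', and a default weight of 0.0. The build_graph() helper and the example.org predicate are hypothetical, and triples are plain strings here rather than rdflib terms, so this is not the code in utils/kg_utils.py.

```python
import itertools

import networkx as nx

# Illustrative triples as plain strings; the second predicate is made up
# purely to show two parallel edges between the same pair of nodes.
TRIPLES = [
    ("http://purl.obolibrary.org/obo/CHEBI_35406",
     "http://www.w3.org/2000/01/rdf-schema#subClassOf",
     "http://purl.obolibrary.org/obo/CHEBI_29067"),
    ("http://purl.obolibrary.org/obo/CHEBI_35406",
     "http://example.org/relatedTo",
     "http://purl.obolibrary.org/obo/CHEBI_29067"),
]


def build_graph(triples):
    """Hypothetical helper: build a MultiDiGraph with keyed, weighted edges."""
    graph = nx.MultiDiGraph()
    counter = itertools.count()  # yields 0, 1, 2, ... so every edge key is unique
    for subj, pred, obj in triples:
        # add_edge creates the nodes implicitly; the node identifier is the URI string
        graph.add_edge(subj, obj, key=next(counter), predicate=pred, weight=0.0)
    return graph


graph = build_graph(TRIPLES)
for subj, obj, key, attrs in graph.edges(keys=True, data=True):
    print(key, attrs["predicate"], attrs["weight"])
```

Because the key is unique across the whole graph, two relations between the same pair of nodes stay distinct even when the predicate URI itself is reused many times.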

Thanks so much @rkboyce, that's exactly what I needed to know!

Done! Note that this representation now includes keys for nodes and edges and has a default weight of 0.0:

  • Node key: str(http://purl.obolibrary.org/obo/CHEBI_35406)
  • Relation key: the MD5 hash of the triple, which ensures that each key is unique with respect to the triple it occurs in ➞
     hash(
          'http://purl.obolibrary.org/obo/CHEBI_35406' +
          'http://www.w3.org/2000/01/rdf-schema#subClassOf' +
          'http://purl.obolibrary.org/obo/CHEBI_29067'
     )
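
For anyone wanting to reproduce a relation key outside of the build, the hash above can be sketched with the standard library as follows. The triple_key() name is hypothetical, and the UTF-8 encoding and hex-digest output are assumptions; the function actually used in the repository may differ in those details.

```python
import hashlib


def triple_key(subject: str, predicate: str, obj: str) -> str:
    """Hypothetical helper: MD5 hex digest of the concatenated triple strings."""
    joined = subject + predicate + obj
    # MD5 operates on bytes, so the string is encoded first (UTF-8 assumed)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


key = triple_key(
    "http://purl.obolibrary.org/obo/CHEBI_35406",
    "http://www.w3.org/2000/01/rdf-schema#subClassOf",
    "http://purl.obolibrary.org/obo/CHEBI_29067",
)
print(key)  # 32-character hexadecimal string
```

The same triple always produces the same key, so keys can be recomputed later to look up an edge, and changing any one of the three parts produces a different key.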
    

Completed as part of #84