EpistasisLab/KRAGEN

Generalize conversions so that schema changes are handled automatically

jay-m-dev opened this issue · 1 comments

Conversions can be generalized so that updates to the KG schema don't need to be hard-coded.
AlzKB v1.2.0 now includes bidirectional relationships (GeneInteractsWithGene), so the generalized conversion needs to account for them.

We need to standardize the format of the input CSVs so that the conversion logic can be generic.
One solution is to use two CSVs with the following headers:
1st CSV

  • relationship_id (required)
  • src_node_id (required)
  • target_node_id (required)

2nd CSV

  • node_id (required)
  • node_label (required)
  • a column for each property (optional)
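
A minimal sketch of how the two CSVs above could be loaded without hard-coding the schema, using only Python's standard `csv` module. The function names (`read_rows`, `load_nodes`, `load_relationships`) and the sample rows are hypothetical and not part of KRAGEN; the only things taken from this issue are the required header names. Any column beyond the required ones is treated as an optional property.

```python
import csv
import io

# Required headers as proposed in this issue.
REL_REQUIRED = ("relationship_id", "src_node_id", "target_node_id")
NODE_REQUIRED = ("node_id", "node_label")


def read_rows(handle, required):
    """Read a CSV and check that every required header is present."""
    reader = csv.DictReader(handle)
    missing = [col for col in required if col not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    return list(reader)


def load_nodes(handle):
    """Return {node_id: {"label": ..., "properties": {...}}}."""
    nodes = {}
    for row in read_rows(handle, NODE_REQUIRED):
        # Every non-required column is an optional property. Empty cells are
        # skipped so that different labels can carry different property sets.
        props = {
            key: value
            for key, value in row.items()
            if key is not None and key not in NODE_REQUIRED and value
        }
        nodes[row["node_id"]] = {"label": row["node_label"], "properties": props}
    return nodes


def load_relationships(handle, nodes):
    """Return relationship rows, rejecting references to unknown nodes."""
    rels = read_rows(handle, REL_REQUIRED)
    for rel in rels:
        for end in ("src_node_id", "target_node_id"):
            if rel[end] not in nodes:
                raise ValueError(f"{rel['relationship_id']}: unknown node {rel[end]}")
    return rels


# Illustrative data only; the ids and values are made up for this example.
nodes_csv = io.StringIO(
    "node_id,node_label,typeOfGene,geneSymbol\n"
    "g1,Gene,protein-coding,APOE\n"
    "d1,Drug,,\n"
)
rels_csv = io.StringIO(
    "relationship_id,src_node_id,target_node_id\n"
    "r1,d1,g1\n"
)
nodes = load_nodes(nodes_csv)
rels = load_relationships(rels_csv, nodes)
```

With this shape, adding a new node label or property to the KG only changes the CSV contents, not the loader.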

As an example, in AlzKB we have the following structure and properties:

  • node_id exists in AlzKB as id (CommonName is not unique enough)
  • node_label is the node type
  • The optional properties for nodes are:
    • Gene:
      • typeOfGene, geneSymbol
    • Drug:
      • current properties not used
    • Disease:
      • [identify and insert properties]
    • Symptom:
      • [identify and insert properties]
    • BodyPart:
      • [identify and insert properties]
    • Pathway:
      • [identify and insert properties]
    • BiologicalProcess:
      • [identify and insert properties]
    • CellularComponent:
      • [identify and insert properties]
    • MolecularFunction:
      • [identify and insert properties]
    • DrugClass:
      • [identify and insert properties]
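
As a rough sketch of the AlzKB mapping described above, the per-label properties could live in a small lookup table so that a schema update only touches data, not code. The names `OPTIONAL_PROPERTIES` and `to_standard_node` are hypothetical, and the input is assumed to be a plain dict of one node's properties. Only Gene and Drug are listed because the properties for the other labels have not been identified yet.

```python
# Optional properties to export per node label (hypothetical configuration).
# Gene is the only label with properties specified so far; Drug properties
# are currently not used. Other labels can be added once identified.
OPTIONAL_PROPERTIES = {
    "Gene": ["typeOfGene", "geneSymbol"],
    "Drug": [],
}


def to_standard_node(label, record):
    """Map one AlzKB node record to a row for the standardized node CSV."""
    # AlzKB's `id` becomes node_id, since CommonName is not unique enough.
    row = {"node_id": record["id"], "node_label": label}
    for prop in OPTIONAL_PROPERTIES.get(label, []):
        if prop in record:
            row[prop] = record[prop]
    return row


# Illustrative record only; the id and values are made up for this example.
example_row = to_standard_node(
    "Gene",
    {"id": "g1", "commonName": "example", "typeOfGene": "protein-coding", "geneSymbol": "APOE"},
)
```

Labels missing from the table fall back to exporting only `node_id` and `node_label`, so unknown labels do not break the conversion.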