Generalize conversions so that schema changes are handled automatically
jay-m-dev opened this issue · 1 comments
jay-m-dev commented
Conversions can be generalized so that updates to the KG schema don't need to be hard-coded.
AlzKB v1.2.0 now includes bidirectional relationships (GeneInteractsWithGene), so the generalized conversion needs to account for them.
jay-m-dev commented
We need to standardize the format of the input CSVs so that the conversions can be generalized.
One solution is to use multiple CSVs with the following headers:
1st CSV
- relationship_id (required)
- src_node_id (required)
- target_node_id (required)
2nd CSV
- node_id (required)
- node_label (required)
- a column for each property (optional)
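A minimal sketch of how a converter could consume these two CSVs without hard-coding the schema. It assumes plain comma-separated files with exactly the headers listed above, and it treats every extra column in the nodes CSV as an optional property. The function names (`read_rows`, `load_nodes`, `load_relationships`) and the sample rows are hypothetical and only illustrate the idea; they are not taken from the AlzKB code.

```python
import csv
import io

# Required headers as proposed in this comment.
REL_REQUIRED = ("relationship_id", "src_node_id", "target_node_id")
NODE_REQUIRED = ("node_id", "node_label")


def read_rows(handle, required):
    """Read a CSV and fail early if a required header is missing."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [col for col in required if col not in header]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    return list(reader)


def load_nodes(handle):
    """Return {node_id: {"label": ..., "properties": {...}}}."""
    nodes = {}
    for row in read_rows(handle, NODE_REQUIRED):
        # Any non-required, non-empty column becomes a node property,
        # so new properties need no code change.
        props = {
            key: value
            for key, value in row.items()
            if key and key not in NODE_REQUIRED and value
        }
        nodes[row["node_id"]] = {"label": row["node_label"], "properties": props}
    return nodes


def load_relationships(handle, nodes):
    """Return (relationship_id, src, target) tuples; both endpoints must exist."""
    edges = []
    for row in read_rows(handle, REL_REQUIRED):
        for col in ("src_node_id", "target_node_id"):
            if row[col] not in nodes:
                raise ValueError(f"{col} {row[col]!r} not found in nodes CSV")
        edges.append(
            (row["relationship_id"], row["src_node_id"], row["target_node_id"])
        )
    return edges


# Made-up sample data, for illustration only.
nodes_csv = io.StringIO(
    "node_id,node_label,typeOfGene,geneSymbol\n"
    "g1,Gene,protein-coding,APOE\n"
    "d1,Drug,,\n"
)
rels_csv = io.StringIO(
    "relationship_id,src_node_id,target_node_id\n"
    "r1,d1,g1\n"
)

nodes = load_nodes(nodes_csv)
edges = load_relationships(rels_csv, nodes)
```

The sketch treats every row of the first CSV as one directed edge. How bidirectional relationships such as GeneInteractsWithGene should be represented in this format is still an open question here.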
As an example, in AlzKB we have the following structure and properties:
- node_id exists in AlzKB as id (CommonName is not unique enough)
- node_label is the node type (for example Gene or Drug)
- The optional properties for nodes are:
  - Gene:
    - typeOfGene, geneSymbol
  - Drug:
    - current properties not used
  - Disease:
    - [identify and insert properties]
  - Symptom:
    - [identify and insert properties]
  - BodyPart:
    - [identify and insert properties]
  - Pathway:
    - [identify and insert properties]
  - BiologicalProcess:
    - [identify and insert properties]
  - CellularComponent:
    - [identify and insert properties]
  - MolecularFunction:
    - [identify and insert properties]
  - DrugClass:
    - [identify and insert properties]
- Gene: