Knowledge-Graph-Hub/kg-covid-19

Normalize Drug nodes to single prefix

Closed this issue · 0 comments

Describe the bug

Drug nodes in KG-COVID-19 have a variety of prefix types, which makes it difficult to reconcile identical nodes.
Ideally, all should be normalized to a single prefix (i.e., DrugCentral).

To Reproduce

$ grep 'biolink:Drug' merged-kg_nodes.tsv | awk -F":" '{print $1}' | sort | uniq
CHEBI
CHEMBL.COMPOUND
DRUGBANK
DrugCentral
PHARMGKB
ttd.drug
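
The same check can be sketched in Python to get a count per prefix rather than just the list. This assumes the nodes file is a KGX-style TSV with `id` and `category` columns and `|`-separated categories; the function name `drug_prefix_counts` is made up for illustration.

```python
import csv
import io
from collections import Counter


def drug_prefix_counts(nodes_tsv_text):
    """Count CURIE prefixes among nodes whose category includes biolink:Drug.

    Assumes (not confirmed by this issue) a header row with 'id' and
    'category' columns, and '|' as the multi-value separator.
    """
    reader = csv.DictReader(
        io.StringIO(nodes_tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    counts = Counter()
    for row in reader:
        categories = (row.get("category") or "").split("|")
        if "biolink:Drug" in categories:
            # The prefix is everything before the first colon of the CURIE.
            counts[row["id"].split(":", 1)[0]] += 1
    return counts
```

For the real file this would be called as `drug_prefix_counts(open("merged-kg_nodes.tsv").read())`.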

Expected behavior

Instances of biolink:Drug should have their prefixes normalized to DrugCentral at ingest.
This can be done through an SSSOM map similar to that for KG-IDG (https://github.com/Knowledge-Graph-Hub/kg-idg/blob/master/maps/drugcentral-maps-0.1.sssom.tsv), though some source-specific ID curation may be necessary for this KG.
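
A minimal sketch of how such a map might be applied, not the project's actual ingest code. It assumes the SSSOM TSV has `subject_id` and `object_id` columns, with optional `#`-prefixed metadata lines at the top, and accepts the DrugCentral CURIE on either side of a mapping. The function names `load_sssom_map` and `normalize_id` are hypothetical.

```python
import csv
import io


def load_sssom_map(sssom_text, target_prefix="DrugCentral"):
    """Build {source CURIE: target-prefix CURIE} from SSSOM TSV text."""
    # SSSOM files may begin with '#'-prefixed metadata lines; skip them.
    lines = [
        ln for ln in sssom_text.splitlines() if ln and not ln.startswith("#")
    ]
    reader = csv.DictReader(io.StringIO("\n".join(lines)), delimiter="\t")
    mapping = {}
    for row in reader:
        subj, obj = row["subject_id"], row["object_id"]
        # Assumption: the target may appear as either subject or object.
        if obj.startswith(target_prefix + ":"):
            mapping[subj] = obj
        elif subj.startswith(target_prefix + ":"):
            mapping[obj] = subj
    return mapping


def normalize_id(curie, mapping, keep_prefixes=("CHEBI",)):
    """Return the mapped CURIE, leaving kept prefixes and unmapped IDs as-is."""
    prefix = curie.split(":", 1)[0]
    if prefix in keep_prefixes:
        return curie
    return mapping.get(curie, curie)
```

IDs with no entry in the map are returned unchanged, so they can be listed afterwards for the source-specific curation mentioned above.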

CHEBI is ingested as a full ontology, so it may make sense to retain its CHEBI prefixes rather than attempting to remap all of them to an external database.
Instead, we can define Association relations between CHEBI nodes and corresponding DrugCentral nodes.
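
One way the CHEBI-to-DrugCentral associations could be generated, sketched under assumptions: the input is a list of (CURIE, CURIE) pairs taken from a mapping file, the output rows use KGX-style `subject`/`predicate`/`object`/`category` keys, and the default predicate `biolink:same_as` is a placeholder that would need to be chosen deliberately. The function name `chebi_association_edges` is hypothetical.

```python
def chebi_association_edges(pairs, predicate="biolink:same_as"):
    """Turn CHEBI<->DrugCentral mapping pairs into edge records.

    CHEBI node IDs are left untouched; each edge links a CHEBI node
    (subject) to its corresponding DrugCentral node (object).
    """
    edges = []
    for a, b in pairs:
        # Orient each pair so the CHEBI CURIE comes first.
        if a.startswith("DrugCentral:"):
            a, b = b, a
        if a.startswith("CHEBI:") and b.startswith("DrugCentral:"):
            edges.append(
                {
                    "subject": a,
                    "predicate": predicate,  # assumed predicate, see lead-in
                    "object": b,
                    "category": "biolink:Association",
                }
            )
    return edges
```

Pairs that do not involve both a CHEBI and a DrugCentral ID are skipped, so the same mapping file can feed both this step and the prefix normalization.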