remove knowledge source parameters from node files (instead rely on provided_by for nodes)
sierra-moxon opened this issue · 4 comments
When I run a KGX transform with the knowledge_sources parameter and pass it values for both aggregator_knowledge_source and primary_knowledge_source, only primary_knowledge_source gets added to the edge file, and all of its values are also added to a neighboring provided_by column.
obojson->tsv in particular.
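One way to check an exported edge file for this symptom is to compare its header against the columns you expect. The following is a minimal sketch using only the standard library. The helper name `missing_columns` and the sample header are made up for illustration and are not actual KGX output.

```python
import csv
import io


def missing_columns(tsv_text, expected):
    """Return the expected column names that are absent from the TSV header."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader, [])  # empty input yields an empty header
    return [name for name in expected if name not in header]


# Made-up edge file content that mimics the reported symptom:
# primary_knowledge_source is present, aggregator_knowledge_source is not.
sample = (
    "subject\tpredicate\tobject\tprimary_knowledge_source\tprovided_by\n"
    "A\trel\tB\tx\tx\n"
)
expected = ["aggregator_knowledge_source", "primary_knowledge_source"]
print(missing_columns(sample, expected))
```

Running the same check against a real export (read from disk instead of the inline string) would show whether the aggregator column was dropped.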
So far I am unable to reproduce this with the following input and output args:
input_args = {
    "filename": [
        os.path.join(RESOURCE_DIR, "pato.json")
    ],
    "format": "obojson",
    "provided_by": True,
    "aggregator_knowledge_source": True,
    "primary_knowledge_source": True,
}
output_args = {
    "filename": os.path.join(TARGET_DIR, "pato-export.tsv"),
    "format": "tsv",
}
I cannot replicate the edge file issues noted in this ticket, but I can see provided_by populated in the node file, as expected. Since knowledge_source properties are currently association slots, we don't expect them to be found on nodes directly.
One thing we can do to make the node file more understandable with regard to provenance is to remove the knowledge_source properties that get added there.
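The idea can be sketched as a filter over node records. This is only an illustration, not the change made in KGX: the function name `strip_knowledge_sources` and the sample record are hypothetical, and it assumes a node is a plain dict keyed by property name.

```python
def strip_knowledge_sources(node):
    """Return a copy of a node record without any *knowledge_source keys.

    provided_by is kept, since it is the provenance property expected on nodes.
    """
    return {
        key: value
        for key, value in node.items()
        if not key.endswith("knowledge_source")
    }


# Hypothetical node record with made-up values.
node = {
    "id": "EX:1",
    "category": "biolink:NamedThing",
    "provided_by": "pato.json",
    "primary_knowledge_source": "pato.json",
    "aggregator_knowledge_source": "pato.json",
}
cleaned = strip_knowledge_sources(node)
print(sorted(cleaned))
```

Matching on the `knowledge_source` suffix covers the primary and aggregator variants as well as a bare `knowledge_source` key, while leaving the input record unmodified.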
Fixed with #405.