rdkit/neo4j-rdkit

change node creation behaviour ("luri")

pi-at-git opened this issue · 1 comments

CREATE (n:Entity:Chemical:Compound:Structure {
luri: 'test1',
preferred_name: 'chloro benzene',
smiles: 'ClC1=CC=CC=C1'})
creates a node carrying the chemical structure of chloro benzene.

The property "luri" is meant to be a unique resource identifier for the node (originally: legacy URI). I believe it makes sense to have such a resource identifier on nodes bearing structures. Suggestion: if the user does not deliberately set the luri attribute on data ingestion, the plugin should create the luri property and assign a UUID to it.
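For illustration, the requested fallback can be approximated by hand in plain Cypher. This is only a sketch of the intended behaviour, not something the plugin does: the `$luri` parameter is hypothetical and is assumed to be passed as null when the caller has no identifier, and `randomUUID()` is the built-in Cypher function.

```cypher
// Hypothetical ingestion query: $luri must be supplied as a parameter,
// but may be null when the caller has no identifier of its own.
CREATE (n:Entity:Chemical:Compound:Structure {
  luri: coalesce($luri, randomUUID()),  // keep the caller's luri, else generate a UUID
  preferred_name: 'chloro benzene',
  smiles: 'ClC1=CC=CC=C1'})
RETURN n.luri
```

The drawback is that every ingestion query has to repeat the `coalesce`, which is why a plugin-side or database-side default would be preferable.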

The query
CALL org.rdkit.search.exact.smiles(['Chemical', 'Structure'], 'ClC1=CC=CC=C1')
yields:

{ "columns" : [ "name", "luri", "canonical_smiles" ], "data" : [ [ "chloro benzene", "test1", "Clc1ccccc1" ] ] }

In the CREATE statement the property "preferred_name" was set, but the query returns a property "name". Suggestion: eliminate "name" from the output of every search query (org.rdkit.search.exact.smiles, org.rdkit.search.exact.mol, org.rdkit.search.substructure.smiles, org.rdkit.search.substructure.mol) and treat preferred_name in the CREATE statement like any other property.

Automatic assignment of UUIDs is already a feature of the APOC library; see https://neo4j.com/labs/apoc/4.1/overview/apoc.uuid/apoc.uuid.install/

Auto-assigning a luri to all nodes with the label Entity is as simple as running this once: `CALL apoc.uuid.install('Entity', {uuidProperty: 'luri'})`
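A slightly fuller sketch of that setup, under a few assumptions: the APOC documentation lists `apoc.uuid.enabled=true` in `apoc.conf` and a uniqueness constraint on the label/property pair as prerequisites, and the constraint syntax below is the Neo4j 4.x form. The final two statements are only a hypothetical check and not part of the setup.

```cypher
// Assumed prerequisite: apoc.uuid.enabled=true is set in apoc.conf.
// APOC expects a uniqueness constraint on the label/property pair (Neo4j 4.x syntax).
CREATE CONSTRAINT ON (n:Entity) ASSERT n.luri IS UNIQUE;

// Install the UUID handler once for the Entity label, writing to the luri property.
CALL apoc.uuid.install('Entity', {uuidProperty: 'luri'});

// Hypothetical check: create a node without a luri ...
CREATE (n:Entity:Chemical:Compound:Structure {
  preferred_name: 'chloro benzene',
  smiles: 'ClC1=CC=CC=C1'});

// ... and read back the value the handler is expected to have assigned.
MATCH (n:Entity {preferred_name: 'chloro benzene'})
RETURN n.luri;
```

How the handler treats nodes that already carry a luri (such as 'test1' above) is worth verifying against the installed APOC version before relying on it.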

Since we have an appropriate solution in place, I'll close this issue.