aplbrain/grand-cypher

Entity types

Closed this issue · 3 comments

In Cypher, node and edge types are represented by :ColonNotation. For example,

(A:Neuron)-[AB:Synapse]->(B:Neuron)

NetworkX has no concept of entity "types," so this will be the first time that this codebase mandates a data schema (i.e., a type attribute on the entities in the graph). I'm not sure this is something I want to enforce, but if we do decide to use vertex/edge attributes like this, I'd like to open discussion in this issue to establish what schema we want to support.

I think the type can serve as a kind of index: the graph search engine can use it to search only the subset of nodes/edges with a matching type instead of searching all of them. Our algorithm does not benefit from this just yet, but it's nice to have, if only for the sake of matching Cypher (?).
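To make the "type as an index" idea concrete, here is a minimal sketch. It assumes each node carries a single string under a `type` attribute; the attribute name and the `build_type_index` helper are hypothetical and not part of grand-cypher.

```python
from collections import defaultdict

import networkx as nx


def build_type_index(graph, attr="type"):
    """Map each type value to the set of nodes carrying it (hypothetical helper)."""
    index = defaultdict(set)
    for node, data in graph.nodes(data=True):
        # Nodes without the attribute are simply left out of the index.
        if attr in data:
            index[data[attr]].add(node)
    return index


g = nx.DiGraph()
g.add_node("a", type="Neuron")
g.add_node("b", type="Neuron")
g.add_node("c", type="Glia")
g.add_node("d")  # untyped node

index = build_type_index(g)

# A matcher for (A:Neuron) would only need to consider these candidates
# rather than every node in the graph.
candidates = index["Neuron"]
```

The index is built once per query here; whether it is worth caching depends on how often the host graph changes.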

If we do this, I recommend storing it under the __type__ property. Why double underscores on both ends? It adheres more closely to Python convention and leaves room for other uses. It may be intimidating for users to construct a graph themselves following this convention, but that concern will go away as soon as graph mutation is supported.

Oh I like __label__ (originally __type__), that seems like the right move for sure! Good call. We could also have a few utilities to easily assign labels before adding proper support for mutations, like this (just a sketch, I don't feel strongly about this API in particular):

from grandcypher import assign_labels

g_with_labels = assign_labels(g, assignments)

where assignments can be a dict or callable:

assignments = {"a": "Customer", "b": "Store", "c": "Product"}
# or:
assignments = lambda x: x.split(":")[1:] # for node IDs of format "Jordan:Customer"

One thing to keep in mind is that an object can have more than one label assigned, so __label__ may have to be a collection type like set instead of a simple str.
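For reference, a rough sketch of how such a utility could behave on a NetworkX graph. This is only one possible reading of the proposal: `assign_labels` here is written from scratch for illustration, it is not an existing grandcypher function, and storing a set under `__label__` is an assumption based on the discussion above.

```python
import networkx as nx

LABEL_ATTR = "__label__"  # attribute name taken from this thread; an assumption


def assign_labels(graph, assignments, attr=LABEL_ATTR):
    """Return a copy of `graph` whose nodes carry a set of labels under `attr`.

    `assignments` is either a dict mapping node ID -> label(s), or a callable
    that takes a node ID and returns its label(s).
    """
    labeled = graph.copy()  # leave the caller's graph untouched
    for node in labeled.nodes:
        if callable(assignments):
            labels = assignments(node)
        else:
            labels = assignments.get(node, ())
        # Accept a single label given as a bare string.
        if isinstance(labels, str):
            labels = [labels]
        labeled.nodes[node][attr] = set(labels)
    return labeled


g = nx.DiGraph()
g.add_edge("a", "b")
g.add_edge("b", "c")
by_dict = assign_labels(g, {"a": "Customer", "b": "Store", "c": "Product"})

h = nx.DiGraph()
h.add_edge("Jordan:Customer", "Acme:Store")
by_callable = assign_labels(h, lambda x: x.split(":")[1:])
```

Normalizing everything to a set keeps the single-label and multi-label cases on the same code path, at the cost of making the stored value slightly less convenient to write by hand.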

[EDIT] Went back and changed "types" to "labels" to match Cypher terminology.

Fixed by @khoale88 in #25 :):)