neo4j-contrib/neomodel

Handling multiple labels where class inheritance is not appropriate

alanbuxton opened this issue · 10 comments

This was initially raised as a question in the community forum: https://community.neo4j.com/t/how-to-handle-3-or-more-labels-in-neomodel/63767

Full details below:

I've got a Neo4J database where a node can have multiple labels.

As a toy example, imagine these nodes:

CREATE 
(a:Parent {name: "Foo"}), 
(b:Child1:Parent {name: "Bar"}), 
(c:Child2:Parent {name:"Baz"}), 
(d:Blue:Child1:Parent {name:"Boz"}), 
(e:Orange:Child1:Parent {name:"Qwe"}), 
(f:Blue:Child2:Parent {name:"Asd"}), 
(g:Orange:Child2:Parent {name: "Zxc"})

In neomodel I can define the following classes:

class Parent(StructuredNode):
    name = StringProperty()

class Child1(Parent):
    pass

class Child2(Parent):
    pass

Using this I can do things like:

Child1.nodes.filter(name="Bar")

and get a result because the only matching node happens to have Child1 and Parent1 labels. But if I do Child1.nodes.all() the results also include nodes that are labelled with Blue and Orange so I get the following error:

neomodel.exceptions.NodeClassNotDefined: Node with labels Parent,Blue,Child1 does not resolve to any of the known objects
Parent --> <class '__main__.Parent'>
Parent,Child1 --> <class '__main__.Child1'>
Parent,Child2 --> <class '__main__.Child2'>

I can't simply do:

class BlueChild1(Child1, Parent, Blue):
    pass 

class OrangeChild1(Child1, Parent, Orange):
    pass

etc because in these cases I'd end up with a requirement for BlueChild1 and OrangeChild1 as labels, which is not what I want in the database.

This may sound like a contrived example but I'm seeing it in a real use case where a node could be an Organization or a Person or both (or even have some other labels), but it's not the case that Person or Organization inherit from each other.

I investigated using __optional_labels__ but this is still problematic because the class name is always created as a label.

I've implemented a solution that works for my use case here: alanbuxton@cb51ba0

Happy to open a PR if you think helpful for the project

This is interesting. I'll have to dive in more to fully understand the difference between this and using __abstract_node__ first, but otherwise it can be interesting I think.

On the other hand, my worry would be that it can get confusing between abstract_node, optional_labels, inheritance ?

If there is a neater solution using __abstract_node__ then that would be super cool, but my issue was in this kind of multiple inheritance situation which didn't seem a good fit.

@alanbuxton A key (low level) detail in neomodel is that the labels are used to determine the class that a Node is "paired" with.

Consequently, the labels are determined by the modelling choices. Neomodel returns the correct set of results because given the class hierarchy you described, Blue and Orange ARE ChildX objects too (where X \in {1,2})

I would like to suggest two things:

  1. Can you please post a more realistic example of what corresponds to what given my explanation above? You mention "a node can have mulitple labels". Indeed, in Neomodel a node can have multiple labels but the labels have a specific meaning. Is your inheritance example mirroring how you have modelled your domain exactly or how you would like the labels to work?

  2. Specifically to your problem, I would suggest that you first determine which are the labels that should be used to determine the instantiated class (that is, the set of labels that uniquely identify an entity in your model) and which labels are secondary (that is, they exist for convenience or "speed" or whatever other need they cover in their particular application). You can then begin to manage, which classes would need renaming within your model and which classes would need additional labels defined using one of the available mechanisms.

    1. If your example reflects your modelling hierarchy, that is, if you expect Blue to inherit methods from both Parent::Child1 and Parent::Child2 then it would be interesting to see how we could work from both sides to find a "best practice" for this challenge within Neomodel. But before we start thinking about alternative object resolution algorithms, let's first make sure we understand what we are dealing with.

The data I've got in Neo4J was loaded from RDF via neosemantics. This tool gives all nodes a "Resource" label and can then create additional labels depending on their RDF "type". Each unique entity can have multiple types in RDF, and so can get multiple labels when loaded into Neo4J.

In some rare cases, a node is treated as both a Person and an Organization. In some others I've got labels for both a SiteAddedActivity (meaning an organization opened in a new geographical location) and a ProductAddedActivity (meaning an organization has launched a new product).

Here is a more realistic example showing the combined labels.

from neomodel import db, StructuredNode, StringProperty

cypher_creation = '''CREATE
(a:Resource:Person {name:"foo"}),
(b:Resource:Organization {name:"bar"}),
(c:Resource:Person:Organization {name:"baz"}),
(d:Resource:SiteAddedActivity {name:"qwe"}),
(e:Resource:ProductAddedActivity {name:"asd"}),
(f:Resource:ProductAddedActivity:SiteAddedActivity {name: "zxc"})
'''

class Resource(StructuredNode):
    name = StringProperty()

class Person(Resource):
    pass

class Organization(Resource):
    pass

class SiteAddedActivity(Resource):
    pass

class ProductAddedActivity(Resource):
    pass

db.cypher_query(cypher_creation)

With this data the following both work:

Person.nodes.get_or_none(name="foo")

and

Resource.nodes.get_or_none(name="foo")

both return:

<Person: {'name': 'foo', 'element_id_property': '4:7a3471f8-2e7f-4f99-8866-ffd9e0752d2a:36858'}>

But

Person.nodes.get_or_none(name="baz")

throws a neomodel.exceptions.NodeClassNotDefined: Node with labels Resource,Organization,Person does not resolve to any of the known objects

This is super clear !

You mentioned in your first post that

I investigated using optional_labels but this is still problematic because the class name is always created as a label.

Here do you mean that your tested solution was something like this :

class Organization(Resource):
    __optional_labels__ = ["Person"]
    pass

... and the "class name that is always created as a label" is Organization, which you don't want ? I'm not sure I understand why this doesn't fit the purpose, so if you could explain :-)

Thanks for getting back so quickly @mariusconjeaud - really appreciate you looking into this.

In this solution, every time something that has both Organization and Person labels assigned it will be treated as an Organization. It means I can't have one model with logic related to a "Person" and a different model with logic related to an "Organization"., e.g. obviously I can't do:

class Organization(Resource):
    __optional_labels__ = ['Person']

class Person(Resource):
    __optional_labels__ = ['Organization']

Because the combo of "Organization" and "Person" is already taken

Really I want to be able to handle these cases. For a given node:

  • This node is a person (with its own business logic in python)
  • This node is an organization (with its own business logic in python)
  • This node is both an organization and a person

Thoughts?

Any updates on this issue at all?

I think the real solution would be if Organization.nodes.all() would know to be expanding cypher results to Organization objects rather than figuring out what class to insantiate based only on the labels.

Hey @alanbuxton, sorry for letting that die down, I had less time to spend on neomodel recently, and most of it has been spent on the driver refactoring + py2neo becoming EOL for real - you'll hear more about this later.

I now clearly understand what you want, but will admit to being lost for now, as I haven't worked on the cypher projection to objects yet so I know little about it.

So to sump up, I still think this is important, I just did not have time, and it is not a trivial fix for me (yet)

Not sure but think this is a related issue and wondering if it's a bug. In core.py the registry is created with this code;

def build_class_registry(cls):
base_label_set = frozenset(cls.inherited_labels())
optional_label_set = set(cls.inherited_optional_labels())

# Construct all possible combinations of labels + optional labels
possible_label_combinations = [
    frozenset(set(x).union(base_label_set))
    for i in range(1, len(optional_label_set) + 1)
    for x in combinations(optional_label_set, i)
]
possible_label_combinations.append(base_label_set)

for label_set in possible_label_combinations:
    if label_set not in db._NODE_CLASS_REGISTRY:
        db._NODE_CLASS_REGISTRY[label_set] = cls
    else:
        raise NodeClassAlreadyDefined(cls, db._NODE_CLASS_REGISTRY)

But if you look at it, it always doing a union. So in my case, I have 2 inherited classes...and as a result it's creating an index that looks like {"A","B"}, not {"A"}, {"B"} and {"A","B"} which is what I thought it was going to do.

Then here in util.py is where its used:

   if isinstance(object_to_resolve, Node):
        return self._NODE_CLASS_REGISTRY[
            frozenset(object_to_resolve.labels)
        ].inflate(object_to_resolve)

The issue is that is always receiving just one of the classes, the one that was used to create the query that needs to be inflated. So either {"A"} or {"B"}, but not {"A", "B"} and the call fails with cannot resolve error because neither {"A"} or {"B"} is in the _NODE_CLASS_REGISTRY. It seems to be happening with all my inherits.

Am I missing something? Was the intention to produce a list of all possible combinations, because I don't see how that code above accomplishes that, and as far as I see in my runs, frozen set only contains the multiple class index's.

Thanks.