wmv/appengine-ndb-experiment

Changing Parent / Ancestor

Closed this issue · 5 comments

I know that once you set the parent/ancestor you can't change it because it 
implies changes to all the objects key's all down the tree.

Your design of consistency based on the parent makes a lot of sense, it 
simplifies a lot what has to be made consistent, and allows a great deal of 
optimizations/simplifications that I'm guessing are the reason behind it.

The problem rises when for some reason the parent/ancestor has to changed, that 
implies the recreation of all the objects and that cost might pay off 
considering the benefits.

Of course, this only makes sense when you aren't expected to change the parent. 
but that might have to happen. By design we are supposed to use another parent 
and place the real parent in some property, but that breaks all the benefits. 
When its a simple tree, the recreation of objects is trivial, but sometimes its 
not.

So it would be nice if that feature could be supported at the database level, 
as it could optimize that recreation a lot and depending on the implementation 
it could handle even more difficult things like references.

For instance, if at the database level instead of serializing the full key you 
use a binary id, then all you have to change is the label of that binary id.

I opened the issue originally at 
http://code.google.com/p/googleappengine/issues/detail?id=8045 but I was told 
to open it here.

I'm not sure it will hold, but any improvement beyond "use another parent" or 
"do it yourself" is a must.

The problem is how much overhead you can avoid at the driver level, if you 
can't, I'm guessing that most serious use cases will become too heavy, over the 
transaction's limits.

Even if you can't use it because the graph is too big, it would always be very 
helpful to update smaller graphs.

Original issue reported on code.google.com by spi...@gmail.com on 6 Sep 2012 at 1:26

Hm...  Supporting this in the underlying Datastore is out of the question so 
we're now debating whether we can do this in userland in NDB.

I think if you just want to move an entity from parent A to parent B, it's easy 
to do it yourself:

def reparent(oldkey, newparent):
  newkey = ndb.Key(oldkey.kind(), oldkey.id(), parent=newparent)
  ent = oldkey.get()
  ent._key = newkey
  ent.put()
  oldkey.delete()

However there are still several problems with this.

1. If there already is an entity with key == newkey, it will be overwritten.
2. If oldkey.id() is numeric, we ought to add a reservation for the same id in 
the new parent using alloc_id().
3. We ought to use a transaction; this has to be an XP transaction because two 
entity groups are affected.
4. Validation: oldkey.parent() != newparent.
5. Check that ent is not None.

I'm not sure what other functionality you are asking for -- Do you want to move 
multiple entities? Move intermediate parents along as well? Please help design 
a possible API by specifying your requirements.

Original comment by guido@google.com on 11 Sep 2012 at 2:53

The problem wasn't in changing the entity's parent, as you could also recreate 
them, the question is more about the children and if there is a more efficient 
way that you can do it, under the hood.

Regarding my kind of functionality, lets consider the example:
global -> user -> blogs -> posts -> comments » user

Say that for some reason you had two users that now you want to merge in a new 
one.
That implies two kinds of changes:
1. update all the child blogs to have as new parent the other user
2. update all the references in all comments made by that user

AFAIK if you update the parent of an entity you have to update also the keys of 
all the children recursively.

CASE 1. That would imply traveling down the graph.

I don't think it could/should be done in a single transaction, perhaps for some 
simple cases you can do it all at once in a single transaction.

I would use some kind of iteration, perhaps a task that gets two keys and moves 
all the children from one to the other, in a depth first way. Like a search 
algorithm.

CASE 2. I don't know if there is a way to get all the references to a key and 
to update them also. But I would also do it iteratively in terms of 
transactions, getting one by one and updating an entity at a time.

Regarding the points you ask:
1. 2. 
Well, I was thinking more in recreating the entire key in the moving entity, as 
there could be some parent specifics information.
And then just re-parenting all its children, as there, local key's part should 
remain valid, as the local context is basically the same.

3. Well, this kind of changes without transactions are always very dangerous.

Regarding moving multiple entities, I don't mind moving one at a time, the only 
question is the children.

Your last question, moving intermediate parents, I'm not sure what you mean, 
perhaps, you are talking about the children. If that's the case, i see it as a 
recursion, like I said above.

I hope I understood all your questions and that I was useful.

Count me in to give you all the help I can.

Original comment by spi...@gmail.com on 11 Sep 2012 at 5:20

Alas, there is no way to change those keys without copying all the entities.  
There is also no way to find all the references to a key, unless you know that 
all those references occur in a specific kind at a specific property -- then 
you can just query for those.

So I do not think there is a shortcut to do what you are asking -- one way or 
another all the entities whose key changes must be copied (and the old ones 
deleted).

I propose to close this issue as won't-fix, since this is really not something 
we want to encourage, and there is nothing we can do that you can't already do 
just as well yourself.

Original comment by guido@google.com on 11 Sep 2012 at 5:42

I agree, I just hopped that under the hood you had less overhead.

As this being something bad to encourage, I believe that having a global entity 
group and a parent property that you can change easily is a design that will 
imply a big performance penalty and the loss of the local consistency concept.

I believe that with this design all the nice things of the local consistency 
concept are lost.

Sometimes the requirements say "may" and you have to support it, no matter how 
unlikely it is to happen.

Feel free to close it, for me its something really important missing in that 
beautiful consistency design, because it would allow you to update the graph in 
a very elegant and consistent way, being always inside that local consistency.

But I understand, the database wasn't designed for that and its not that simple.

Too bad that under the hood its not just like SQL where you can update directly 
all the references with a query for each table and do it in a single 
transaction, but yes, in SQL you only had to change the parent.

Original comment by spi...@gmail.com on 11 Sep 2012 at 6:22

If you want to argue about the Datastore's underlying design, the NDB tracker 
is *definitely* not the right forum.

Original comment by guido@google.com on 11 Sep 2012 at 6:26

  • Changed state: WontFix