stefankroes/ancestry

arrange that discards orphans

brendon opened this issue · 6 comments

Hi @kbrock,

I was wondering if there was appetite for altering the arrange method to allow for an option to drop orphans instead of promoting them to root nodes?

The use case is for when we select a part of a tree, and then also have conditions that remove some intermediate nodes but not their descendants.

It is that code that allows sorting in ruby - that is the main issue. It adds ambiguity so the sql query can not properly sort and the tree can not properly come back with pagination. I would like to ask the community what I can change about that feature but know of no good way how to ask the community for what features they need.

Since it is difficult to know if the missing nodes are from the previous page, from filtering a where(), or some other reason, it doesn't seem possible to just drop nodes without a parent node. Also feel like we were able to get most of the correct values when using a sql sort.

Are you thinking about adding a flag that says to drop nodes without a parent? I think we could keep a similar algorithm to do that - but it would not work for pagination or where() clauses or many other use cases.

I tried to distinguish the behavior in the specs where the wrong thing happens but it is the best we can do. Also tried to document it so it was clear to all that I did not like the results but again, it was the best that we could do.

Yes it's certainly a hairy method. I get confused every time I try to read it and understand how it works. Perhaps it's quite efficient, but yes it seems like it's difficult to determine what is a real root node, and what is coming from the subtree. I don't think it's impossible though.

The idea would be check if any of the ancestor_ids on a node are in the list of node_ids, if they are, but the parent_id isn't in the list of node_ids then we know this is an orphan. We can then pass a flag in to decide what to do with those (make then root nodes, or just discard them).

I honestly can't imagine a scenario where pagination with an arranged set of nodes would even make sense. You certainly couldn't ever use a where() clause in that scenario. Wouldn't the results be erratic?

ok, so you are focused on this line?

arranged[node] = children unless node_ids.include?(node.parent_id)

It says that this is a root if this node is not a child of a node already represented in another tree in this result set.
I think you are asking to add the node to arranged only if node.parent_id.nil?


Really need to find a way to properly sort nodes based upon a numerical form of the ancestor ids. Well, that and join on those ids rather than splitting them up in ruby before they are usable.

I tend to be more postgres centric - so the array or the ltree datatypes interest me, but I'm sure mysql and others have an equally interesting way to expose that data in a reasonable way. Going the join table route like closure_trees is one solution, but there are other ways of having a sql accessible mechanism for those keys stored in the primary table.


Also this whole iso/internationalization effort has rendered the indexes on the field useless in many ways. For our purposes, an old school cstring array will index and search much easier than a unicode friendly field. This gem was definitely designed before iso was taken into account.

Sounds like you're having an existential crisis! :D

I definitely think there's room here for a new library that does both parentage and ordering within, and does it well. On the ordering front, being the maintainer of acts_as_list and also ranked_model, I feel like there must be a better way. Currently I'm leaning toward a ranked_model approach but with a defined expected scope size (e.g. ranked things within a particular scope generally number around 100 or 1000 etc... This lets us be smarter when calculating gaps between position integers.

Anywho, with regards to the arrange method, is there a way of representing a tree structure as a hash with missing tree nodes? One way would be to make up the hash using ids as the keys and a hash like this as the value:

5: { node: <TheNode>, children: {} }

We can do this based on the ancestry column which tells us the ancestor ids of an item even if those nodes don't exist in the data set we have. We will be able to tell if we've reached a gap in the tree when parsing the hash when we encounter a node with a null value for node:. We can then decide to still walk the children, or just skip this depending on our requirements of our use case.

What do you think?

Sounds like you're having an existential crisis! :D

yup

The code only looks at parent_id - which I agree with you, is pretty naive. We've been working with this type of data for over a decade. While I feel our views on the importance of viewing partial/paginated trees are different, we both feel that there is a lost opportunity here.

I agree that the interface for displaying a second page for a tree is not that straight forward. In this case, simply displaying ... for missing nodes makes sense.

But as you joked before, I am considering gutting most of this and going a more native rails friendly route. Implementing a serialization routine to add slashes and remove them at the model layer may have made sense in rails 2 but it is getting in my way. Just having trouble knowing that everyone expects ancestry to hold the before serialization value, when hashes and money and other structures serialized don't act that way at all. Hopefully I'll get down off this ledge before I go too far and rewrite the whole thing from scratch.

Hehe, yea avoid that green field. You'll never get out!

I think there's probably a decent upgrade path where you keep your old serialisation logic, but work in options for storing the materialised ancestry as a jsonb column for instance. You could even have an upgrade script that converts over when people are ready.

An idea for finding out what people need is to add a post-install message to the gem pointing people at some kind of discussion forum where you lay out your proposals for the future. Then give it a few months to see what comes out of the woodwork.