Adding a "/" at the end can reduce the amount of "OR" queries

Question

Adding a "/" at the end can reduce the amount of "OR" queries

KaviiSuri opened this issue 3 years ago · 4 comments

It is pretty common in my experience to get all descendants of a record in the tree.
Currently, a descendant of a record can have "ancestry" column of the following forms:

"{id}"
"{id}/{another_id}/.."
Because of this behaviour, the query to get descendants of id 3 is

SELECT "{table_name}".* FROM "{table_name}" WHERE ("{table_name}"."ancestry" LIKE '3/%' OR "{table_name}"."ancestry" = '3')

The "OR" in the query seems unnecessary and can be easily removed if we just add "/" at the end of ancestry always, effectively changing the forms to

"{id}/"
"{id}/{another_id}/.."
This way, the query is just

SELECT "{table_name}".* FROM "{table_name}" WHERE "{table_name}"."ancestry" LIKE '3/%'

Is there a specific reason ancestry doesn't do this and has decided to not have a "/" at the end of single ancestors?

Answer 1 · 2022-01-04T17:46:31.000Z

Outstanding resources

The research materials/presentations for tree structures in a number of programming languages all had the same implementation. And that implementation without the trailing slash looks obviously wrong.

Implementation

I have been pulling the materialized path implementation into a separate class so we could support multiple tree implementations: traditional materialized path, the trailing slash, postgres arrays, ltree, path with the id in it.

Check out: https://github.com/kbrock/ancestry/tree/materialized_path2
I just rebased so it should be in a usable state.

Results

When I ran the numbers for trailing slash it was not any faster.

Issues I see:

we need to stop using unicode encoding for the column.
We still have way too many OR clauses
We do not have good data sets to show if we actually have improved anything.

Future steps

While I totally agree that the current implementation that everyone uses is probably wrong, I would like to actually know which actions/queries use indexes and which table scan.

So I keep getting derailed by wanting a test suite to actually test this stuff out.

Answer 2 · 2022-02-11T18:04:05.000Z

@kbrock Were the results using the pg array approach better? In fact, I was looking into this not because of speed, but because having a trailing and maybe a leading slash would make implementing custom scopes a much cleaner and easier task.

Answer 3 · 2022-02-25T16:04:05.000Z

@dimvic Yes. I too want a way to determine the root_id, parent_id, ancestor_ids, and child_ancestry values in pure sql without resorting to using a CASE. The current implementation sure leans towards wanting to use ruby to parse.

I want to use a serializer to parse/create the ancestry value. If we do that, then the value of ancestry would be an array - basically the same as ancestor_ids. It would remove a lot of complexity. But I don't know how much this breaks the code that uses ancestry. I know in our code, there are some assumptions that ancestry is a string with a / delimiter.

Very tempted to breaking this functionality in the next major version of ancestry. But I'm hesitant too

Answer 4 · 2022-06-10T03:37:59.000Z

Let me know if you end up trying out the strategy of materialized_path2 and if it works alright for you.