anyone could give clarity to the readme?

Question

anyone could give clarity to the readme?

laptopmutia opened this issue 2 years ago · 6 comments

add-ancestry-column-to-your-table

Depending upon your comfort with databases, you may want to create the column with C or POSIX encoding. This is a more primitive encoding and just compares bytes. Since this column will just contains numbers and slashes, it works much better. It also works better for the uuid case as well.

Alternatively, if you create a text_pattern_ops index for your postgresql column, subtree selection will use an efficient index for you regardless of whether you created the column with POSIX encoding.

If you opt out of this, and are trying to run tests on postgres, you may need to set the environment variable COLLATE_SYMBOLS=false. Sorry to say that a discussion on this topic is out of scope. The important take away is postgres sort order is not consistent across operating systems but other databases do not have this same issue.

I got confused by this

I'm using postgresql on LINUX should I change my database table into using C or POSIX instead en_US.UTF-8 ?

Answer 1 · 2022-11-07T13:31:13.000Z

I found this PR though https://github.com/stefankroes/ancestry/pull/543/files

I think I could just ignore the C and POSIX but adding COLLATE_SYMBOLS=false when I testing is that true?

Answer 2 · 2022-11-07T20:21:55.000Z

hello @laptopmutia

The basic premise of materialized path is to look for models via "table"."ancestry" LIKE "/1/2/%".
The index starts on the left and goes to the right, so it is able to lookup "all children" very easily since they are consecutive in the index.

This wonderful multiple byte character world has thrown a wrench into postgres's plans. Some collations (read: sort order) ignore symbols. so /1/2/3/4 is treated like 1234. and /12/34 is treated the same. Uppercase and lowercase can be sorted the same. This is a very different world from simply looking at the bytes and saying character number 65 is after character number 64.

The suggestion is to tell postgres to tread this column like a simple primitive byte string.

The COLLATE_SYMBOLS=false statement is for testing only. It tells the testing component that the collation is ignoring symbols and it will find that /12/32 will be returned from sql before /1/2/3/4.

Answer 3 · 2022-11-07T22:15:41.000Z

I am currently testing with :encoding => 'LATIN1'. It seems to run fine in postgres and mysql so we can see where it goes from there

Answer 4 · 2023-01-27T15:04:46.000Z

please ping me if you have any more questions

Answer 5 · 2023-02-16T18:22:07.000Z

@laptopmutia I have revisited this and tweaked the readme in #601

Answer 6 · 2023-02-26T10:14:45.000Z

@laptopmutia I have revisited this and tweaked the readme in #601

that is really make it clear for me thank you