yugabyte/yugabyte-db

[YSQL] Investigate suspicious behavior with PARTITION BY HASH

Closed this issue · 2 comments

Jira Link: DB-11910

Description

Context

From Alan on slack,

Found some really bizarre behavior regarding PARTITION BY HASH:

Given a table: test_txn created like the following:
CREATE TABLE test_txn (id serial not null, ts timestamp not null default now(), hour_bucket smallint, primary key(id hash, ts desc, hour_bucket)) partition by hash(hour_bucket);
CREATE TABLE test_txn_even PARTITION OF test_txn FOR VALUES WITH (MODULUS 2, REMAINDER 0);
CREATE TABLE test_txn_odd PARTITION OF test_txn FOR VALUES WITH (MODULUS 2, REMAINDER 1);

If I INSERT 11 rows into the table like so:

INSERT INTO test_txn (hour_bucket) VALUES(0); ...

I get a very curious distribution:

test=# select * from test_txn_even;
 id |             ts             | hour_bucket
----+----------------------------+-------------
  1 | 2024-06-04 15:49:24.45136  |           0
  2 | 2024-06-04 15:49:27.988179 |           1
  3 | 2024-06-04 15:49:30.131668 |           2
test=# select * from test_txn_odd;
 id |             ts             | hour_bucket
----+----------------------------+-------------
  5 | 2024-06-04 15:49:34.429385 |           4
 11 | 2024-06-04 15:49:50.876112 |          10
  6 | 2024-06-04 15:49:36.956534 |           5
  7 | 2024-06-04 15:49:39.908184 |           6
  9 | 2024-06-04 15:49:45.333032 |           8
 10 | 2024-06-04 15:49:48.469496 |           9
  4 | 2024-06-04 15:49:32.211855 |           3
  8 | 2024-06-04 15:49:42.895187 |           7
(8 rows)

Issue Type

kind/bug

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.

Tried a similar set of commands on pg15 as well

CREATE TABLE test_txn (id serial not null, ts timestamp not null default now(), hour_bucket smallint, primary key(id, ts, hour_bucket)) partition by hash(hour_bucket);
CREATE TABLE test_txn_even PARTITION OF test_txn FOR VALUES WITH (MODULUS 2, REMAINDER 0);
CREATE TABLE test_txn_odd PARTITION OF test_txn FOR VALUES WITH (MODULUS 2, REMAINDER 1);

Insert 0 to 10 ...

INSERT INTO test_txn (hour_bucket) select generate_series(0, 10);

The even entries remain the same

testdb=# select * from test_txn_even;
 id |            ts             | hour_bucket
----+---------------------------+-------------
  1 | 2024-06-25 19:34:57.92473 |           0
  2 | 2024-06-25 19:34:57.92473 |           1
  3 | 2024-06-25 19:34:57.92473 |           2

indicating that the behavior is the same even in PostgreSQL.

The above behavior is simply a coincidence.

For example,

insert into test_txn select generate_series(-1000, 1000);

inserts random data into even and odd partitions.

yugabyte=# select count(*) from test_txn_odd;
 count 
-------
   960
(1 row)

yugabyte=# select count(*) from test_txn_even;
 count 
-------
  1041
(1 row)

What hash function is used for partitioning?

Hint: it is not hashint2.
Short answer: hash_combine64(0, hashint2extended(<hour_bucket>, 8816678312871386365)

Details:

  1. In execPartition.c, compute_partition_hash_value is invoked to compute hash value of the tuple to help decide the partition.
  2. In partbounds.c, compute_partition_hash_value inturn invokes hash2extended(key, seed) and hashcombine64s the result with 0.