Flaky single_node_enterprise test
Green-Chan opened this issue · 2 comments
I have Ubuntu 24.04, 13.2.0, PostgreSQL REL_16_3 configured with options CFLAGS=" -Og" --enable-tap-tests --enable-debug --with-openssl --with-libxml --enable-cassert --with-icu --with-lz4, and Citus main (9e1852e).
First of all, when running make check-enterprise, I get an assertion failure (see #7591), so I commented out that assertion:
--- a/src/backend/distributed/deparser/ruleutils_16.c
+++ b/src/backend/distributed/deparser/ruleutils_16.c
@@ -1589,7 +1589,7 @@ set_join_column_names(deparse_namespace *dpns, RangeTblEntry *rte,
if (colinfo->is_new_col[col_index])
i++;
}
- Assert(i == colinfo->num_cols);
+ //Assert(i == colinfo->num_cols);
Assert(j == nnewcolumns);
#endif
Then I change enterprise_schedule so that it runs the single_node_enterprise test ten times:
test: single_node_enterprise
test: single_node_enterprise
test: single_node_enterprise
test: single_node_enterprise
test: single_node_enterprise
test: single_node_enterprise
test: single_node_enterprise
test: single_node_enterprise
test: single_node_enterprise
test: single_node_enterprise
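Rather than pasting the line by hand, the repeated entries can be generated. This is only a minimal sketch assuming a POSIX shell: repeat_test_line is a hypothetical helper, and the schedule path in the usage comment is an assumption about where the file lives in the checkout.

```shell
#!/bin/sh
# Hypothetical helper: print "test: <name>" <count> times, one entry per line.
repeat_test_line() {
    name=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        printf 'test: %s\n' "$name"
        i=$((i + 1))
    done
}

# Usage (the path is an assumption; adjust it to your checkout):
#   repeat_test_line single_node_enterprise 10 >> src/test/regress/enterprise_schedule
```

Writing to stdout and letting the caller redirect keeps the helper from touching the schedule file unless that is explicitly requested.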
Then I run make check-enterprise, and some of these tests fail with the following diff:
--- /home/test/citus/src/test/regress/expected/single_node_enterprise.out.modified 2024-08-15 07:12:28.263667388 +0000
+++ /home/test/citus/src/test/regress/results/single_node_enterprise.out.modified 2024-08-15 07:12:28.275667634 +0000
@@ -465,28 +465,30 @@
NOTICE: issuing /*{"cId":10,"tId":"101"}*/INSERT INTO single_node_ent.test_90730501 (x, y) VALUES (101, 100)
INSERT INTO test(x,y) VALUES (102,100);
NOTICE: issuing /*{"cId":10,"tId":"102"}*/INSERT INTO single_node_ent.test_90730502 (x, y) VALUES (102, 100)
-- followed by a multi-shard command
SELECT count(*) FROM test;
NOTICE: issuing SELECT count(*) AS count FROM single_node_ent.test_90730501 test WHERE true
NOTICE: issuing SELECT count(*) AS count FROM single_node_ent.test_90730502 test WHERE true
NOTICE: issuing SELECT count(*) AS count FROM single_node_ent.test_90730503 test WHERE true
NOTICE: issuing SELECT count(*) AS count FROM single_node_ent.test_90731504 test WHERE true
NOTICE: issuing SELECT count(*) AS count FROM single_node_ent.test_90731505 test WHERE true
+NOTICE: issuing BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;SELECT assign_distributed_transaction_id(0, 251, '2024-08-15 00:12:28.099834-07');
NOTICE: issuing SELECT count(*) AS count FROM single_node_ent.test_90731506 test WHERE true
count
-------
53
(1 row)
ROLLBACK;
NOTICE: issuing ROLLBACK
+NOTICE: issuing ROLLBACK
-- should fail as only read access is allowed
SET ROLE read_access_single_node;
INSERT INTO test VALUES (1, 1, (95, 'citus9.5')::new_type);
ERROR: permission denied for table test
SET ROLE postgres;
\c
SET search_path TO single_node_ent;
-- Cleanup
RESET citus.log_remote_commands;
SET client_min_messages TO WARNING;
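Because the failure is intermittent, another way to catch it is to rerun the whole target and stop at the first failing run, so that the results of that run are not overwritten. A minimal sketch assuming a POSIX shell; run_until_failure is a hypothetical helper, not part of the Citus tooling.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to <max> times and stop at the first failure.
# Returns 1 and reports the attempt number if a run fails, otherwise returns 0.
run_until_failure() {
    max=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        if ! "$@"; then
            echo "failed on attempt $attempt"
            return 1
        fi
        attempt=$((attempt + 1))
    done
    echo "no failure in $max attempts"
    return 0
}

# Usage, from the directory where the target is normally run:
#   run_until_failure 20 make check-enterprise
```

The attempt count of 20 is arbitrary; how many runs are needed depends on how often the extra NOTICE lines show up.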
I was able to reproduce the issue in the devcontainer environment.
Querying a view that was created before the schema of its underlying tables was altered (specifically, before a new column was added) crashes the server with an assertion failure in the set_join_column_names function in ruleutils_16.c.
Here’s a link to the relevant test case:
These are the columns related to the view:
SELECT * FROM (test JOIN colocated_table USING (x)) foo(x, y, z)
LEFT JOIN ref ON foo.x = ref.a;
x | y | z | y | z | a | b
---------------------------------------------------------------------
Here’s the relevant part of the server log:
LOG: colinfo->num_cols: 6, i: 7, j: 7, nnewcolumns: 7
TRAP: failed Assert("i == colinfo->num_cols"), File: "deparser/ruleutils_16.c", Line: 1614
- colinfo->num_cols represents the number of columns in the join at the time the view was created (which is 6).
- After adding a new column z to colocated_table, the actual number of columns becomes 7.
- When the deparser tries to reconstruct the view, it expects i (the index of processed columns) to match colinfo->num_cols.
- Since i increments to 7 due to the new column, but colinfo->num_cols remains at 6, the assertion Assert(i == colinfo->num_cols) fails, causing the crash.
The deparser doesn't seem to handle schema changes in underlying tables that affect views.
After recreating the view, the assertion no longer fails.
CREATE OR REPLACE VIEW view_created_before_shard_moves AS
SELECT count(*) AS count
FROM (test JOIN colocated_table USING (x)) AS foo
LEFT JOIN ref ON (foo.x = ref.a);