citusdata/citus

Updating reference table (assertion) crashes at ruleutils when updating columns in different order

Opened this issue · 7 comments

CREATE TABLE update_test (
    a   INT DEFAULT 10,
    b   INT,
    c   TEXT
);
SELECT create_reference_table('update_test');
UPDATE update_test
SET (b,a) = (select a,b from update_test where b = 41 and c = 'car')
WHERE a = 100 AND b = 20;

Above update crashes due to assertion failure with below backtrace:

#2  0x0000000000aae380 in ExceptionalCondition (conditionName=0x7f5ffa87d4b8 "!(((Param *) expr)->paramid == ((cur_ma_sublink->subLinkId << 16) | 1))",
    errorType=0x7f5ffa87ca81 "FailedAssertion", fileName=0x7f5ffa87caf1 "deparser/ruleutils_12.c", lineNumber=3411) at assert.c:54
#3  0x00007f5ffa7d1572 in get_update_query_targetlist_def (query=0x1ded318, targetList=0x1dee078, context=0x7ffeb0977d90, rte=0x1ded490)
    at deparser/ruleutils_12.c:3410
#4  0x00007f5ffa7d11c1 in get_update_query_def (query=0x1ded318, context=0x7ffeb0977d90) at deparser/ruleutils_12.c:3277
#5  0x00007f5ffa7ce668 in get_query_def_extended (query=0x1ded318, buf=0x1df0198, parentnamespace=0x0, distrelid=0, shardid=0, resultDesc=0x0,
    prettyFlags=0, wrapColumn=0, startIndent=0) at deparser/ruleutils_12.c:1968
#6  0x00007f5ffa7ce48c in get_query_def (query=0x1ded318, buf=0x1df0198, parentnamespace=0x0, resultDesc=0x0, prettyFlags=0, wrapColumn=0, startIndent=0)
    at deparser/ruleutils_12.c:1907
#7  0x00007f5ffa7cb2d4 in pg_get_query_def (query=0x1ded318, buffer=0x1df0198) at deparser/ruleutils_12.c:457
#8  0x00007f5ffa8099db in DeparseTaskQuery (task=0x1ded1e0, query=0x1ded318) at planner/deparse_shard_query.c:499
#9  0x00007f5ffa809b20 in TaskQueryString (task=0x1ded1e0) at planner/deparse_shard_query.c:564
#10 0x00007f5ffa7e833d in ExecuteLocalTaskListExtended (taskList=0x1deffe8, orig_paramListInfo=0x0, distributedPlan=0x1deafe8, defaultTupleDest=0x1defd78,
    isUtilityCommand=false) at executor/local_executor.c:281
#11 0x00007f5ffa7dcd23 in RunLocalExecution (scanState=0x1ddcb10, execution=0x1defdb0) at executor/adaptive_executor.c:825
#12 0x00007f5ffa7dcb52 in AdaptiveExecutor (scanState=0x1ddcb10) at executor/adaptive_executor.c:751
#13 0x00007f5ffa7e1f53 in CitusExecScan (node=0x1ddcb10) at executor/citus_custom_scan.c:224
#14 0x0000000000732cc9 in ExecCustomScan (pstate=0x1ddcb10) at nodeCustom.c:116
#15 0x000000000071a5d6 in ExecProcNodeFirst (node=0x1ddcb10) at execProcnode.c:445
#16 0x000000000070f414 in ExecProcNode (node=0x1ddcb10) at ../../../src/include/executor/executor.h:239
#17 0x0000000000711c4f in ExecutePlan (estate=0x1ddc888, planstate=0x1ddcb10, use_parallel_mode=false, operation=CMD_UPDATE, sendTuples=false,
    numberTuples=0, direction=ForwardScanDirection, dest=0x1dc1d58, execute_once=true) at execMain.c:1646
#18 0x000000000070fa51 in standard_ExecutorRun (queryDesc=0x1ddc478, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:364
#19 0x00007f5ffa7ea6e7 in CitusExecutorRun (queryDesc=0x1ddc478, direction=ForwardScanDirection, count=0, execute_once=true)
    at executor/multi_executor.c:202
#20 0x000000000070f874 in ExecutorRun (queryDesc=0x1ddc478, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:306
#21 0x0000000000923abc in ProcessQuery (plan=0x1dc1cc0,
    sourceText=0x1c1fae8 "UPDATE update_test\nSET (b,a) = (select a,b from update_test where b = 41 and c = 'car')\nWHERE a = 100 AND b = 20;", params=0x0,
    queryEnv=0x0, dest=0x1dc1d58, completionTag=0x7ffeb0978690 "") at pquery.c:161
#22 0x00000000009254f0 in PortalRunMulti (portal=0x1cf87d8, isTopLevel=true, setHoldSnapshot=false, dest=0x1dc1d58, altdest=0x1dc1d58,
    completionTag=0x7ffeb0978690 "") at pquery.c:1283
#23 0x0000000000924a4d in PortalRun (portal=0x1cf87d8, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x1dc1d58, altdest=0x1dc1d58,
    completionTag=0x7ffeb0978690 "") at pquery.c:796
#24 0x000000000091e5b4 in exec_simple_query (
    query_string=0x1c1fae8 "UPDATE update_test\nSET (b,a) = (select a,b from update_test where b = 41 and c = 'car')\nWHERE a = 100 AND b = 20;")
    at postgres.c:1215
#25 0x0000000000922a7c in PostgresMain (argc=1, argv=0x1cb9020, dbname=0x1cb8e60 "postgres", username=0x1cb8e38 "postgres") at postgres.c:4247
#26 0x00000000008713bd in BackendRun (port=0x1c3b9f0) at postmaster.c:4448
#27 0x0000000000870b06 in BackendStartup (port=0x1c3b9f0) at postmaster.c:4139
#28 0x000000000086cc46 in ServerLoop () at postmaster.c:1704
#29 0x000000000086c46d in PostmasterMain (argc=3, argv=0x1c1a760) at postmaster.c:1377
#30 0x00000000007850f7 in main (argc=3, argv=0x1c1a760) at main.c:228

And if we disable assertions, we get no errors nor warnings.

	cur_ma_sublink = (SubLink *) lfirst(next_ma_cell);
	next_ma_cell = lnext(ma_sublinks, next_ma_cell);
	remaining_ma_columns = count_nonjunk_tlist_entries(
				((Query *) cur_ma_sublink->subselect)->targetList);
	Assert(((Param *) expr)->paramid == ((cur_ma_sublink->subLinkId << 16) | 1));
	appendStringInfoChar(buf, '(');

-> ruleutils_13.c
targetList's first element has paramid=65538 whereas the second element has paramid=65537. Here, we assert the first one's paramid=65537, then it fails because the order of a and b reversed.,
Here we only check the first element of the targetList. However, what we are looking is at the second place.

This seems like an unnecessary assertion. How about removing it? @onurctirtir

Those functions are copied from postgres so I think we should not do that, there should be a reason for that if it's not a postgres bug.

Instead, we should understand why those a,b are changed or how can we fit to that function

targetList's first element has paramid=65538 whereas the second element has paramid=65537

In the failing one, we assert that paramid == (x << 16) | 1 which will always be an odd number, regardless of the value of x. Therefore, when we change the order of the input as SET(b, a), it will never satisfy this condition. Hence, we should either remove this assertion, or sort the content of the Query object to get 65537 into first place, instead of 65538. The first option sounds better to me.

Why postgres doesn't hit to that assertion ?

  1. Did they update the code and they are not hitting to that assert ?
  2. Or they also use the same code, but:
    a) They are doing some preprocessing to fit to that assertion (like sorting the content of the query as you said)
    b) Or the way that they use this function already fits to the assumption made in that Assert ?

In general, I'm less inclined to update such functions that we copied from postgres since one day we might re-sync those functions with postgres and we might forget what we did.

So, if it's a) or b), I think we need to fit to that Assert by reading the code in postgres that uses that function and maybe we should do some preprocessing in this function or in the caller ones, or we can introduce a wrapper for that and use it in callers.

I think removing that assertion is the last option in case that we can't find a way to fit to that Assert. In that case, before deciding to remove that Assert, we need to make sure that rest of the code can really handle the cases where the statement in this Assert evaluates to false

Postgres doesn't actually check this assertion. The function get_update_query_targetlist_def isn't called for the example you provided.

Yeah, postgres wouldn't call this function for update commands but would do when deparsing rule objects doing update as a result.
So I think we should actually create a rule object (with create rule command) that does the same update on a table and we should debug a bit to see why postgres doesn't hit to that assertion.

CREATE TABLE update_test (
    a   INT DEFAULT 10,
    b   INT,
    c   TEXT
);

CREATE RULE rule1  AS ON INSERT TO update_test DO  UPDATE update_test SET (b, a) = ( SELECT a, b
           FROM update_test WHERE b = 41 AND c = 'car')
  WHERE a = 100 AND b = 20

\d update_test

This calls the crashing function for plain pg. However, when pg fetches the rule data before deparsing and printing out, the TargetEntry list in Query object comes with paramid = 65537 first, 65538 second. This one really surprised me. Because in the output, a and b comes in the reverse order.

We should either

  • Remove the failing assertion
  • Modify the query object before, maybe sorting by paramid
  • Maybe another solution in the parser?