[Bug] Segmentation fault in single-copy mode
Closed this issue · 2 comments
Environment details (Operating System, PostgreSQL version, pg_tracing version, etc)
- PostgreSQL version: PostgreSQL 16, installed via apt-get
- pg_tracing version: master branch
- Operating System: Ubuntu 22.04
Steps to reproduce the issue
- Create a table and load some data:
CREATE TABLE tenk1 (
    unique1 int4,
    unique2 int4,
    two int4,
    four int4,
    ten int4,
    twenty int4,
    hundred int4,
    thousand int4,
    twothousand int4,
    fivethous int4,
    tenthous int4,
    odd int4,
    even int4,
    stringu1 name,
    stringu2 name,
    string4 name
);
copy tenk1 from '/users/yulai/tenk.data';
CREATE INDEX tenk1_unique1 ON tenk1 USING btree(unique1 int4_ops);
The data comes from https://raw.githubusercontent.com/postgres/postgres/master/src/test/regress/data/tenk.data
- create extension pg_tracing;
- SET debug_parallel_query = 1; -- single-copy mode
/*traceparent='00-00000000000000000000000000000123-0000000000000123-01'*/ select stringu1 from tenk1 where unique1 = 1;
Describe the results you received
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
The connection to the server was lost. Attempting reset: Failed.
Describe the results you expected
Additional information you deem important (e.g. issue happens only occasionally)
In the log:
2024-09-05 19:53:53.026 MDT [13597] LOG: server process (PID 13627) was terminated by signal 11: Segmentation fault
2024-09-05 19:53:53.026 MDT [13597] DETAIL: Failed process was running: /*traceparent='00-00000000000000000000000000000123-0000000000000123-01'*/ select stringu1 from tenk1 where unique1 = 1;
explain (costs off)
select stringu1 from tenk1 where unique1 = 1;
                  QUERY PLAN
-----------------------------------------------
 Gather
   Workers Planned: 1
   Single Copy: true
   ->  Index Scan using tenk1_unique1 on tenk1
         Index Cond: (unique1 = 1)
(5 rows)
@bonnefoa I suspect this bug occurs because pg_tracing expects the leader process to generate the planstate spans, but in single-copy mode the leader only waits for the worker to finish and does no processing itself.
Nice catch! The core issue is that debug_parallel_query runs the parallel query without the leader's participation. As a result, the leader never executes the plan nodes and never goes through ExecProcNodeFirstPgTracing, which is where the traced_planstates are created. You will hit the same issue with any parallel query when parallel_leader_participation is disabled.
I will push a fix for that.
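The parallel_leader_participation variant mentioned above could be reproduced with something like the following. This is only a sketch: the cost and size settings and the count(*) query are assumptions chosen to coax the planner into a parallel plan on the small tenk1 table, not something taken from the original report, so check the EXPLAIN output for a Gather node before running the traced query.

```sql
-- Sketch only: the planner settings and the query below are assumptions,
-- picked to encourage a parallel plan on tenk1 without debug_parallel_query.
RESET debug_parallel_query;

-- The leader only gathers tuples and never executes the plan nodes itself.
SET parallel_leader_participation = off;

-- Make parallel plans cheap enough to be chosen for a 10k-row table.
SET max_parallel_workers_per_gather = 2;
SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;
SET min_parallel_table_scan_size = 0;
SET min_parallel_index_scan_size = 0;

-- Confirm that a Gather node is actually planned.
EXPLAIN (costs off) SELECT count(*) FROM tenk1 WHERE hundred > 1;

-- Same traceparent comment as in the original report.
/*traceparent='00-00000000000000000000000000000123-0000000000000123-01'*/ SELECT count(*) FROM tenk1 WHERE hundred > 1;
```

If the plan shows no Gather node, the traced query runs entirely in the backend and would not be expected to exercise the code path described above.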