DataDog/pg_tracing

[Bug] Segmentation fault in single-copy mode

Closed this issue · 2 comments

Environment details (Operating System, PostgreSQL version, pg_tracing version, etc.)

  • PostgreSQL version: 16 (installed via apt-get)
  • pg_tracing version: master branch
  • Operating System: Ubuntu 22.04

Steps to reproduce the issue

  1. Create a table and load some data:
CREATE TABLE tenk1 (
	unique1		int4,
	unique2		int4,
	two			int4,
	four		int4,
	ten			int4,
	twenty		int4,
	hundred		int4,
	thousand	int4,
	twothousand	int4,
	fivethous	int4,
	tenthous	int4,
	odd			int4,
	even		int4,
	stringu1	name,
	stringu2	name,
	string4		name
);

copy tenk1 from '/users/yulai/tenk.data'; 

CREATE INDEX tenk1_unique1 ON tenk1 USING btree(unique1 int4_ops);

The data file is the tenk.data file from the PostgreSQL regression tests: https://raw.githubusercontent.com/postgres/postgres/master/src/test/regress/data/tenk.data

  2. CREATE EXTENSION pg_tracing;
  3. SET debug_parallel_query = 1; -- single-copy mode
  4. /*traceparent='00-00000000000000000000000000000123-0000000000000123-01'*/ select stringu1 from tenk1 where unique1 = 1;

Describe the results you received

server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
The connection to the server was lost. Attempting reset: Failed.

Describe the results you expected

Additional information you deem important (e.g. issue happens only occasionally)

In the log:

2024-09-05 19:53:53.026 MDT [13597] LOG:  server process (PID 13627) was terminated by signal 11: Segmentation fault
2024-09-05 19:53:53.026 MDT [13597] DETAIL:  Failed process was running: /*traceparent='00-00000000000000000000000000000123-0000000000000123-01'*/ select stringu1 from tenk1 where unique1 = 1;
The plan for the query:

explain (costs off)
  select stringu1 from tenk1 where unique1 = 1;
                  QUERY PLAN                   
-----------------------------------------------
 Gather
   Workers Planned: 1
   Single Copy: true
   ->  Index Scan using tenk1_unique1 on tenk1
         Index Cond: (unique1 = 1)
(5 rows)

@bonnefoa I suspect this bug occurs because pg_tracing expects the leader process to generate the planstate spans, but in single-copy mode the leader only waits for the worker to finish and does not execute any plan nodes itself.
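One way to check that only the worker executes the plan, without involving pg_tracing, is to run the same query under EXPLAIN ANALYZE with VERBOSE, which adds per-worker instrumentation. This is a sketch, assuming the statement carries no traceparent comment and pg_tracing is not configured to sample untraced statements. It also assumes a worker is actually launched: if none can be launched, a single-copy Gather falls back to running the plan in the leader.

```sql
-- Same session setting as in the reproduction steps (single-copy Gather).
SET debug_parallel_query = 1;

-- No traceparent comment, so pg_tracing should not try to build spans here
-- (assuming sampling of untraced statements is disabled).
-- With VERBOSE, the Gather node reports "Workers Launched", and nodes run in a
-- worker get "Worker N:" lines. In single-copy mode the Index Scan is expected
-- to show activity only in the worker; the leader just collects its tuples.
EXPLAIN (ANALYZE, VERBOSE, COSTS OFF)
  SELECT stringu1 FROM tenk1 WHERE unique1 = 1;
```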

Nice catch! The core issue is that debug_parallel_query runs the parallel query without the leader's participation. As a result, the leader never executes the plan nodes and never goes through ExecProcNodeFirstPgTracing, which is where the traced_planstates are created. You will hit the same issue with a parallel query when parallel_leader_participation is disabled.
I will push a fix for that.
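The second scenario mentioned in the reply, a regular parallel query with parallel_leader_participation disabled, could be reproduced along the following lines. This is an untested sketch: the cost settings are only there to push the planner toward a parallel plan on a table this small, and the EXPLAIN step checks that a Gather node is actually present before the traced query runs.

```sql
-- Use a normal parallel plan instead of the forced single-copy one.
SET debug_parallel_query = off;
-- The leader will not execute any plan nodes itself; it only gathers tuples.
SET parallel_leader_participation = off;
SET max_parallel_workers_per_gather = 2;
-- Make parallelism look cheap so the planner picks it for a 10k-row table.
SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;
SET min_parallel_table_scan_size = 0;

-- Confirm the plan contains a Gather node before running the traced query.
EXPLAIN (COSTS OFF)
  SELECT count(*) FROM tenk1;

/*traceparent='00-00000000000000000000000000000123-0000000000000123-01'*/ SELECT count(*) FROM tenk1;
```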