heterodb/pg-strom

CPU fallback: segmentation fault in the server process

Closed this issue · 2 comments

Running the query below prints a message saying that CPU fallback is being performed, and then a segmentation fault occurs.

SET pg_strom.enabled = on;
SELECT *
  FROM fallback_data d NATURAL JOIN fallback_enlarge l
 WHERE l.aid < 2500 AND memo LIKE '%ab%';	-- Error

Output messages

postgres=# SELECT *
postgres-#   FROM fallback_data d NATURAL JOIN fallback_enlarge l
postgres-#  WHERE l.aid < 2500 AND memo LIKE '%ab%';-- Error
NOTICE:  (xpu_textlib.h:56) CPU fallback due to text datum is compressed or external [xpu_text_is_valid]
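
The NOTICE points at a text datum that is stored compressed or out of line. In the full script further down, the memo value of the row with id = 400001 is enlarged by repeated concatenation to roughly 4 KB, which is above PostgreSQL's default TOAST threshold of about 2 KB, so that row is the likely trigger. A minimal sketch for checking this, assuming the fallback_data table from that script already exists (the column aliases are only illustrative):

```sql
-- Compare the logical length of memo with the bytes actually used to store it.
-- A stored size well below the logical length suggests the datum was compressed.
SELECT id,
       octet_length(memo)   AS logical_bytes,
       pg_column_size(memo) AS stored_bytes
  FROM fallback_data
 WHERE id = 400001;
```

Both octet_length() and pg_column_size() are standard PostgreSQL functions, so this check does not depend on PG-Strom.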

Backtrace

(gdb) c
Continuing.
[New Thread 0x7f9808179700 (LWP 1542924)]
[Thread 0x7f9808179700 (LWP 1542924) exited]
[New Thread 0x7f9808179700 (LWP 1542926)]
[Thread 0x7f9808179700 (LWP 1542926) exited]
[New Thread 0x7f9808179700 (LWP 1542928)]
[Thread 0x7f9808179700 (LWP 1542928) exited]
[New Thread 0x7f9808179700 (LWP 1542930)]
[Thread 0x7f9808179700 (LWP 1542930) exited]
[New Thread 0x7f9808179700 (LWP 1542952)]
[New Thread 0x7f9807978700 (LWP 1542954)]

Thread 1 "postgres" received signal SIGSEGV, Segmentation fault.
0x00007f9abfc5aa84 in pthread_mutex_lock () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007f9abfc5aa84 in pthread_mutex_lock () from /lib64/libpthread.so.0
#1  0x00007f9abec14d1f in xpuClientPutResponse () from /home/onishi/pgbin152/lib/postgresql/pg_strom.so
#2  0x00007f9abec14db9 in pgstromExecScanAccess.part.11 ()
   from /home/onishi/pgbin152/lib/postgresql/pg_strom.so
#3  0x00007f9abec16f56 in pgstromExecTaskState () from /home/onishi/pgbin152/lib/postgresql/pg_strom.so
#4  0x000000000067336b in standard_ExecutorRun ()
#5  0x00000000007c886e in PortalRunSelect ()
#6  0x00000000007c9ac7 in PortalRun ()
#7  0x00000000007c62bf in exec_simple_query ()
#8  0x00000000007c6f28 in PostgresMain ()
#9  0x000000000075012b in ServerLoop ()
#10 0x0000000000751004 in PostmasterMain ()
#11 0x00000000004efe89 in main ()
(gdb) 

Full script

---
--- Test for CPU fallback and GPU kernel suspend / resume on PostgreSQL table
---
SET pg_strom.regression_test_mode = on;

-- this test uses pre-built test table
SET search_path = pg_temp,pgstrom_regress,public;

-- disables SeqScan and kernel source
SET enable_seqscan = off;
SET max_parallel_workers_per_gather = 0;

-- prepare table
-- test for CPU fallback / GPU kernel suspend/resume
CREATE TABLE fallback_data (
  id    int,
  aid   int,
  cat   text,
  x     float,
  y     float,
  memo  text
);
SELECT pgstrom.random_setseed(20190714);
INSERT INTO fallback_data (
  SELECT x, pgstrom.random_int(0.5, 1, 4000),
            CASE floor(random()*26)
            WHEN 0 THEN 'aaa'
            WHEN  1 THEN 'bbb'
            WHEN  2 THEN 'ccc'
            WHEN  3 THEN 'ddd'
            WHEN  4 THEN 'eee'
            WHEN  5 THEN 'fff'
            WHEN  6 THEN 'ggg'
            WHEN  7 THEN 'hhh'
            WHEN  8 THEN 'iii'
            WHEN  9 THEN 'jjj'
            WHEN 10 THEN 'kkk'
            WHEN 11 THEN 'lll'
            WHEN 12 THEN 'mmm'
            WHEN 13 THEN 'nnn'
            WHEN 14 THEN 'ooo'
            WHEN 15 THEN 'ppp'
            WHEN 16 THEN 'qqq'
            WHEN 17 THEN 'rrr'
            WHEN 18 THEN 'sss'
            WHEN 19 THEN 'ttt'
            WHEN 20 THEN 'uuu'
            WHEN 21 THEN 'vvv'
            WHEN 22 THEN 'www'
            WHEN 23 THEN 'xxx'
            WHEN 24 THEN 'yyy'
            ELSE 'zzz'
            END,
            pgstrom.random_float(2,-1000.0,1000.0),
            pgstrom.random_float(2,-1000.0,1000.0),
            pgstrom.random_text_len(2, 200)
    FROM generate_series(1,400001) x);
UPDATE fallback_data
   SET memo = md5(memo) || md5(memo)
 WHERE id = 400001;
UPDATE fallback_data
   SET memo = memo || '-' || memo || '-' || memo || '-' || memo
 WHERE id = 400001;
UPDATE fallback_data
   SET memo = memo || '-' || memo || '-' || memo || '-' || memo
 WHERE id = 400001;
UPDATE fallback_data
   SET memo = memo || '-' || memo || '-' || memo || '-' || memo
 WHERE id = 400001;

CREATE TABLE fallback_enlarge (
  aid   int,
  z     float,
  md5   char(200)
);
INSERT INTO fallback_enlarge (
  SELECT x / 5, pgstrom.random_float(2,-1000.0,1000.0),
            md5(x::text)
    FROM generate_series(1,20000) x);


-- GpuJoin with GPU kernel suspend / resume, and CPU fallback
SET pg_strom.enabled = on;
SELECT *
  FROM fallback_data d NATURAL JOIN fallback_enlarge l
 WHERE l.aid < 2500 AND memo LIKE '%ab%';	-- Error
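
The comment in the script expects this query to run as a GpuJoin. A small sketch for confirming which plan is chosen, assuming the same session settings as the script above; plain EXPLAIN (without ANALYZE) only plans the query and does not execute it, so it should not reach the code path shown in the backtrace:

```sql
-- Show the plan only; the query itself is not executed.
SET pg_strom.enabled = on;
EXPLAIN (COSTS OFF)
SELECT *
  FROM fallback_data d NATURAL JOIN fallback_enlarge l
 WHERE l.aid < 2500 AND memo LIKE '%ab%';
```

The exact node labels in the output depend on the PG-Strom version in use.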

Fixed in 20b0ea73ee50712a54dbf25acb46aa3f7e4af9f4.

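When a GpuJoin finishes and no RIGHT OUTER final processing is needed, a dummy command (one carrying a success status) is appended to the tail of ready_list. The xcmd->priv of that dummy command is NULL, so the code that would normally reach the XpuConnection through it hit a SEGV instead. That appears to be what happened here.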

I have confirmed the fix. Thank you for taking care of it.