heterodb/pg-strom

arrow_fdwをGPU処理時、WHERE句で大小比較を行うとGPUサービスプロセス内部でsegmentation fault発生

Closed this issue · 4 comments

SELECT * FROM timezone_arrow WHERE id > 100;

全体クエリ

DROP TABLE IF EXISTS timezone_data;
DROP FOREIGN TABLE IF EXISTS timezone_arrow;
CREATE TABLE timezone_data (
  id     int,
  ts     timestamp
);
INSERT INTO timezone_data (
  SELECT x, pgstrom.random_timestamp(1, '2020-02-11 12:00:00',
                                        '2020-02-13 12:00:00')
    FROM generate_series(1,1500) x);

\! ../arrow-tools/pg2arrow -d postgres -c 'SELECT * FROM timezone_data' -o /tmp/timezone_data.arrow

IMPORT FOREIGN SCHEMA timezone_arrow
  FROM SERVER arrow_fdw
  INTO public
OPTIONS (file '/tmp/timezone_data.arrow');

SET arrow_fdw.enabled = off;
--  以下のどちらのパターンでもGPUサービスプロセスが死亡
SELECT * FROM timezone_arrow WHERE id > 100;
SELECT * FROM timezone_arrow WHERE ts < '2020-02-12 12:00:00';

サーバプロセスメッセージ

2023-12-23 16:19:27.574 JST [553390] LOG:  background worker "PG-Strom GPU Service" (PID 553808) was terminated by signal 11: Segmentation fault
2023-12-23 16:19:27.574 JST [553390] LOG:  terminating any other active server processes
2023-12-23 16:19:27.583 JST [553390] LOG:  all server processes terminated; reinitializing
2023-12-23 16:19:27.778 JST [554174] LOG:  database system was interrupted; last known up at 2023-12-23 16:17:20 JST
2023-12-23 16:19:27.778 JST [554177] FATAL:  the database system is in recovery mode
2023-12-23 16:19:27.965 JST [554174] LOG:  database system was not properly shut down; automatic recovery in progress
2023-12-23 16:19:28.000 JST [554174] LOG:  redo starts at 0/2FA08F08
2023-12-23 16:19:28.013 JST [554174] LOG:  invalid record length at 0/2FA43168: wanted 24, got 0
2023-12-23 16:19:28.013 JST [554174] LOG:  redo done at 0/2FA43110 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.01 s
2023-12-23 16:19:28.075 JST [554175] LOG:  checkpoint starting: end-of-recovery immediate wait
2023-12-23 16:19:28.417 JST [554175] LOG:  checkpoint complete: wrote 39 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.128 s, sync=0.114 s, total=0.376 s; sync files=27, longest=0.090 s, average=0.005 s; distance=232 kB, estimate=232 kB
2023-12-23 16:19:28.456 JST [553390] LOG:  database system is ready to accept connections
2023-12-23 16:19:28.473 JST [554189] ERROR:  failed on bind('.pg_strom.553390.gpu0.sock'): Address already in use
2023-12-23 16:19:28.492 JST [553390] LOG:  background worker "PG-Strom GPU Service" (PID 554189) exited with exit code 1
2023-12-23 16:19:33.752 JST [554231] LOG:  GPU0 workers - 13 startup (with GpuCacheManager), 0 terminate

gdbバックトレース(バックエンドプロセス)

Thread 1 "postgres" received signal SIGQUIT, Quit.
0x00007f9506889e87 in epoll_wait () from /lib64/libc.so.6
(gdb) bt
#0  0x00007f9506889e87 in epoll_wait () from /lib64/libc.so.6
#1  0x00000000007a5c0a in WaitEventSetWait ()
#2  0x00000000007a5f85 in WaitLatch ()
#3  0x00007f95062733ef in __waitAndFetchNextXpuCommand ()
   from /home/onishi/pgbin152/lib/postgresql/pg_strom.so
#4  0x00007f9506273a59 in pgstromExecScanAccess.part.11 ()
   from /home/onishi/pgbin152/lib/postgresql/pg_strom.so
#5  0x00007f95062759f6 in pgstromExecTaskState ()
   from /home/onishi/pgbin152/lib/postgresql/pg_strom.so
#6  0x000000000067336b in standard_ExecutorRun ()
#7  0x00000000007c886e in PortalRunSelect ()
#8  0x00000000007c9ac7 in PortalRun ()
#9  0x00000000007c62bf in exec_simple_query ()
#10 0x00000000007c6f28 in PostgresMain ()
#11 0x000000000075012b in ServerLoop ()
#12 0x0000000000751004 in PostmasterMain ()
#13 0x00000000004efe89 in main ()

GPUサービスプロセスのgdbトレース

Thread 9 "postgres" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f9f7f7fe700 (LWP 553824)]
0x00007fa2d7516a84 in pthread_mutex_lock () from /lib64/libpthread.so.0

こちら、手元環境だと heterodb-extra モジュールがエラーを返していますね…。

/tmp だとGPUDirect Storageの対象外からなのか。

hoge=# SELECT * FROM timezone_arrow WHERE ts < '2020-02-12 12:00:00';
2023-12-23 20:45:07.876 JST [1169292] ERROR:  gpudirect.c:716  extra-module: failed on pread: Success
2023-12-23 20:45:07.876 JST [1169292] HINT:  device at GPU-0, function at __gpudirect__read_iov_fallback
2023-12-23 20:45:07.876 JST [1169292] STATEMENT:  SELECT * FROM timezone_arrow WHERE ts < '2020-02-12 12:00:00';
ERROR:  gpudirect.c:716  extra-module: failed on pread: Success
HINT:  device at GPU-0, function at __gpudirect__read_iov_fallback

magroで試すとこんな感じっぽいですね。

確かに /home はSATA HDDなのでGPUDirectのopen fileからvfs_fallbackに落ちるというのは分かりみ。
そこの実行パス、あまり叩いてないから、extraモジュール側の何かかー。

(gdb) bt
#0  0x00007fcae7940a84 in pthread_mutex_lock () from /lib64/libpthread.so.0
#1  0x00007fcae4a4f126 in __gpudirect__open_file () from /lib64/heterodb_extra.so
#2  0x00007fcae4a51643 in vfs_fallback.read_file_iov () from /lib64/heterodb_extra.so
#3  0x00007fcae68ec5b4 in gpuDirectFileReadIOV () from /home/onishi/pgbin152/lib/postgresql/pg_strom.so
#4  0x00007fcae690617d in __gpuservLoadKdsCommon () from /home/onishi/pgbin152/lib/postgresql/pg_strom.so
#5  0x00007fcae69084b0 in gpuservHandleGpuTaskExec () from /home/onishi/pgbin152/lib/postgresql/pg_strom.so
#6  0x00007fcae6909020 in gpuservGpuWorkerMain () from /home/onishi/pgbin152/lib/postgresql/pg_strom.so
#7  0x00007fcae793e1ca in start_thread () from /lib64/libpthread.so.0
#8  0x00007fcae6e1ce73 in clone () from /lib64/libc.so.6

heterodb-extra-5.2-3.el8.x86_64.rpm で修正しました。

内部的にhash表を参照するときのインデックス計算ミスでした。
なぜ顕在化しなかったかというと、buriではデバイスのメジャー番号/マイナー番号のペアがたまたまhash表をオーバーフローしないパターンだったぽいです(恐怖

修正確認できました。ありがとうございました。