arrow_fdwをGPU処理時、WHERE句で大小比較を行うとGPUサービスプロセス内部でsegmentation fault発生
Closed this issue · 4 comments
0-kaz commented
SELECT * FROM timezone_arrow WHERE id > 100;
全体クエリ
DROP TABLE IF EXISTS timezone_data;
DROP FOREIGN TABLE IF EXISTS timezone_arrow;
CREATE TABLE timezone_data (
id int,
ts timestamp
);
INSERT INTO timezone_data (
SELECT x, pgstrom.random_timestamp(1, '2020-02-11 12:00:00',
'2020-02-13 12:00:00')
FROM generate_series(1,1500) x);
\! ../arrow-tools/pg2arrow -d postgres -c 'SELECT * FROM timezone_data' -o /tmp/timezone_data.arrow
IMPORT FOREIGN SCHEMA timezone_arrow
FROM SERVER arrow_fdw
INTO public
OPTIONS (file '/tmp/timezone_data.arrow');
SET arrow_fdw.enabled = off;
-- 以下のどちらのパターンでもGPUサービスプロセスが死亡
SELECT * FROM timezone_arrow WHERE id > 100;
SELECT * FROM timezone_arrow WHERE ts < '2020-02-12 12:00:00';
サーバプロセスメッセージ
2023-12-23 16:19:27.574 JST [553390] LOG: background worker "PG-Strom GPU Service" (PID 553808) was terminated by signal 11: Segmentation fault
2023-12-23 16:19:27.574 JST [553390] LOG: terminating any other active server processes
2023-12-23 16:19:27.583 JST [553390] LOG: all server processes terminated; reinitializing
2023-12-23 16:19:27.778 JST [554174] LOG: database system was interrupted; last known up at 2023-12-23 16:17:20 JST
2023-12-23 16:19:27.778 JST [554177] FATAL: the database system is in recovery mode
2023-12-23 16:19:27.965 JST [554174] LOG: database system was not properly shut down; automatic recovery in progress
2023-12-23 16:19:28.000 JST [554174] LOG: redo starts at 0/2FA08F08
2023-12-23 16:19:28.013 JST [554174] LOG: invalid record length at 0/2FA43168: wanted 24, got 0
2023-12-23 16:19:28.013 JST [554174] LOG: redo done at 0/2FA43110 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.01 s
2023-12-23 16:19:28.075 JST [554175] LOG: checkpoint starting: end-of-recovery immediate wait
2023-12-23 16:19:28.417 JST [554175] LOG: checkpoint complete: wrote 39 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.128 s, sync=0.114 s, total=0.376 s; sync files=27, longest=0.090 s, average=0.005 s; distance=232 kB, estimate=232 kB
2023-12-23 16:19:28.456 JST [553390] LOG: database system is ready to accept connections
2023-12-23 16:19:28.473 JST [554189] ERROR: failed on bind('.pg_strom.553390.gpu0.sock'): Address already in use
2023-12-23 16:19:28.492 JST [553390] LOG: background worker "PG-Strom GPU Service" (PID 554189) exited with exit code 1
2023-12-23 16:19:33.752 JST [554231] LOG: GPU0 workers - 13 startup (with GpuCacheManager), 0 terminate
gdbバックトレース(バックエンドプロセス)
Thread 1 "postgres" received signal SIGQUIT, Quit.
0x00007f9506889e87 in epoll_wait () from /lib64/libc.so.6
(gdb) bt
#0 0x00007f9506889e87 in epoll_wait () from /lib64/libc.so.6
#1 0x00000000007a5c0a in WaitEventSetWait ()
#2 0x00000000007a5f85 in WaitLatch ()
#3 0x00007f95062733ef in __waitAndFetchNextXpuCommand ()
from /home/onishi/pgbin152/lib/postgresql/pg_strom.so
#4 0x00007f9506273a59 in pgstromExecScanAccess.part.11 ()
from /home/onishi/pgbin152/lib/postgresql/pg_strom.so
#5 0x00007f95062759f6 in pgstromExecTaskState ()
from /home/onishi/pgbin152/lib/postgresql/pg_strom.so
#6 0x000000000067336b in standard_ExecutorRun ()
#7 0x00000000007c886e in PortalRunSelect ()
#8 0x00000000007c9ac7 in PortalRun ()
#9 0x00000000007c62bf in exec_simple_query ()
#10 0x00000000007c6f28 in PostgresMain ()
#11 0x000000000075012b in ServerLoop ()
#12 0x0000000000751004 in PostmasterMain ()
#13 0x00000000004efe89 in main ()
GPUサービスプロセスのgdbトレース
Thread 9 "postgres" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f9f7f7fe700 (LWP 553824)]
0x00007fa2d7516a84 in pthread_mutex_lock () from /lib64/libpthread.so.0
kaigai commented
こちら、手元環境だと heterodb-extra モジュールがエラーを返していますね…。
/tmp
だとGPUDirect Storageの対象外からなのか。
hoge=# SELECT * FROM timezone_arrow WHERE ts < '2020-02-12 12:00:00';
2023-12-23 20:45:07.876 JST [1169292] ERROR: gpudirect.c:716 extra-module: failed on pread: Success
2023-12-23 20:45:07.876 JST [1169292] HINT: device at GPU-0, function at __gpudirect__read_iov_fallback
2023-12-23 20:45:07.876 JST [1169292] STATEMENT: SELECT * FROM timezone_arrow WHERE ts < '2020-02-12 12:00:00';
ERROR: gpudirect.c:716 extra-module: failed on pread: Success
HINT: device at GPU-0, function at __gpudirect__read_iov_fallback
kaigai commented
magroで試すとこんな感じっぽいですね。
確かに /home はSATA HDDなのでGPUDirectのopen fileからvfs_fallbackに落ちるというのは分かりみ。
そこの実行パス、あまり叩いてないから、extraモジュール側の何かかー。
(gdb) bt
#0 0x00007fcae7940a84 in pthread_mutex_lock () from /lib64/libpthread.so.0
#1 0x00007fcae4a4f126 in __gpudirect__open_file () from /lib64/heterodb_extra.so
#2 0x00007fcae4a51643 in vfs_fallback.read_file_iov () from /lib64/heterodb_extra.so
#3 0x00007fcae68ec5b4 in gpuDirectFileReadIOV () from /home/onishi/pgbin152/lib/postgresql/pg_strom.so
#4 0x00007fcae690617d in __gpuservLoadKdsCommon () from /home/onishi/pgbin152/lib/postgresql/pg_strom.so
#5 0x00007fcae69084b0 in gpuservHandleGpuTaskExec () from /home/onishi/pgbin152/lib/postgresql/pg_strom.so
#6 0x00007fcae6909020 in gpuservGpuWorkerMain () from /home/onishi/pgbin152/lib/postgresql/pg_strom.so
#7 0x00007fcae793e1ca in start_thread () from /lib64/libpthread.so.0
#8 0x00007fcae6e1ce73 in clone () from /lib64/libc.so.6
kaigai commented
heterodb-extra-5.2-3.el8.x86_64.rpm
で修正しました。
内部的にhash表を参照するときのインデックス計算ミスでした。
なぜ顕在化しなかったかというと、buriではデバイスのメジャー番号/マイナー番号のペアがたまたまhash表をオーバーフローしないパターンだったぽいです(恐怖
0-kaz commented
修正確認できました。ありがとうございました。