Can't start database: "PG-Strom fatbin image is not valid now"
Discussed in #742
Originally posted by alefcs23 March 22, 2024
2024-03-22 13:43:43 -03 [85441]: [6-1] user=,db=,app=,client=LOG: PG-Strom fatbin image is not valid now, so rebuild in progress...
sh: 1: Syntax error: Bad fd number
sh: 1: Syntax error: Bad fd number
sh: 1: Syntax error: Bad fd number
sh: 1: Syntax error: Bad fd number
sh: 1: Syntax error: Bad fd number
sh: 1: Syntax error: Bad fd number
sh: 1: Syntax error: Bad fd number
sh: 1: Syntax error: Bad fd number
sh: 1: Syntax error: Bad fd number
sh: 1: sh: 1: Syntax error: Bad fd numberSyntax error: Bad fd number
sh: 1: Syntax error: Bad fd number
2024-03-22 13:43:43 -03 [85441]: [7-1] user=,db=,app=,client=FATAL: failed on the build process at [/tmp/.pgstrom_fatbin_build_bQbEj8]
2024-03-22 13:43:43 -03 [85441]: [8-1] user=,db=,app=,client=LOG: database system is shut down
pg_ctl: could not start server
Examine the log output.
Which shell does the OS user that runs the PostgreSQL server process use?
It launches nvcc using the system(3) function, so we expect /bin/bash to be available.
Just double-checked, and it uses /bin/bash. Is there any way I could specify the shell, or edit the syntax myself?
The __rebuild_gpu_fatbin_file() function in src/gpu_device.c constructs the command lines. Can you try to print cmd.data using elog(LOG, ...)?
Sorry, I don't know how to do that yet. I'll look into __rebuild_gpu_fatbin_file() and see what happens.
I ran into the same problem, too.
What OS are you using?
CUDA 12.0, Ubuntu 22.04, PostgreSQL 16, PG-Strom 5.
I hope that one day I can install PG-Strom on Ubuntu as smoothly as TimescaleDB or pgvector, with something like:
sudo apt-get install pg-strom-PG16
If I switch to an RPM-based OS, will my problem be solved?
Commit 593da4ec873e8096f11b6bbc0ff2aa3194edd29d will fix the problem.
sh: 1: Syntax error: Bad fd number
This is the typical error message when a command redirects both stdout and stderr into one file using:
% COMMAND >& logfile
But that syntax is a bash extension (borrowed from csh). POSIX sh, such as dash, which is /bin/sh on Ubuntu and is what system(3) invokes, does not support it.
So PG-Strom's code-builder routine now builds the shell command that launches nvcc in the portable form:
% COMMAND > logfile 2>&1
Oh... you need CUDA 12.2 or later (the setup reported above uses CUDA 12.0).
The cluster now starts, but when I run a query, this pops up:
2024-04-13 15:36:14.252 -03 [60537] LOG: PG-Strom fatbin image is not valid now, so rebuild in progress...
2024-04-13 15:36:14.252 -03 [60537] LOG: rebuild fatbin command: cd '/tmp/.pgstrom_fatbin_build_X0Lgmg' && ( /bin/sh -x -c '/usr/local/cuda/bin/nvcc --maxrregcount=128 --source-in-ptx -lineinfo -I. -I/usr/include/postgresql/16/server -DHAVE_FLOAT2 -arch=native --threads 4 --device-c -o xpu_common.o /usr/share/postgresql/16/pg_strom/xpu_common.cu' > xpu_common.log 2>&1 & /bin/sh -x -c '/usr/local/cuda/bin/nvcc --maxrregcount=128 --source-in-ptx -lineinfo -I. -I/usr/include/postgresql/16/server -DHAVE_FLOAT2 -arch=native --threads 4 --device-c -o cuda_gpuscan.o /usr/share/postgresql/16/pg_strom/cuda_gpuscan.cu' > cuda_gpuscan.log 2>&1 & /bin/sh -x -c '/usr/local/cuda/bin/nvcc --maxrregcount=128 --source-in-ptx -lineinfo -I. -I/usr/include/postgresql/16/server -DHAVE_FLOAT2 -arch=native --threads 4 --device-c -o cuda_gpujoin.o /usr/share/postgresql/16/pg_strom/cuda_gpujoin.cu' > cuda_gpujoin.log 2>&1 & /bin/sh -x -c '/usr/local/cuda/bin/nvcc --maxrregcount=128 --source-in-ptx -lineinfo -I. -I/usr/include/postgresql/16/server -DHAVE_FLOAT2 -arch=native --threads 4 --device-c -o cuda_gpupreagg.o /usr/share/postgresql/16/pg_strom/cuda_gpupreagg.cu' > cuda_gpupreagg.log 2>&1 & /bin/sh -x -c '/usr/local/cuda/bin/nvcc --maxrregcount=128 --source-in-ptx -lineinfo -I. -I/usr/include/postgresql/16/server -DHAVE_FLOAT2 -arch=native --threads 4 --device-c -o xpu_basetype.o /usr/share/postgresql/16/pg_strom/xpu_basetype.cu' > xpu_basetype.log 2>&1 & /bin/sh -x -c '/usr/local/cuda/bin/nvcc --maxrregcount=128 --source-in-ptx -lineinfo -I. -I/usr/include/postgresql/16/server -DHAVE_FLOAT2 -arch=native --threads 4 --device-c -o xpu_numeric.o /usr/share/postgresql/16/pg_strom/xpu_numeric.cu' > xpu_numeric.log 2>&1 & /bin/sh -x -c '/usr/local/cuda/bin/nvcc --maxrregcount=128 --source-in-ptx -lineinfo -I. 
-I/usr/include/postgresql/16/server -DHAVE_FLOAT2 -arch=native --threads 4 --device-c -o xpu_timelib.o /usr/share/postgresql/16/pg_strom/xpu_timelib.cu' > xpu_timelib.log 2>&1 & /bin/sh -x -c '/usr/local/cuda/bin/nvcc --maxrregcount=128 --source-in-ptx -lineinfo -I. -I/usr/include/postgresql/16/server -DHAVE_FLOAT2 -arch=native --threads 4 --device-c -o xpu_textlib.o /usr/share/postgresql/16/pg_strom/xpu_textlib.cu' > xpu_textlib.log 2>&1 & /bin/sh -x -c '/usr/local/cuda/bin/nvcc --maxrregcount=128 --source-in-ptx -lineinfo -I. -I/usr/include/postgresql/16/server -DHAVE_FLOAT2 -arch=native --threads 4 --device-c -o xpu_misclib.o /usr/share/postgresql/16/pg_strom/xpu_misclib.cu' > xpu_misclib.log 2>&1 & /bin/sh -x -c '/usr/local/cuda/bin/nvcc --maxrregcount=128 --source-in-ptx -lineinfo -I. -I/usr/include/postgresql/16/server -DHAVE_FLOAT2 -arch=native --threads 4 --device-c -o xpu_jsonlib.o /usr/share/postgresql/16/pg_strom/xpu_jsonlib.cu' > xpu_jsonlib.log 2>&1 & /bin/sh -x -c '/usr/local/cuda/bin/nvcc --maxrregcount=128 --source-in-ptx -lineinfo -I. -I/usr/include/postgresql/16/server -DHAVE_FLOAT2 -arch=native --threads 4 --device-c -o xpu_postgis.o /usr/share/postgresql/16/pg_strom/xpu_postgis.cu' > xpu_postgis.log 2>&1) && wait; /bin/sh -x -c '/usr/local/cuda/bin/nvcc -Xnvlink --suppress-stack-size-warning -arch=native --threads 4 --device-link --fatbin -o 'pgstrom-gpucode-V012040-ff9a6c27933d7a7d7e539ebd9b2ab4a0.fatbin' xpu_common.o cuda_gpuscan.o cuda_gpujoin.o cuda_gpupreagg.o xpu_basetype.o xpu_numeric.o xpu_timelib.o xpu_textlib.o xpu_misclib.o xpu_jsonlib.o xpu_postgis.o' > pgstrom-gpucode-V012040-ff9a6c27933d7a7d7e539ebd9b2ab4a0.fatbin.log 2>&1
That is the expected behavior of the updated revision.
The background worker process (GPU Service) launches nvcc, and PG-Strom functionality becomes available once the fatbin (GPU binary) image is ready.
The problem is that it never becomes ready (it just keeps looping on the rebuild), and the GPU shows no sign of activity during this.
You may find the compilation error logs in $PGDATA/.pgstrom_fatbin/, or, according to your logs, in /tmp/.pgstrom_fatbin_build_X0Lgmg.
Got it. Every log there ends with "gcc: No such file or directory", but gcc seems to be on my PATH and working. Any tips?
Is it really visible from the PostgreSQL server process? Please check the PATH of the server process itself, which may differ from the one in your interactive shell.