background worker "powa" (PID 2536) was terminated by signal 11: Segmentation fault
aleszeleny opened this issue · 7 comments
Hello,
we are experiencing in the last few weeks. Unfortunately haven't isolate reliable test case yet, so I can provide only observations available.
Symptoms are the same - issue subject error message in PostgreSQL log (and therefore server restart).
The common workaround is to remove powa
from shared_preload_libraries
, once this is done, the failing job proceeded - this worked for all cases where we suffered from segfault issue.
OS: Ubuntu 18.04.2 LTS
Postgres versions where the issue occurs:
-
PostgreSQL 11.4 (Ubuntu 11.4-1.pgdg18.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0, 64-bit
- Powa version:
postgresql-11-powa/now 3.2.0-1.pgdg18.04+2 amd64
- Powa version:
-
PostgreSQL 11.8 (Ubuntu 11.8-1.pgdg18.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0, 64-bit
postgresql-11-powa/bionic-pgdg,now 4.0.1-1.pgdg18.04+1 amd64
shared_preload_libraries
: Same on servers/versions listed above - pg_stat_statements,pg_stat_kcache,powa,pg_qualstats,pg_partman_bgw,pg_cron,pg_prewarm
As soon as powa
was removed from shared_preload_libraries
the job proceeded.
One case was a function creating and dropping thousand of temporary tables (once I spot this, the developer was asked to refactor the code), the second issue was from a java based ETL tools performing upsert (several hundred thousands of rows - I have no access to the java ETL details to say whether there were some batches or a transaction per row).
It never happened immediately, always after a few minutes of the workload runs.
I there something I can test or setup better logging to identify the issue root cause or any workaround to avoid the segfaults while having powa in place?
Thanks Ales
Hello,
Is there any chance that this started after a package upgrade, more precisely having pg_qualstats 2.0.0 or 2.0.1 installed, server restarted, but the pg_qualstats extension not being dropped and created (so still in version 1.X.Y)?
If yes, you're hitting powa-team/pg_qualstats#30. This has been fixed and released, although I'm not sure when a debian package will be available.
In the meantime, dropping and creating pg_qualstats extension will also fix the problem.
Thanks for the tip, I'll try to find a better reproducible test and reinstall pg_qualstats.
After one of our two tests, it seems it helped. Once I'll have a while will try to run the second test case as well.
As usual, it seems to be my user error, because once you point the potential version issue, I've checked qualstats extension and binaries and they were different. The problem was not caused by pgdg ubuntu packages, we have deployed new server using ansible-playbook and it installed qualstats in latest version while on production DB server is an older version and database was restored from a physical backup. That is the way, how old extension version landed into the database wile more recent binaries were installed on OS level.
Thank for the confirmation! Well, pg_qualstats should have provided backward compatibility with v1.x version and properly detected mismatch with future version, but I failed to do that in v2.0.0 and v2.0.1, it's definitely not a user error.
For the record debian packages for v2.0.2 are now available on pgdg repository, see https://www.postgresql.org/message-id/E1jgoIm-00027Z-4y@atalia.postgresql.org.
I've run all remaining know test-cases causing segfault and once database extension was re/created so it matches binaries from the installed OS-level package all test succeed.
Thanks a lot for Your help!
P.S.: thanks for the new pg_qualstats package and for powa4 packages, I like the remote mode option, but I did not want to use home compiled extensions with the one form pgdg packages, on prod, so except some testing, I was waiting for the version 4. Great work!
Thank you very much, I'm glad that you like this new remote mode option!