[Bug]: Stopping columnar vacuum can result in table corruption
muntdan opened this issue · 10 comments
What's wrong?
I was testing updating a row and then running columnar.vacuum_full().
Because it was taking too long, I stopped it and ran a more precise columnar.vacuum on the specific partitioned table.
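The calls involved were roughly of this shape (a sketch; the partition name is invented, and the stripe-count argument mirrors the columnar.vacuum(tablename, stripe_count) signature visible in the error context below):

```sql
-- Database-wide columnar vacuum; this is the call that was interrupted:
SELECT columnar.vacuum_full();

-- Targeted vacuum of a single partition (the name here is a placeholder):
SELECT columnar.vacuum('public.my_table_tenant1_w05', 1000);
```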
Something happened and I got the error:
ERROR: attempt to read columnar data of length 8168 from offset 24 of block 3927 of relation 29101
CONTEXT: SQL statement "SELECT columnar.vacuum(tablename, stripe_count)"
PL/pgSQL function columnar.vacuum_full(name,real,integer) line 25 at SQL statement
If I now run a select on that table I get this error:
ERROR: zstd decompression failed
SQL state: XX000
Detail: Unknown frame descriptor
How did you stop the vacuum?
There are a number of things the code does to catch signals to try to avoid this, and I don't know what that stop button actually does... but yes, the columnar.vacuum UDFs can cause corruption if "hard interrupted." ^C and pg_cancel_backend should be fine, but I suspect this used pg_terminate_backend, and I'm guessing that's not handled; I'm not sure it can be handled. There's potentially more we could do here to make it more durable, but that would require a bunch more work.
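For reference, the two signals map to these SQL calls (a sketch; the pid would come from pg_stat_activity, and 12345 is a placeholder):

```sql
-- Find the backend running the vacuum:
SELECT pid, query FROM pg_stat_activity
WHERE query LIKE '%columnar.vacuum%';

-- Sends SIGINT: cancels the current query; the extension's signal handlers should catch this.
SELECT pg_cancel_backend(12345);

-- Sends SIGTERM: terminates the whole backend; suspected to be the unsafe path here.
SELECT pg_terminate_backend(12345);
```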
Best workaround at the moment is "don't do that."
I was able to confirm:
- pgAdmin 4 7.7 uses pg_cancel_backend underneath that button.
- We do have signal handlers for SIGINT, which is used by pg_cancel_backend, and SIGTERM, which is used by pg_terminate_backend.
- Using either function did not cause corruption when I tested it.
That said, there might be a bug in the signal handlers themselves. My tests were relatively simple, so there could have been something specific about your database.
My tests used the latest hydra docker image (pg14).
If there's any other information we can use to try to reproduce the issue, please let us know.
Hi.
Indeed I use pgAdmin 4 7.7.
I use your docker image for PG 15.
What might be the big difference is the partitioning:
I have the table partitioned at the first level into 10 tenant partitions by list.
Then each of those partitions has 52 partitions by range, one for each week of the year.
While I have a total of 520 partitions, there is data for only 2 months => 10 tenants x 9 weeks, so ~90 partitions have data.
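A minimal sketch of that layout (all table names, columns, and bounds below are invented for illustration; the leaf partitions use Hydra's columnar access method via USING columnar):

```sql
CREATE TABLE events (
    tenant_id  int,
    created_at timestamptz,
    payload    jsonb
) PARTITION BY LIST (tenant_id);

-- First level: one partition per tenant, itself range-partitioned by week.
CREATE TABLE events_tenant_1 PARTITION OF events
    FOR VALUES IN (1)
    PARTITION BY RANGE (created_at);

-- Second level: one columnar leaf partition per week.
CREATE TABLE events_tenant_1_w01 PARTITION OF events_tenant_1
    FOR VALUES FROM ('2023-01-02') TO ('2023-01-09')
    USING columnar;
```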
FWIW: I faced the same problem, but I don't stop the vacuum manually. I use pg_cron to schedule the columnar.vacuum function (the schedule itself is sketched after the table). The job run details are below:
jobid | runid | job_pid | database | username | command | status | return_message | start_time | end_time |
---|---|---|---|---|---|---|---|---|---|
1 | 42 | 37348 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | failed | ERROR: zstd decompression failed DETAIL: Unknown frame descriptor | 2023-10-26 02:58:31.687 +0800 | 2023-10-26 02:58:31.697 +0800 |
1 | 41 | 37298 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | failed | ERROR: zstd decompression failed DETAIL: Unknown frame descriptor | 2023-10-26 02:58:01.686 +0800 | 2023-10-26 02:58:01.696 +0800 |
1 | 40 | 37250 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | failed | ERROR: zstd decompression failed DETAIL: Unknown frame descriptor | 2023-10-26 02:57:31.686 +0800 | 2023-10-26 02:57:31.696 +0800 |
1 | 39 | 37200 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | failed | ERROR: zstd decompression failed DETAIL: Unknown frame descriptor | 2023-10-26 02:57:01.687 +0800 | 2023-10-26 02:57:01.697 +0800 |
1 | 38 | 37152 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | failed | ERROR: zstd decompression failed DETAIL: Unknown frame descriptor | 2023-10-26 02:56:31.686 +0800 | 2023-10-26 02:56:31.700 +0800 |
1 | 37 | 37103 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | failed | ERROR: zstd decompression failed DETAIL: Unknown frame descriptor | 2023-10-26 02:56:01.686 +0800 | 2023-10-26 02:56:01.696 +0800 |
1 | 36 | 37055 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | failed | ERROR: zstd decompression failed DETAIL: Unknown frame descriptor | 2023-10-26 02:55:31.685 +0800 | 2023-10-26 02:55:31.695 +0800 |
1 | 35 | 37006 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | failed | ERROR: zstd decompression failed DETAIL: Unknown frame descriptor | 2023-10-26 02:55:01.685 +0800 | 2023-10-26 02:55:01.695 +0800 |
1 | 34 | 36958 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | failed | ERROR: zstd decompression failed DETAIL: Unknown frame descriptor | 2023-10-26 02:54:31.684 +0800 | 2023-10-26 02:54:31.694 +0800 |
1 | 33 | 36909 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | failed | ERROR: attempt to read columnar data of length 557 from offset 309 of block 148527 of relation 19333 | 2023-10-26 02:54:01.684 +0800 | 2023-10-26 02:54:01.889 +0800 |
1 | 32 | 36860 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 02:53:31.684 +0800 | 2023-10-26 02:53:32.121 +0800 |
1 | 31 | 36763 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 02:52:36.725 +0800 | 2023-10-26 02:53:31.678 +0800 |
1 | 30 | 36583 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 02:50:49.683 +0800 | 2023-10-26 02:52:36.713 +0800 |
1 | 29 | 36012 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 02:44:57.255 +0800 | 2023-10-26 02:50:49.674 +0800 |
1 | 28 | 35470 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 02:39:23.372 +0800 | 2023-10-26 02:44:57.249 +0800 |
1 | 27 | 34880 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 02:33:21.979 +0800 | 2023-10-26 02:39:23.364 +0800 |
1 | 26 | 34343 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 02:27:56.319 +0800 | 2023-10-26 02:33:21.973 +0800 |
1 | 25 | 33801 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 02:22:22.294 +0800 | 2023-10-26 02:27:56.313 +0800 |
1 | 24 | 33247 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 02:16:49.954 +0800 | 2023-10-26 02:22:22.288 +0800 |
1 | 23 | 32726 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 02:11:21.111 +0800 | 2023-10-26 02:16:49.948 +0800 |
1 | 22 | 32158 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 02:05:32.639 +0800 | 2023-10-26 02:11:21.105 +0800 |
1 | 21 | 31510 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 01:59:07.773 +0800 | 2023-10-26 02:05:32.630 +0800 |
1 | 20 | 30894 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 01:53:00.200 +0800 | 2023-10-26 01:59:07.767 +0800 |
1 | 19 | 30236 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 01:46:19.688 +0800 | 2023-10-26 01:53:00.194 +0800 |
1 | 18 | 29612 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 01:39:54.522 +0800 | 2023-10-26 01:46:19.679 +0800 |
1 | 17 | 28988 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 01:33:37.279 +0800 | 2023-10-26 01:39:54.513 +0800 |
1 | 16 | 28386 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 01:27:33.355 +0800 | 2023-10-26 01:33:37.272 +0800 |
1 | 15 | 27840 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 01:22:05.963 +0800 | 2023-10-26 01:27:33.347 +0800 |
1 | 14 | 27271 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 01:16:11.743 +0800 | 2023-10-26 01:22:05.956 +0800 |
1 | 13 | 26682 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 01:10:19.568 +0800 | 2023-10-26 01:16:11.735 +0800 |
1 | 12 | 26159 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 01:04:51.263 +0800 | 2023-10-26 01:10:19.559 +0800 |
1 | 11 | 25620 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 00:59:24.024 +0800 | 2023-10-26 01:04:51.257 +0800 |
1 | 10 | 25094 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 00:54:01.388 +0800 | 2023-10-26 00:59:24.015 +0800 |
1 | 9 | 24542 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 00:48:28.786 +0800 | 2023-10-26 00:54:01.381 +0800 |
1 | 8 | 24003 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 00:42:58.676 +0800 | 2023-10-26 00:48:28.778 +0800 |
1 | 7 | 23465 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 00:37:30.265 +0800 | 2023-10-26 00:42:58.670 +0800 |
1 | 6 | 22923 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 00:31:54.768 +0800 | 2023-10-26 00:37:30.256 +0800 |
1 | 5 | 22366 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 00:26:13.296 +0800 | 2023-10-26 00:31:54.762 +0800 |
1 | 4 | 21803 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 00:20:32.837 +0800 | 2023-10-26 00:26:13.290 +0800 |
1 | 3 | 21245 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 00:14:57.843 +0800 | 2023-10-26 00:20:32.831 +0800 |
1 | 2 | 20705 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 00:09:22.113 +0800 | 2023-10-26 00:14:57.837 +0800 |
1 | 1 | 20130 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 00:03:32.926 +0800 | 2023-10-26 00:09:22.105 +0800 |
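A schedule producing runs of this shape would look roughly like the following (a sketch: the job name is invented, and the 30-second interval is inferred from the timestamps above; sub-minute schedules require pg_cron 1.5+):

```sql
SELECT cron.schedule(
    'columnar-vacuum-log-table',  -- job name (invented)
    '30 seconds',                 -- sub-minute interval syntax, pg_cron 1.5+
    $$SELECT columnar.vacuum('public.log_developer_api_access_data', 5000);$$
);
```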
@wuputah @muntdan May I ask, once this issue occurs, is there any way to fix it? Currently, I am unable to execute the columnar.vacuum function, and the fragmented stripes are gradually increasing, causing various problems in queries.
Thanks for your reply!
No, I'm afraid there's no way to fix the problem once it occurs; you would need to recover from a backup, using PITR to a point prior to the error occurring.
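For reference, a point-in-time recovery to just before the corruption would look roughly like this (a sketch for PG 12+; the archive path and target timestamp are placeholders):

```
# After restoring a base backup taken before the error, create an empty
# recovery.signal file in the data directory, then set in postgresql.conf:
restore_command = 'cp /mnt/wal_archive/%f %p'
recovery_target_time = '2023-10-26 02:53:00+08'
recovery_target_action = 'promote'
```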
This error means the compressed data is corrupted:
ERROR: zstd decompression failed DETAIL: Unknown frame descriptor
The root cause of your issue appears to be slightly different: something went wrong trying to read data during the vacuum.
ERROR: attempt to read columnar data of length 557 from offset 309 of block 148527 of relation 19333
This could also have been caused by any other source of file corruption.
I'm curious why vacuum is being run so often? Postgres's autovacuum should work for you in most cases, so I'm wondering what you're trying to accomplish by running the UDF.
For me it was the scenario where I update older data and then want to reclaim the space of the chunks that got updated/cloned.
@JerrySievert
For me, I initially and mistakenly thought that the columnar engine would automatically merge small written stripes, the way ClickHouse's MergeTree engine does. It wasn't until my table grew to about 40 to 50 million rows that I noticed the stripe count had increased to about 200,000, with an average of only a bit over 200 rows per stripe. I realized the seriousness of the problem and tried to reduce the number of stripes in the table, but after a lot of fiddling, it failed.
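A rough way to check for this kind of fragmentation (assuming the columnar.stripe metadata view exposed by the extension; the view and its column names are assumptions here and may differ between versions):

```sql
-- How many stripes exist and how full they are on average
-- (columnar.stripe and its row_count column are assumed, not confirmed):
SELECT count(*)            AS stripe_count,
       avg(row_count)::int AS avg_rows_per_stripe
FROM columnar.stripe;
```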
Vacuuming tables will optimize tables that have had many inserts, updates, or deletes. There are three levels of vacuum on columnar tables (see the sketch after this list):
- VACUUM table will rewrite recently created stripes into optimal stripes of maximum size. If you make many small inserts or updates into a columnar table, run VACUUM frequently to optimize the table.
- SELECT columnar.vacuum(...) will reclaim space from updated and deleted rows, writing new optimal stripes and compacting the space the table occupies.
- VACUUM FULL table will do a full rewrite of the table, producing fully optimized stripes. This operation is very expensive but produces the best results.
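Concretely, the three levels look like this (the table name is invented; the second argument to columnar.vacuum is the per-call stripe cap seen elsewhere in this thread):

```sql
-- Level 1: rewrite recently created stripes into larger ones.
VACUUM my_columnar_table;

-- Level 2: reclaim space from updated/deleted rows, up to 1000 stripes per call.
SELECT columnar.vacuum('my_columnar_table', 1000);

-- Level 3: full rewrite into fully optimized stripes (expensive, exclusive lock).
VACUUM FULL my_columnar_table;
```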
Setting aside table locking, I tried all three of the above vacuum methods. VACUUM table could not deal with so many historical stripes, and with columnar.vacuum I hit the error I reported above halfway through execution and could not continue. Lastly, VACUUM FULL table still had not finished after more than ten hours of execution; it seemed the process was frozen, so I had to give up in the end.