hydradatabase/hydra

[Bug]: Stopping columnar vacuum can result in table corruption

muntdan opened this issue · 10 comments

What's wrong?

I was testing updating a row and then running columnar.vacuum_full().
Because it was taking too long, I stopped it and ran a more precise columnar.vacuum for the specific partitioned table.
Something happened and I got this error:

ERROR: attempt to read columnar data of length 8168 from offset 24 of block 3927 of relation 29101
CONTEXT: SQL statement "SELECT columnar.vacuum(tablename, stripe_count)"
PL/pgSQL function columnar.vacuum_full(name,real,integer) line 25 at SQL statement

If I now run a select on that table I get this error:
ERROR: zstd decompression failed
SQL state: XX000
Detail: Unknown frame descriptor

How did you stop the vacuum?

I am using pgAdmin, so I canceled the execution with the Stop button:
[screenshot: pgAdmin's Stop button]

There are a number of things the code does to catch signals to try to avoid this, and I don't know what that Stop button actually does... but yes, the columnar.vacuum UDFs can cause corruption if "hard interrupted." ^C and pg_cancel_backend should be fine, but I suspect this used pg_terminate_backend, and I'm guessing that's not handled; I'm not sure it can be handled. There's potentially more we could do here to make this more durable, but it would require a bunch more work.

Best workaround at the moment is "don't do that."
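For reference, the difference between the two interrupt paths is the signal sent to the backend. A minimal sketch, assuming you first look up the backend pid (the pid below is hypothetical):

```sql
-- Find the backend running the vacuum
SELECT pid, state, query
FROM pg_stat_activity
WHERE query LIKE '%columnar.vacuum%';

-- Sends SIGINT: cancels the current query (same signal as ^C in psql)
SELECT pg_cancel_backend(12345);

-- Sends SIGTERM: terminates the entire backend process
SELECT pg_terminate_backend(12345);
```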

I was able to confirm:

  • pgAdmin 4 7.7 uses pg_cancel_backend underneath that button.
  • We do have signal handlers for both SIGINT, which is used by pg_cancel_backend, and SIGTERM, which is used by pg_terminate_backend.
  • Using either function did not cause corruption when I tested it.

That said, there might be a bug in the signal handlers themselves. My tests were relatively simple, so there could have been something specific about your database.

My tests used the latest hydra docker image (pg14).

If there's any other information we can use to try to reproduce the issue, please let us know.

Hi.
Indeed I use pgAdmin 4 7.7.
I use your docker image for PG 15.

What might be the big difference is the partitioning:
The table is partitioned at the first level into 10 tenant partitions by list,
and then each of those partitions has 52 range partitions, one for each week of the year.

So while I have a total of 520 partitions, there is data for only about 2 months => 10 tenants x 9 weeks, so ~90 partitions have data.
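
Roughly this shape, with hypothetical table and column names:

```sql
-- First level: list partitioning by tenant
CREATE TABLE metrics (
    tenant_id   int NOT NULL,
    recorded_at timestamptz NOT NULL,
    payload     jsonb
) PARTITION BY LIST (tenant_id);

-- Second level: one partition per tenant, range-partitioned by week
CREATE TABLE metrics_tenant_1 PARTITION OF metrics
    FOR VALUES IN (1)
    PARTITION BY RANGE (recorded_at);

-- 52 weekly leaf partitions per tenant, stored with the columnar access method
CREATE TABLE metrics_tenant_1_w01 PARTITION OF metrics_tenant_1
    FOR VALUES FROM ('2023-01-02') TO ('2023-01-09')
    USING columnar;
```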

FWIW: I faced the same problem, but I didn't stop the vacuum manually. I use pg_cron to schedule the columnar.vacuum function. The job run details are below (a sketch of the schedule follows the table):

| jobid | runid | job_pid | database | username | command | status | return_message | start_time | end_time |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 42 | 37348 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | failed | ERROR: zstd decompression failed DETAIL: Unknown frame descriptor | 2023-10-26 02:58:31.687 +0800 | 2023-10-26 02:58:31.697 +0800 |
| 1 | 41 | 37298 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | failed | ERROR: zstd decompression failed DETAIL: Unknown frame descriptor | 2023-10-26 02:58:01.686 +0800 | 2023-10-26 02:58:01.696 +0800 |
| 1 | 40 | 37250 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | failed | ERROR: zstd decompression failed DETAIL: Unknown frame descriptor | 2023-10-26 02:57:31.686 +0800 | 2023-10-26 02:57:31.696 +0800 |
| 1 | 39 | 37200 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | failed | ERROR: zstd decompression failed DETAIL: Unknown frame descriptor | 2023-10-26 02:57:01.687 +0800 | 2023-10-26 02:57:01.697 +0800 |
| 1 | 38 | 37152 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | failed | ERROR: zstd decompression failed DETAIL: Unknown frame descriptor | 2023-10-26 02:56:31.686 +0800 | 2023-10-26 02:56:31.700 +0800 |
| 1 | 37 | 37103 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | failed | ERROR: zstd decompression failed DETAIL: Unknown frame descriptor | 2023-10-26 02:56:01.686 +0800 | 2023-10-26 02:56:01.696 +0800 |
| 1 | 36 | 37055 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | failed | ERROR: zstd decompression failed DETAIL: Unknown frame descriptor | 2023-10-26 02:55:31.685 +0800 | 2023-10-26 02:55:31.695 +0800 |
| 1 | 35 | 37006 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | failed | ERROR: zstd decompression failed DETAIL: Unknown frame descriptor | 2023-10-26 02:55:01.685 +0800 | 2023-10-26 02:55:01.695 +0800 |
| 1 | 34 | 36958 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | failed | ERROR: zstd decompression failed DETAIL: Unknown frame descriptor | 2023-10-26 02:54:31.684 +0800 | 2023-10-26 02:54:31.694 +0800 |
| 1 | 33 | 36909 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | failed | ERROR: attempt to read columnar data of length 557 from offset 309 of block 148527 of relation 19333 | 2023-10-26 02:54:01.684 +0800 | 2023-10-26 02:54:01.889 +0800 |
| 1 | 32 | 36860 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 02:53:31.684 +0800 | 2023-10-26 02:53:32.121 +0800 |
| 1 | 31 | 36763 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 02:52:36.725 +0800 | 2023-10-26 02:53:31.678 +0800 |
| 1 | 30 | 36583 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 02:50:49.683 +0800 | 2023-10-26 02:52:36.713 +0800 |
| 1 | 29 | 36012 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 02:44:57.255 +0800 | 2023-10-26 02:50:49.674 +0800 |
| 1 | 28 | 35470 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 02:39:23.372 +0800 | 2023-10-26 02:44:57.249 +0800 |
| 1 | 27 | 34880 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 02:33:21.979 +0800 | 2023-10-26 02:39:23.364 +0800 |
| 1 | 26 | 34343 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 02:27:56.319 +0800 | 2023-10-26 02:33:21.973 +0800 |
| 1 | 25 | 33801 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 02:22:22.294 +0800 | 2023-10-26 02:27:56.313 +0800 |
| 1 | 24 | 33247 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 02:16:49.954 +0800 | 2023-10-26 02:22:22.288 +0800 |
| 1 | 23 | 32726 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 02:11:21.111 +0800 | 2023-10-26 02:16:49.948 +0800 |
| 1 | 22 | 32158 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 02:05:32.639 +0800 | 2023-10-26 02:11:21.105 +0800 |
| 1 | 21 | 31510 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 01:59:07.773 +0800 | 2023-10-26 02:05:32.630 +0800 |
| 1 | 20 | 30894 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 01:53:00.200 +0800 | 2023-10-26 01:59:07.767 +0800 |
| 1 | 19 | 30236 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 01:46:19.688 +0800 | 2023-10-26 01:53:00.194 +0800 |
| 1 | 18 | 29612 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 01:39:54.522 +0800 | 2023-10-26 01:46:19.679 +0800 |
| 1 | 17 | 28988 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 01:33:37.279 +0800 | 2023-10-26 01:39:54.513 +0800 |
| 1 | 16 | 28386 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 01:27:33.355 +0800 | 2023-10-26 01:33:37.272 +0800 |
| 1 | 15 | 27840 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 01:22:05.963 +0800 | 2023-10-26 01:27:33.347 +0800 |
| 1 | 14 | 27271 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 01:16:11.743 +0800 | 2023-10-26 01:22:05.956 +0800 |
| 1 | 13 | 26682 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 01:10:19.568 +0800 | 2023-10-26 01:16:11.735 +0800 |
| 1 | 12 | 26159 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 01:04:51.263 +0800 | 2023-10-26 01:10:19.559 +0800 |
| 1 | 11 | 25620 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 00:59:24.024 +0800 | 2023-10-26 01:04:51.257 +0800 |
| 1 | 10 | 25094 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 00:54:01.388 +0800 | 2023-10-26 00:59:24.015 +0800 |
| 1 | 9 | 24542 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 00:48:28.786 +0800 | 2023-10-26 00:54:01.381 +0800 |
| 1 | 8 | 24003 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 00:42:58.676 +0800 | 2023-10-26 00:48:28.778 +0800 |
| 1 | 7 | 23465 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 00:37:30.265 +0800 | 2023-10-26 00:42:58.670 +0800 |
| 1 | 6 | 22923 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 00:31:54.768 +0800 | 2023-10-26 00:37:30.256 +0800 |
| 1 | 5 | 22366 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 00:26:13.296 +0800 | 2023-10-26 00:31:54.762 +0800 |
| 1 | 4 | 21803 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 00:20:32.837 +0800 | 2023-10-26 00:26:13.290 +0800 |
| 1 | 3 | 21245 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 00:14:57.843 +0800 | 2023-10-26 00:20:32.831 +0800 |
| 1 | 2 | 20705 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 00:09:22.113 +0800 | 2023-10-26 00:14:57.837 +0800 |
| 1 | 1 | 20130 | bls | postgres | SELECT columnar.vacuum('public.log_developer_api_access_data', 5000); | succeeded | 1 row | 2023-10-26 00:03:32.926 +0800 | 2023-10-26 00:09:22.105 +0800 |
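
The schedule itself would look something like this (the exact cron.schedule call isn't shown above; second-granularity schedules like this require pg_cron 1.5+):

```sql
-- Hypothetical pg_cron job matching the ~30-second cadence in the log above
SELECT cron.schedule(
    'columnar-vacuum-log-api',  -- job name (made up)
    '30 seconds',
    $$SELECT columnar.vacuum('public.log_developer_api_access_data', 5000);$$
);
```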

@wuputah @muntdan May I ask, once this issue occurs, is there any way to fix it? Currently, I am unable to execute the columnar.vacuum function, and the fragmented stripes are gradually increasing, causing various problems in queries.
Thanks for your reply!

No, I'm afraid there's no way to fix the problem once it occurs; you would need to recover from a backup + PITR point prior to the error occurring.
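
For anyone who needs to do that, a minimal point-in-time-recovery sketch, assuming a restored base backup and a WAL archive (the archive path and target time are illustrative; the time is chosen just before the first failed run in the log above):

```
# postgresql.conf (PostgreSQL 12+)
restore_command        = 'cp /mnt/wal_archive/%f %p'   # hypothetical archive location
recovery_target_time   = '2023-10-26 02:53:00+08'     # just before corruption appeared
recovery_target_action = 'promote'
```

Then create an empty recovery.signal file in the data directory and start the server; Postgres replays WAL up to the target time and promotes.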

This error means the compressed data is corrupted:

ERROR: zstd decompression failed DETAIL: Unknown frame descriptor

The root cause of your issue appears to be slightly different: something went wrong while trying to read data during the vacuum.

ERROR: attempt to read columnar data of length 557 from offset 309 of block 148527 of relation 19333

This could also have been caused by any other source of file corruption.

I'm curious why vacuum is being run so often. Postgres's autovacuum should work for you in most cases, so I'm wondering what you're trying to accomplish by running the UDF?

For me it was the scenario where I update older data and then want to reclaim the space of the chunk that got updated/cloned.

@JerrySievert
In my case, I initially and mistakenly assumed that the columnar engine would automatically merge the small stripes being written, the way ClickHouse's MergeTree engine does. It wasn't until my table grew to roughly 40-50 million rows that I noticed the number of stripes had climbed to about 200,000, with an average of only a little over 200 rows per stripe. That's when I realized the seriousness of the problem; I tried to reduce the number of stripes in the table, but after a lot of fiddling, it failed.

Vacuuming tables will optimize tables that have had many inserts, updates, or deletes. There are three levels of vacuum on columnar tables (concrete invocations are sketched after the list):

  • VACUUM table will rewrite recently created stripes into optimal stripes of maximum size. If making many small inserts or updates into a columnar table, run VACUUM frequently to optimize the table.
  • SELECT columnar.vacuum(...) will reclaim space from updated and deleted rows, writing new optimal stripes and compacting the amount of space the table occupies.
  • VACUUM FULL table will do a full rewrite of the table, producing fully optimized stripes. This operation is very expensive but produces the best results.
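
Concretely, for a hypothetical columnar table t (the name is made up; the columnar.vacuum signature matches the calls earlier in this thread):

```sql
-- Level 1: rewrite recently created stripes into optimally sized ones
VACUUM t;

-- Level 2: reclaim space from updated/deleted rows; the second argument
-- is the stripe_count parameter seen in the error message earlier
SELECT columnar.vacuum('t', 1000);

-- Level 3: full rewrite of the table into fully optimized stripes
-- (takes an exclusive lock; can be very slow on large tables)
VACUUM FULL t;
```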

Setting aside table locking, I tried all three of the vacuum methods above. VACUUM table could not resolve so many historical stripes; with columnar.vacuum, I hit the error I reported above halfway through execution and could not continue. Lastly, VACUUM FULL table still had not finished after more than ten hours of execution. It seemed the process was frozen, so I had to give up in the end.