trinodb/trino

Iceberg table .crc files are not deleted when using Hadoop file system driver


When an Iceberg table is dropped, the metadata and data files associated with that table are deleted. However, the .crc checksum files that the Hadoop file system driver creates next to them are left in place. The .crc files are also left behind whenever an Iceberg data file is deleted (for example, by a merge or a partition delete). This may not be noticeable on HDFS, but it is obvious when the Hadoop driver is used to access local files.

The problem is that when a temporary table is repeatedly created for importing data and then dropped, its directory quickly fills up with orphaned .crc files.

Steps to reproduce (using Trino 465):

  1. Create an Iceberg catalog and enable the Hadoop file system in its catalog properties:
     fs.hadoop.enabled=true
  2. Create a table, insert a row, and drop the table.
  3. List the table directory and notice that the original files are gone but their .crc files remain.
trino:sa> CREATE TABLE iceberg.sa.test (id BIGINT) WITH (format = 'ORC', format_version = 2, location = '/tmp/trino/test');
CREATE TABLE
trino:sa> insert into iceberg.sa.test values (100);
INSERT: 1 row

Query 20241212_211046_04630_ffbh4, FINISHED, 1 node
Splits: 130 total, 130 done (100.00%)
0.21 [0 rows, 0B] [0 rows/s, 0B/s]

trino:sa> drop table iceberg.sa.test;
DROP TABLE
[root@DSDDF8 trino]# find /tmp/trino/test/
/tmp/trino/test/
/tmp/trino/test/metadata
/tmp/trino/test/metadata/.20241212_204959_02122_ffbh4-ffb96f8f-b160-4e43-ba83-9125f8b3c0be.stats.crc
/tmp/trino/test/metadata/.snap-4082420308949166332-1-6d590caa-05d5-4580-aca6-c0b9f6484512.avro.crc
/tmp/trino/test/metadata/.6d590caa-05d5-4580-aca6-c0b9f6484512-m0.avro.crc
/tmp/trino/test/metadata/.00001-6eb9896d-e69f-49e3-aae9-b4bba84853fd.metadata.json.crc
/tmp/trino/test/metadata/.snap-3493362677341884085-1-59873be5-9f63-4028-ad1d-4531e9605ec4.avro.crc
/tmp/trino/test/metadata/.snap-4850668042992078922-1-8d0989d4-c49e-4f34-8ee6-13358de87d47.avro.crc
/tmp/trino/test/metadata/.00000-cccfb8c5-d323-4502-848d-6600f37a17c2.metadata.json.crc
/tmp/trino/test/metadata/.20241212_211046_04630_ffbh4-8298051c-b4ba-49ad-a4e2-fd188f481237.stats.crc
/tmp/trino/test/metadata/.3a0e7301-11f8-4548-a870-5d1a95e0a65f-m0.avro.crc
/tmp/trino/test/metadata/.snap-805553046563604304-1-3a0e7301-11f8-4548-a870-5d1a95e0a65f.avro.crc
/tmp/trino/test/metadata/.00001-68dab464-3500-4426-b226-0d165134e10e.metadata.json.crc
/tmp/trino/test/metadata/.00000-e7cc15c2-ffd4-4be4-bef2-7248f0693442.metadata.json.crc
/tmp/trino/test/metadata/.snap-7636965954837989284-1-4d46c19d-65d5-4608-be8a-baf554e3232f.avro.crc
/tmp/trino/test/data
/tmp/trino/test/data/.20241212_211046_04630_ffbh4-8c075593-72c0-43d8-97e7-187e963e87c2.orc.crc
/tmp/trino/test/data/.20241212_204959_02122_ffbh4-39dd734a-96fe-4d9b-8d80-d6bca57c678d.orc.crc
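
As a stopgap until the driver cleans these up, the leftover files can be found by their naming pattern: in the listing above, each checksum file for `name` appears as `.name.crc` in the same directory. The following is a minimal sketch, not part of Trino, that lists (and optionally deletes) .crc files whose original file no longer exists. The function names and the `dry_run` parameter are my own for illustration, and it assumes the table location is on a local file system.

```python
import os


def find_orphan_crc_files(root):
    """Return paths of '.name.crc' files under root whose 'name' file is missing."""
    orphans = []
    for dirpath, _dirnames, filenames in os.walk(root):
        present = set(filenames)
        for name in filenames:
            # Checksum sidecars are named ".<original>.crc" in the same directory.
            if not (name.startswith(".") and name.endswith(".crc")):
                continue
            original = name[1:-len(".crc")]
            # Skip degenerate names; report only sidecars with no matching file.
            if original and original not in present:
                orphans.append(os.path.join(dirpath, name))
    return sorted(orphans)


def delete_orphan_crc_files(root, dry_run=True):
    """Delete orphaned .crc files under root; with dry_run=True only report them."""
    orphans = find_orphan_crc_files(root)
    if not dry_run:
        for path in orphans:
            os.remove(path)
    return orphans
```

For the reproduction above, `delete_orphan_crc_files("/tmp/trino/test")` would only report the leftover files, and passing `dry_run=False` would remove them. Checksum files that still have a matching data or metadata file are left alone.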