This utility tool will run maintenance tasks on your Iceberg tables.
It will:
- Remove any orphan files that exceed the configured retention
- Expire snapshots that exceed the configured retention
- analyze table or specific columns every configured period
- optimize table every configured period
The configuration is stored in the iceberg_maintenance_schedule
table.
table_name VARCHAR NOT NULL,
should_analyze INTEGER,
last_analyzed_on TIMESTAMP(6),
days_to_analyze INTEGER,
columns_to_analyze ARRAY(VARCHAR),
should_optimize INTEGER,
last_optimized_on TIMESTAMP(6),
days_to_optimize INTEGER,
should_expire_snapshots INTEGER,
retention_days_snapshots INTEGER,
should_remove_orphan_files INTEGER,
retention_days_orphan_files INTEGER
The job can be scheduled using cron and configured through environment variables (currently only basic authentication is supported).
It requires the latest trino-python-client to be installed.
The amount of simultaneous maintenance jobs can be configured through setting the NUM_WORKERS
environment variable (default is 5).
export NUM_WORKERS=10
export TRINO_HOST=localhost
export TRINO_PORT=8080
export TRINO_USER=admin
export TRINO_PASSWORD=...
export TRINO_CATALOG=iceberg
export TRINO_SCHEMA=default
python -m trino_iceberg_maintenance