Document Delta table optimizations in a single entry point
edmondop opened this issue · 1 comments
edmondop commented
With the recent improvements to Delta tables on top of the existing optimizations, it is getting harder to keep track of them all. Some that should be covered:
- Data skipping via statistics
- Data skipping improved via Z Index
- Bloom Filters
- Liquid Clustering
- Merge on Read
Feel free to add other ideas here ... @MrPowers
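To make the first bullet concrete, here is a minimal sketch of how data skipping via statistics could be explained in the docs. It is plain Python, not the Delta Lake API: `FileStats` and `files_to_scan` are hypothetical names, and the per-file min/max values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    """Hypothetical per-file min/max statistics for a single integer column."""
    path: str
    min_value: int
    max_value: int


def files_to_scan(files, lower, upper):
    """Return paths of files whose [min, max] range overlaps [lower, upper].

    Files whose range cannot contain a matching value are skipped
    without being opened.
    """
    return [
        f.path
        for f in files
        if f.max_value >= lower and f.min_value <= upper
    ]


# Illustrative statistics, as if read from the table's transaction log.
files = [
    FileStats("part-0.parquet", 1, 100),
    FileStats("part-1.parquet", 101, 200),
    FileStats("part-2.parquet", 201, 300),
]

# A query such as `WHERE id BETWEEN 150 AND 180` only needs one file.
print(files_to_scan(files, 150, 180))  # ['part-1.parquet']
```

The same picture also motivates Z-ordering and clustering: the tighter and less overlapping the min/max ranges per file, the more files this check can skip.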
MrPowers commented
Thanks for raising this @edmondop.
Here are a few other performance enhancements:
- Relying on metadata only for certain queries
- Deletion vectors (you kind of already mentioned this one with merge on read)
- Avoiding expensive file listing operations
- Eliminating small files via compaction
- Calling out that file skipping and predicate pushdown filtering are two separate steps: whole files are skipped first, then the predicate is pushed down to the reader for the files that remain
We could possibly add all these to the Delta Lake Performance blog.
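For the file skipping vs. predicate pushdown distinction, something like the following rough sketch might help readers. It is plain Python with a hypothetical in-memory "table" (the `files` dict, its `min_id`/`max_id` stats, and the `query` helper are all made up for illustration, not a real Delta Lake or Parquet API). In a real engine the second stage is handled by the Parquet reader, which can also skip row groups inside a file.

```python
# Hypothetical table: each file carries min/max stats for `id` plus its rows.
files = {
    "part-0.parquet": {
        "min_id": 1,
        "max_id": 9,
        "rows": [{"id": 1, "value": "a"}, {"id": 5, "value": "b"}, {"id": 9, "value": "c"}],
    },
    "part-1.parquet": {
        "min_id": 4,
        "max_id": 6,
        "rows": [{"id": 4, "value": "d"}, {"id": 6, "value": "e"}],
    },
    "part-2.parquet": {
        "min_id": 10,
        "max_id": 12,
        "rows": [{"id": 10, "value": "f"}, {"id": 12, "value": "g"}],
    },
}


def query(files, wanted_id):
    """Answer `WHERE id = wanted_id` in two stages."""
    # Stage 1: file skipping. Use only the stats to drop files that
    # cannot contain the value. No row data is touched here.
    candidates = [
        name
        for name, f in files.items()
        if f["min_id"] <= wanted_id <= f["max_id"]
    ]
    # Stage 2: predicate filtering. Apply the predicate to the rows of
    # the files that survived stage 1.
    rows = [
        row
        for name in candidates
        for row in files[name]["rows"]
        if row["id"] == wanted_id
    ]
    return candidates, rows


candidates, rows = query(files, 6)
print(candidates)  # ['part-0.parquet', 'part-1.parquet']
print(rows)        # [{'id': 6, 'value': 'e'}]
```

Note that `part-0.parquet` survives stage 1 even though it has no matching row, because its min/max range is wide. That is why the two steps are worth calling out separately.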