delta-io/website

Document Delta Table optimization in a single entrypoint

edmondop opened this issue · 1 comment

With the recent improvements to Delta tables on top of the existing optimizations, it is getting harder to keep track of everything. It would help to document them all in one place:

  • Data skipping via statistics
  • Data skipping improved via Z-ordering
  • Bloom Filters
  • Liquid Clustering
  • Merge on Read
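
As a rough illustration of the first two items, here is a minimal sketch of why data layout matters for statistics-based skipping. It is plain Python, not Delta Lake code: `FileStats`, `write_files`, and `files_to_scan` are hypothetical names, and sorting on a single column stands in for Z-ordering or clustering, which in practice work across several columns.

```python
import random
from dataclasses import dataclass


@dataclass
class FileStats:
    """Hypothetical per-file statistics: min and max of one column."""
    name: str
    min_value: int
    max_value: int


def write_files(values, rows_per_file):
    """Split values into fixed-size 'files' and record min/max for each."""
    files = []
    for i in range(0, len(values), rows_per_file):
        chunk = values[i:i + rows_per_file]
        files.append(FileStats(f"part-{i // rows_per_file}", min(chunk), max(chunk)))
    return files


def files_to_scan(files, target):
    """Keep only files whose [min, max] range could contain `target`."""
    return [f for f in files if f.min_value <= target <= f.max_value]


rng = random.Random(0)
shuffled = list(range(1000))
rng.shuffle(shuffled)

# Same rows, two layouts: arrival order vs. sorted on the filter column.
unclustered = write_files(shuffled, rows_per_file=100)
clustered = write_files(sorted(shuffled), rows_per_file=100)

for label, files in [("unclustered", unclustered), ("clustered", clustered)]:
    hits = files_to_scan(files, 417)
    print(f"{label}: scan {len(hits)} of {len(files)} files")
```

With the sorted layout each file covers a narrow, non-overlapping range, so a point lookup touches a single file. With the shuffled layout most files have wide ranges and few can be skipped.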

@MrPowers, feel free to add any other ideas here ...

Thanks for raising this @edmondop.

Here are a few other performance enhancements:

  • Relying on metadata alone to answer certain queries
  • Deletion vectors (you already touched on this one with merge on read)
  • Avoiding expensive file listing operations
  • Eliminating small files via compaction
  • Calling out that file skipping happens first, followed by predicate pushdown filtering
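
To make the compaction item a bit more concrete, here is a minimal sketch of how small files could be grouped into rewrite batches. It uses a first-fit-decreasing heuristic as a stand-in; this is an assumption for illustration, not a description of how Delta Lake's optimize command picks files. `plan_compaction` and the sizes are hypothetical.

```python
def plan_compaction(file_sizes, target_size):
    """Greedily group files smaller than target_size into bins whose
    combined size does not exceed target_size (first-fit decreasing)."""
    small = sorted((s for s in file_sizes if s < target_size), reverse=True)
    bins = []
    for size in small:
        for b in bins:
            if sum(b) + size <= target_size:
                b.append(size)
                break
        else:
            # No existing bin has room, so start a new one.
            bins.append([size])
    # A bin holding a single file gains nothing from being rewritten.
    return [b for b in bins if len(b) > 1]


# File sizes in MB (made-up values), with a 128 MB target file size.
sizes_mb = [5, 8, 120, 3, 64, 2, 10]
plan = plan_compaction(sizes_mb, target_size=128)

for group in plan:
    print(f"rewrite {len(group)} files into one file of {sum(group)} MB")
```

Fewer, larger files mean fewer entries to track in the transaction log and fewer file opens per query, which is where the benefit of compaction comes from.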

We could possibly add all these to the Delta Lake Performance blog.