AbePabbathi/lakehouse-tacklebox

TPCDS: add UC support


This is a non-trivial fix. Doing it properly requires rewriting much of the runner utils and some of the job notebooks.

Things to test

  • What is the best way to write from spark-sql-perf to UC?
    • External volumes -> 3-part-namespace
    • Directly into UC somehow
  • How should we handle UC vs. non-UC logic in the notebooks? We could add helpers directly in the notebook cells, but then the code gets messy and is not modular.
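
One possible way to keep the UC vs. non-UC branching out of the notebooks is a small helper in the runner utils that resolves names in one place. This is a rough sketch only: `Namespace`, `table`, and `volume_path` are hypothetical names, not existing code in this repo. It assumes UC tables use the 3-part `catalog.schema.table` form and that volumes are addressed as `/Volumes/<catalog>/<schema>/<volume>/...`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Namespace:
    """Hypothetical helper: resolves table names and volume paths for UC or non-UC runs."""

    schema: str
    catalog: Optional[str] = None  # None means non-UC, so names stay 2-part

    @property
    def uses_uc(self) -> bool:
        return self.catalog is not None

    def table(self, name: str) -> str:
        # UC: catalog.schema.table, non-UC: schema.table
        parts = [self.catalog, self.schema, name] if self.uses_uc else [self.schema, name]
        return ".".join(parts)

    def volume_path(self, volume: str, *parts: str) -> str:
        # Volumes only exist under UC, so fail early otherwise
        if not self.uses_uc:
            raise ValueError("volume paths require a UC catalog")
        return "/".join(["/Volumes", self.catalog, self.schema, volume, *parts])


uc = Namespace(schema="tpcds", catalog="main")
legacy = Namespace(schema="tpcds")
print(uc.table("store_sales"))
print(legacy.table("store_sales"))
print(uc.volume_path("raw", "sf1"))
```

With something like this, a notebook would only call `ns.table(...)` and never check the UC flag itself.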

To change

  • Raise an error early in the main notebook if it is not running on a UC interactive cluster
  • Add create-or-replace handling for the volumes path
  • Change the UC paths in constants
  • Convert _add_init_script_to_dbfs to use with open() syntax, so that UC and DBFS paths are both supported
  • Add a UC flag in constants and use it to set the type of the clusters that run the jobs
  • Figure out how to add UC functionality to data_and_queries job (see above notes)
  • Add test functionality
  • Add tests for UC constants config
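
A rough sketch of two of the items above: writing the init script with `with open()`, and deriving the cluster settings from a UC flag. The function names are hypothetical and this is not the current `_add_init_script_to_dbfs` implementation. Assumptions to verify: `dbfs:/` paths are reachable through the `/dbfs` FUSE mount on the cluster in use, volume paths are already plain `/Volumes/...` file paths, and the `data_security_mode` values match what the Databricks Clusters API accepts.

```python
import os
from typing import Optional


def to_local_path(path: str) -> str:
    """Map a dbfs:/ URI to its /dbfs FUSE path; other paths (e.g. /Volumes/...) pass through."""
    prefix = "dbfs:/"
    if path.startswith(prefix):
        return "/dbfs/" + path[len(prefix):].lstrip("/")
    return path


def write_init_script(path: str, content: str) -> str:
    """Write the init script with plain file I/O so DBFS and UC volume paths share one code path."""
    local_path = to_local_path(path)
    parent = os.path.dirname(local_path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    with open(local_path, "w") as f:
        f.write(content)
    return local_path


def cluster_security_settings(use_uc: bool, user_name: Optional[str] = None) -> dict:
    """Pick cluster access settings from the UC flag (values are assumptions, check the API docs)."""
    if not use_uc:
        return {"data_security_mode": "NONE"}
    if user_name:
        # Single-user access mode needs the owning user's name
        return {"data_security_mode": "SINGLE_USER", "single_user_name": user_name}
    return {"data_security_mode": "USER_ISOLATION"}
```

The returned dict is meant to be merged into the job cluster spec, so the UC flag in constants is the only place the decision is made.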