How to cache scramble tables in Spark?
hychen20 commented
How to cache scramble tables in Spark?
pyongjoo commented
The standard caching statement [1] should work when prefixed with bypass. For example:
verdict.sql('bypass cache table schema.scramble_table')
Disclaimer: We have not tested this yet, so I am not 100% certain.
[1] https://docs.databricks.com/spark/latest/spark-sql/language-manual/cache-table.html
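When several tables need caching, the bypass-prefixed statements can be generated rather than typed out one by one. The following is only a sketch: `CacheStatements` and `bypassCache` are hypothetical names introduced here for illustration and are not part of VerdictDB or Spark. It only builds the statement strings; whether caching through bypass actually speeds up queries is untested, as noted above.

```scala
// Hypothetical helper (not a VerdictDB or Spark API): builds the
// bypass-prefixed caching statements for a list of table names.
object CacheStatements {
  // Returns one "bypass cache table <name>" statement per table name,
  // in the same order as the input.
  def bypassCache(tables: Seq[String]): Seq[String] =
    tables.map(t => s"bypass cache table $t")
}

// Usage, assuming `verdict` is an existing context object as in this thread:
// CacheStatements
//   .bypassCache(Seq("schema.scramble_table"))
//   .foreach(stmt => verdict.sql(stmt))
```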
hychen20 commented
Sorry, it does not seem to work. I cached the lineitem scramble table as well as the verdictdbmeta table. I can see in the Spark UI that the tables are cached; however, TPC-H Q1 still takes the same amount of time as when the tables are not cached.
Here's my code:
verdict.setDefaultSchema(schema) // tpch1g
verdict.sql("bypass cache table lineitem")
verdict.sql("bypass cache table orders")
verdict.sql("bypass cache table verdictdbmeta.verdictdbmeta")
verdict.sql("bypass cache table lineitem_scramble")
verdict.sql("bypass cache table orders_scramble")
val q_verdict = spark.sparkContext.getConf.get("spark.verdictdb.query") // Q1, Q6, or Q14
val rs_verdict = verdict.sql(q_verdict)
rs_verdict.print()
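To compare the cached and uncached runs with concrete numbers, the query call can be wrapped in a small wall-clock timer. This is a minimal sketch: `Timing` and `timed` are hypothetical names, not part of VerdictDB or Spark, and the usage comment assumes that verdict.sql runs the query before returning, which this thread does not confirm.

```scala
// Hypothetical timing helper (not a VerdictDB or Spark API).
object Timing {
  // Evaluates `body` once and returns its result together with the
  // elapsed wall-clock time in milliseconds.
  def timed[A](body: => A): (A, Double) = {
    val start = System.nanoTime()
    val result = body
    val elapsedMs = (System.nanoTime() - start) / 1e6
    (result, elapsedMs)
  }
}

// Usage, assuming `verdict` and `q_verdict` are defined as in the code above:
// val (rs, ms) = Timing.timed { verdict.sql(q_verdict) }
// println(f"query took $ms%.1f ms")
```

Running this once before and once after the cache statements gives two timings that can be posted side by side.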