databricks/spark-sql-perf

TPC-DS.. dataGen.. format

kgebaly opened this issue · 2 comments

table.genData(tableLocation, format, overwrite, clusterByPartitionColumns,
What value does format take when generating TPC-DS benchmarks?

format is the type of data to write. It has to be passed as a string; that is what I found in Tables.scala:
def genData(
  location: String,
  format: String,
  overwrite: Boolean,
  clusterByPartitionColumns: Boolean,
  filterOutNullPartitionValues: Boolean,
  numPartitions: Int)

For example, "text". So you can call something like tables.genData("/path/to_Data", "text", true, true, true, true, true)

Other formats such as parquet or avro can be used as well. I tried it with parquet.
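To make the argument order easier to follow, here is a minimal sketch built only from the signature quoted above. GenDataSketch and its genData are hypothetical stand-ins, not the real spark-sql-perf code; the method just returns a summary string instead of writing data, and numPartitions = 10 is an arbitrary placeholder. The real method's parameter list may differ between versions of the library, so check Tables.scala in the version you use.

```scala
// Hypothetical stand-in that only mirrors the signature quoted in this thread.
// It does NOT generate data; it returns a description of the call instead.
object GenDataSketch {
  def genData(
      location: String,
      format: String,
      overwrite: Boolean,
      clusterByPartitionColumns: Boolean,
      filterOutNullPartitionValues: Boolean,
      numPartitions: Int): String =
    s"write $format to $location (overwrite=$overwrite, numPartitions=$numPartitions)"

  // Named arguments make it explicit which value lands in which parameter,
  // which helps when several Boolean flags sit next to each other.
  val summary: String = genData(
    location = "/path/to_Data",
    format = "parquet", // format is a plain string such as "text" or "parquet"
    overwrite = true,
    clusterByPartitionColumns = true,
    filterOutNullPartitionValues = true,
    numPartitions = 10 // placeholder value, assumed for illustration
  )
}
```

Passing the flags by name is optional, but it avoids mixing up a run of positional true/false values.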