Processing Results Files in TPC-DS Test Cases
JGillette71 opened this issue · 1 comments
Issue: How to Process Results File Format in TPC-DS Test Cases for Benchmarking Automation
Description
I am attempting to automate TPC-DS benchmarking with Trino and encountered difficulties processing the result files from the following repository:
Each .result file appears to be in a custom ASCII-delimited format. My scripts read the TPC-DS queries (e.g., q01.sql) and their corresponding ground-truth result files (e.g., q01.result), run each query, and compare the results while capturing metrics such as execution time.
The workflow is as follows:
- A query is passed to Trino to query Delta Lake (I have TPC-DS data written there for end-to-end benchmarking).
- The results returned from Trino are compared against the ground truth from the .result file.
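To illustrate the comparison step, here is a minimal sketch of a formatting-tolerant row comparison. It is not the actual script; the names `normalize_value` and `rows_match` are hypothetical, and it assumes that NULLs may appear as `None`, an empty cell, or the text `null`, and that numeric cells may differ only in trailing zeros.

```python
from collections import Counter
from decimal import Decimal, InvalidOperation


def normalize_value(value):
    """Map a cell to a canonical string so formatting differences are ignored."""
    if value is None:
        return "null"
    text = str(value).strip()
    # Assumption: empty cells and the literal text "null" both mean SQL NULL.
    if text == "" or text.lower() == "null":
        return "null"
    try:
        number = Decimal(text)
    except InvalidOperation:
        return text  # not numeric: compare as trimmed text
    if not number.is_finite():
        return text
    # "1.50" and "1.5" both become "1.5"; "100" stays "100".
    return format(number.normalize(), "f")


def rows_match(actual_rows, expected_rows, ordered=False):
    """Compare two row sets after normalizing every cell."""
    actual = [tuple(normalize_value(v) for v in row) for row in actual_rows]
    expected = [tuple(normalize_value(v) for v in row) for row in expected_rows]
    if ordered:
        return actual == expected
    # Order-insensitive comparison that still respects duplicate rows.
    return Counter(actual) == Counter(expected)
```

Comparing normalized rows as a multiset sidesteps row-order and number-formatting differences, which a hash of the raw text cannot tolerate.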
Unfortunately, the comparison step always fails. I suspect this is due to subtle formatting differences or improper parsing. I've tried skipping the metadata and treating each file as a pipe-delimited CSV, comparing hashes, and a myriad of other methods, all with little success. I did confirm by visual inspection that a few of the queries return correct results, so the queries themselves are not the issue.
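For concreteness, the "skip the metadata and treat as pipe-delimited" attempt can be sketched roughly as follows. This is only an illustration under assumptions, since the format is undocumented here: it assumes metadata lines start with `--`, that cells are separated by `|`, and that each data row ends with a trailing delimiter. The function name `parse_result_text` and its parameters are hypothetical.

```python
def parse_result_text(text, delimiter="|", metadata_prefix="--"):
    """Parse result-file text into a list of row tuples of trimmed strings."""
    rows = []
    for line in text.splitlines():
        stripped = line.strip()
        # Assumption: blank lines and lines starting with "--" carry no data.
        if not stripped or stripped.startswith(metadata_prefix):
            continue
        cells = line.split(delimiter)
        # Assumption: a trailing delimiter yields one empty final cell to drop.
        if cells and cells[-1].strip() == "":
            cells.pop()
        rows.append(tuple(cell.strip() for cell in cells))
    return rows
```

Every cell comes back as a string, so values returned by Trino would still need to be converted to a comparable form before any equality check or hashing.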
Questions
- Could you provide details or documentation on the custom .result file format?
- Are there tools or utilities within the Trino ecosystem that help standardize or process these .result files for comparison with Trino's output?
- Are there any recommended practices for aligning formatting between Trino results and the .result files?
Context
- Environment: Enterprise Trino deployment on K8s querying an S3-hosted Delta Lake. Base images are read-only, so there is no option to use additional tools beyond Trino version 446.
Any pointers, examples, or further documentation would be greatly appreciated. Thank you!