trinodb/trino

Processing Results Files in TPC-DS Test Cases

JGillette71 opened this issue · 1 comments

Issue: How to Process Results File Format in TPC-DS Test Cases for Benchmarking Automation

Description

I am attempting to automate TPC-DS benchmarking with Trino and encountered difficulties processing the result files from the following repository:

TPC-DS Queries & Results

Each .result file appears to be in a custom ASCII-delimited format. My scripts read the TPC-DS queries (e.g., q01.sql) and their corresponding ground truth result files (e.g., q01.result), run each query, and compare the results while capturing metrics such as execution time.

The workflow is as follows:

  1. A query is passed to Trino to query Delta Lake (I have TPC-DS data written there for end-to-end benchmarking).
  2. The results returned from Trino are compared against the ground truth from the .result file.

Unfortunately, the comparison step always fails. I suspect this is due to subtle formatting differences or improper parsing. Among many other approaches, I have tried skipping the metadata and treating the rest as pipe-delimited CSV, and I have tried comparing hashes of the two result sets, all with little success. I did confirm correct results for a few queries by visually inspecting what they returned, so the queries themselves are not the issue.
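To make the comparison step concrete, here is a minimal Python sketch under assumptions that are not confirmed anywhere in this issue: metadata lines start with `--`, data rows are pipe-delimited with a trailing delimiter, and row order does not matter. The helper names (`normalize_cell`, `parse_result_text`, `rows_match`) are hypothetical and not part of Trino. The idea is to normalize both sides to canonical strings before comparing, so that differences such as `2.50` versus `2.5` do not cause a mismatch the way they would with a raw hash.

```python
from collections import Counter
from decimal import Decimal, InvalidOperation


def normalize_cell(value):
    """Map one cell to a canonical string (illustrative rules, not Trino's)."""
    if value is None:
        return "null"
    text = str(value).strip()
    try:
        # Strip trailing zeros so "2.50" and Decimal("2.5") compare equal.
        return format(Decimal(text).normalize(), "f")
    except InvalidOperation:
        return text  # not numeric: keep the stripped string


def parse_result_text(text, delimiter="|"):
    """Parse the assumed .result layout into a list of normalized row tuples."""
    rows = []
    for line in text.splitlines():
        # Assumption: blank lines and lines starting with "--" carry no data.
        if not line.strip() or line.startswith("--"):
            continue
        cells = line.split(delimiter)
        # Assumption: each row ends with a trailing delimiter.
        if cells and cells[-1] == "":
            cells.pop()
        rows.append(tuple(normalize_cell(c) for c in cells))
    return rows


def rows_match(expected_rows, actual_rows):
    """Order-insensitive comparison that still respects duplicate rows."""
    actual = [tuple(normalize_cell(c) for c in row) for row in actual_rows]
    return Counter(expected_rows) == Counter(actual)
```

Here `actual_rows` would be whatever rows the client returns for the query (for example, a list of tuples). If a query has a meaningful ORDER BY, comparing the two lists directly instead of through `Counter` would keep the order check.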

Questions

  1. Could you provide details or documentation on the custom .result file format?
  2. Are there tools or utilities within the Trino ecosystem that help standardize or process these .result files for comparison with Trino's output?
  3. Are there any recommended practices for aligning formatting between Trino results and the .result files?

Context

  • Environment: Enterprise Trino deployment on K8s querying an S3-hosted Delta Lake. Base images are read-only, so I cannot use any tools beyond Trino itself (version 446).

Any pointers, examples, or further documentation would be greatly appreciated. Thank you!