Performance benchmark for BATCH

Question

Closed this issue 2 years ago · 2 comments

It would be awesome to create a benchmark for the Snowflake connectors, with and without BATCH, and specifically on datasets that would most benefit from BATCH as a high-throughput optimization.

Per:

meltano/sdk#975 (reply in thread)

This creates a really nice repeatable process for anyone in the community who wants to do their own benchmarks:

Download the datasets using the links in your readme.

Install the tap and configure it with the path to the downloaded files.

Run the tap to a target directly, cat the output to a local buffer file, or load it into a database that you want to test as a tap.

Test, observe, tweak, repeat! 🎉
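When the tap output is written to a local buffer file as described in the steps above, a quick tally of message types can help confirm what was actually emitted before any timings are compared. The following is a minimal sketch, assuming the buffer file holds one Singer message per line as JSON with a top-level "type" field (such as SCHEMA, RECORD, STATE, or BATCH). The function name `tally_message_types` is hypothetical and not part of any SDK.

```python
import json
from collections import Counter
from typing import Iterable


def tally_message_types(lines: Iterable[str]) -> Counter:
    """Count Singer messages by their top-level "type" field."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            # Assumption: anything that is not JSON is stray log output.
            counts["<non-JSON>"] += 1
            continue
        if isinstance(message, dict):
            counts[str(message.get("type", "<missing>"))] += 1
        else:
            counts["<non-object>"] += 1
    return counts
```

An open file handle can be passed directly, for example `tally_message_types(open("buffer.singer.jsonl", encoding="utf-8"))`, where the file name is a placeholder. A run with batch messaging enabled would be expected to show BATCH messages in place of most RECORD messages, though the exact mix depends on the tap.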

The datasets included are:

I think the specific data I'd love to see for this...

Q: How quickly can we sync any one of the provided sample streams from tap-snowflake to target-snowflake:

With batch messaging disabled. E.g.: tap-snowflake --config=tap-config.json | target-snowflake --config=target-config.json
With batch messaging enabled. E.g.: tap-snowflake --config=tap-config.json --config=batch-config.json | target-snowflake --config=target-config.json
With batch messaging enabled, but tap and target run in isolation:
1. tap-snowflake --config=tap-config.json --config=batch-config.json > runresults.singer.jsonl
2. cat runresults.singer.jsonl | target-snowflake --config=target-config.json
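The scenarios above could be timed with a small harness along these lines. This is only a sketch under stated assumptions: it assumes `bash` is available, that the config files named in the commands exist in the working directory, and that wall-clock time is the metric of interest. The helper names `time_pipeline` and `run_benchmark` are hypothetical.

```python
import subprocess
import time
from typing import Dict


def time_pipeline(command: str) -> float:
    """Run a shell pipeline and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    # pipefail makes a failing tap fail the whole pipeline, so a broken
    # run raises CalledProcessError instead of producing a bogus timing.
    subprocess.run(["bash", "-o", "pipefail", "-c", command], check=True)
    return time.perf_counter() - start


def run_benchmark(scenarios: Dict[str, str], repeats: int = 3) -> Dict[str, float]:
    """Return the best (minimum) wall-clock time in seconds per scenario."""
    return {
        name: min(time_pipeline(command) for _ in range(repeats))
        for name, command in scenarios.items()
    }


# Commands taken from the scenarios in this issue. The isolated tap step
# passes batch-config.json as an assumption, since that scenario is
# described as batch-enabled.
SCENARIOS = {
    "batch disabled": (
        "tap-snowflake --config=tap-config.json"
        " | target-snowflake --config=target-config.json"
    ),
    "batch enabled": (
        "tap-snowflake --config=tap-config.json --config=batch-config.json"
        " | target-snowflake --config=target-config.json"
    ),
    "batch enabled, tap only": (
        "tap-snowflake --config=tap-config.json --config=batch-config.json"
        " > runresults.singer.jsonl"
    ),
    "batch enabled, target only": (
        "cat runresults.singer.jsonl"
        " | target-snowflake --config=target-config.json"
    ),
}
```

Calling `run_benchmark(SCENARIOS)` would return a mapping of scenario name to best time. Since each repeat loads data into the target again, it may be worth resetting the target tables between runs so that every run starts from the same state.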

Answer 1 · 2022-09-29T23:38:21.000Z

@kgpayne - If this turns out to be difficult, totally ok to postpone for a future iteration. It'd be worth a moderate investment but not worth delaying the batch PRs themselves, if that's helpful.

Presumably you'd seed the exercise by running something like meltano run tap-stackoverflow-sample target-snowflake - but if you run into any problems getting sample data loaded, that could be a potential blocker/slowdown here. (No doubt we'll work out any kinks over time.)

Answer 2 · 2022-10-25T19:13:33.000Z

Closing as resolved. At least for now, this should be sufficient: meltano/sdk#906 (comment)