MeltanoLabs/tap-snowflake

Performance benchmark for BATCH

It would be awesome to create a benchmark for the Snowflake connectors, with and without BATCH, and specifically on datasets that would most benefit from BATCH as a high-throughput optimization.

Per:

This creates a really nice repeatable process for anyone in the community who wants to do their own benchmarks:

  1. Download the datasets using the links in your readme.
  2. Install the tap and configure it with the path to the downloaded files.
  3. Run the tap to a target directly, cat the output to a local buffer file, or load it into a database that you want to test as a tap.
  4. Test, observe, tweak, repeat! 🎉
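When step 3 writes the tap output to a local buffer file, it can help to check what that file contains before timing anything. The sketch below tallies Singer messages by their "type" field. It assumes the buffer is newline-delimited JSON with one message per line; the helper names are made up for illustration and are not part of any tap or SDK.

```python
import json
from collections import Counter
from typing import Iterable


def count_message_types(lines: Iterable[str]) -> Counter:
    """Tally Singer messages by their "type" field (RECORD, SCHEMA, STATE, ...)."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the buffer file
        message = json.loads(line)
        # Messages without a "type" key are grouped under UNKNOWN.
        counts[message.get("type", "UNKNOWN")] += 1
    return counts


def count_file(path: str) -> Counter:
    """Convenience wrapper: tally the messages in a buffer file on disk."""
    with open(path, encoding="utf-8") as handle:
        return count_message_types(handle)
```

Dividing the RECORD count by the measured wall-clock time gives a rough records-per-second figure. When batch messaging is enabled the buffer holds BATCH messages that point at files rather than individual records, so the RECORD count alone would understate the volume.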

The datasets included are:

[Image: list of the included datasets]

I think the specific data I'd love to see for this is:

Q: How quickly can we sync any one of the provided sample streams from tap-snowflake to target-snowflake:

  1. With batch messaging disabled. E.g.: tap-snowflake --config=tap-config.json | target-snowflake --config=target-config.json
  2. With batch messaging enabled. E.g.: tap-snowflake --config=tap-config.json --config=batch-config.json | target-snowflake --config=target-config.json
  3. With batch messaging enabled, but tap and target run in isolation:
    1. tap-snowflake --config=tap-config.json --config=batch-config.json > runresults.singer.jsonl
    2. cat runresults.singer.jsonl | target-snowflake --config=target-config.json
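The scenarios above could be timed with a small harness like the following sketch. It assumes bash is available, that the tap and target are on the PATH, and that the config files named in the issue exist in the working directory. The function names, the SCENARIOS dictionary, and the choice of three repeats with a median are illustrative assumptions, not an established benchmark procedure.

```python
import statistics
import subprocess
import time
from typing import Dict, List

# Commands taken from the scenarios in this issue; the dictionary keys are made up.
SCENARIOS: Dict[str, str] = {
    "no_batch": "tap-snowflake --config=tap-config.json"
                " | target-snowflake --config=target-config.json",
    "batch": "tap-snowflake --config=tap-config.json --config=batch-config.json"
             " | target-snowflake --config=target-config.json",
    "batch_tap_only": "tap-snowflake --config=tap-config.json --config=batch-config.json"
                      " > runresults.singer.jsonl",
    "batch_target_only": "cat runresults.singer.jsonl"
                         " | target-snowflake --config=target-config.json",
}


def time_pipeline(command: str) -> float:
    """Run a shell pipeline once and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    # pipefail makes a failing tap fail the whole run, even if the target exits 0.
    subprocess.run(["bash", "-o", "pipefail", "-c", command], check=True)
    return time.perf_counter() - start


def benchmark(scenarios: Dict[str, str], repeats: int = 3) -> Dict[str, List[float]]:
    """Time each scenario `repeats` times, in the order given."""
    return {
        name: [time_pipeline(command) for _ in range(repeats)]
        for name, command in scenarios.items()
    }


def summarize(results: Dict[str, List[float]]) -> Dict[str, float]:
    """Reduce each scenario's timings to its median, which resists outlier runs."""
    return {name: statistics.median(times) for name, times in results.items()}
```

Calling summarize(benchmark(SCENARIOS)) would run all four commands in order, so batch_tap_only writes the file that batch_target_only then reads. Wall-clock time includes Snowflake warehouse start-up and network latency, so repeated runs against a warm warehouse are likely to be more comparable than a single cold run.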

@kgpayne - If this turns out to be difficult, totally ok to postpone for a future iteration. It'd be worth a moderate investment but not worth delaying the batch PRs themselves, if that's helpful.

Presumably you'd seed the exercise by running something like meltano run tap-stackoverflow-sample target-snowflake - but if you run into any problems getting sample data loaded, that could be a potential blocker/slowdown here. (No doubt we'll work out any kinks over time.)

Closing as resolved. At least for now, this should be sufficient: meltano/sdk#906 (comment)