Performance benchmark for BATCH
Closed this issue · 2 comments
It would be awesome to create a benchmark for the Snowflake connectors, with and without BATCH, and specifically on datasets that would most benefit from BATCH
as a high-throughput optimization.
Per:
This creates a really nice repeatable process for anyone in the community who wants to do their own benchmarks:
- Download the datasets using the links in your readme.
- Install the tap and configure it with the path to the downloaded files.
- Run the tap to a target directly, cat the output to a local buffer file, or load it into a database that you want to test as a tap.
- Test, observe, tweak, repeat! 🎉
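The loop above can be sketched as a small timing harness. This is a hedged sketch only: the dataset is generated inline and `cat` stands in for the real target, so the script runs anywhere; the real commands (shown in comments) would use the tap and target configs from the issue.

```shell
#!/usr/bin/env sh
set -eu

# Steps 1-3 stand-in: generate a small Singer-format buffer file instead of
# downloading a dataset and running the real tap, e.g.:
#   tap-snowflake --config=tap-config.json > dataset.singer.jsonl
i=1
: > dataset.singer.jsonl
while [ "$i" -le 100 ]; do
  printf '{"type": "RECORD", "stream": "users", "record": {"id": %s}}\n' "$i" >> dataset.singer.jsonl
  i=$((i + 1))
done

# Step 4: time the load. In a real run this would be:
#   time (cat dataset.singer.jsonl | target-snowflake --config=target-config.json)
start=$(date +%s)
cat dataset.singer.jsonl > /dev/null
end=$(date +%s)

records=$(grep -c '"type": "RECORD"' dataset.singer.jsonl)
echo "synced ${records} records in $((end - start))s"
```

Swapping the stand-ins for the real tap and target gives a records-per-second number you can compare across configurations.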
The datasets included are:
I think the specific data I'd love to see for this...
Q: How quickly can we sync any one of the provided sample streams from tap-snowflake to target-snowflake:
- With batch messaging disabled. E.g.:
tap-snowflake --config=tap-config.json | target-snowflake --config=target-config.json
- With batch messaging enabled. E.g.:
tap-snowflake --config=tap-config.json --config=batch-config.json | target-snowflake --config=target-config.json
- With batch messaging enabled, but tap and target run in isolation:
tap-snowflake --config=tap-config.json > runresults.singer.jsonl
cat runresults.singer.jsonl | target-snowflake --config=target-config.json
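The piped vs. isolated scenarios above could be compared with a harness along these lines. Again a sketch under stated assumptions: `fake_tap` and `wc -c` are placeholders for the real tap and target invocations (noted in comments), so that the timing structure itself is runnable as-is.

```shell
set -eu

# Placeholder for: tap-snowflake --config=tap-config.json
# (optionally with --config=batch-config.json for the batch scenarios).
fake_tap() {
  printf '{"type": "SCHEMA", "stream": "users", "schema": {}, "key_properties": ["id"]}\n'
  seq 1 500 | while read -r id; do
    printf '{"type": "RECORD", "stream": "users", "record": {"id": %s}}\n' "$id"
  done
}

# Scenarios 1 and 2: tap piped directly into the target
# (placeholder for: ... | target-snowflake --config=target-config.json).
start=$(date +%s)
fake_tap | wc -c > /dev/null
piped_s=$(( $(date +%s) - start ))

# Scenario 3: tap and target run in isolation via a buffer file,
# so each side's elapsed time can be measured separately.
fake_tap > runresults.singer.jsonl
start=$(date +%s)
cat runresults.singer.jsonl | wc -c > /dev/null
isolated_s=$(( $(date +%s) - start ))

records=$(grep -c '"type": "RECORD"' runresults.singer.jsonl)
echo "piped: ${piped_s}s; isolated load: ${isolated_s}s for ${records} records"
```

The isolated run is useful because it separates tap extraction time from target load time, which a single piped measurement cannot do.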
@kgpayne - If this turns out to be difficult, it's totally ok to postpone for a future iteration. It'd be worth a moderate investment, but not worth delaying the batch PRs themselves, if that's helpful.
Presumably you'd seed the exercise by running something like meltano run tap-stackoverflow-sample target-snowflake
- but if you run into any problems getting sample data loaded, that could be a potential blocker/slowdown here. (No doubt we'll work out any kinks over time.)
Closing as resolved. At least for now, this should be sufficient: meltano/sdk#906 (comment)