Support for large transactions
Opened this issue · 2 comments
Our integration (using Stitch, but anyone using `tap-postgres` would experience this) keeps hitting out-of-memory errors when it encounters a large transaction.
There are a few ways around this:

- For us, a specific table causes the problem, and we're not consuming anything from that table anyway, so provide a way for us to pass the `filter-tables` or `add-tables` parameter to `wal2json`. From Stitch, I'd imagine `add-tables` would be populated by the actual "tables to replicate" selected in the UI.
- Use `write-in-chunks=true` or `format-version=2` to get around the issue. These are the officially suggested solutions from the `wal2json` maintainers. `format-version=2` seems like the easier/preferred route, but it's a fairly recent option, so it may not be widely supported yet. So maybe `write-in-chunks` is the way to go.
FWIW, `filter-tables`/`add-tables` would also provide some cost/bandwidth savings, because you wouldn't be streaming a bunch of extra data that you're just going to throw out.
Just to add one more workaround (though it won't help with full-table syncs): I'm just about to implement the undocumented `incremental_limit` from `tap-mysql` (see https://github.com/singer-io/tap-mysql/blob/master/tap_mysql/__init__.py#L653) in our fork. What it does is set a ceiling on the number of rows queried and emitted in a single run, so there's a fixed upper bound on query size. In our case this is the only way to deal with it, as Postgres on Google Cloud SQL doesn't have `wal2json` support, and the target doesn't play well with massive chunks.
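The bounded-run idea above can be sketched as follows. Hedged heavily: `tap-mysql`'s real implementation applies the limit inside its SELECT against the replication key; the function names, the `state` bookmark shape, and the in-memory rows standing in for a table are all illustrative only:

```python
def sync_incremental(fetch_rows, emit, state, limit=10000):
    """One bounded run of an incremental sync (illustrative sketch).

    fetch_rows(after, limit) must return at most `limit` rows whose
    replication-key value exceeds `after`, in ascending order. The
    bookmark saved in `state` means a huge backlog is worked off over
    several runs instead of one unbounded query. Returns True when
    another run is needed to drain the backlog.
    """
    rows = fetch_rows(state.get("bookmark"), limit)
    for row in rows:
        emit(row)  # write the record downstream
        state["bookmark"] = row["id"]  # persistable resume point
    return len(rows) == limit

# Demo with an in-memory "table" standing in for the real query.
table = [{"id": i} for i in range(1, 26)]

def fetch_rows(after, limit):
    start = 0 if after is None else after
    return [r for r in table if r["id"] > start][:limit]

emitted = []
state = {}
# Each run touches at most 10 rows; repeat until the backlog drains.
while sync_incremental(fetch_rows, emitted.append, state, limit=10):
    pass
```

The trade-off is the same as in `tap-mysql`: each run is cheap and has a fixed upper bound, at the cost of needing multiple runs to catch up after a large backlog.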