brianc/node-pg-copy-streams

Very different times on table with keys vs table without keys

ablaszkiewicz opened this issue · 5 comments

I am trying to insert quite big amount of data (100k rows) into table.

Due to how my app is architected, I need to do an

combinedStream.on('end', () => writeStream.end());

And then I measure time between firing stream.end() and receiving stream.on('finish").

When the target table contains no keys, above mentioned measure shows 0.9s which is probably a time it takes request to travel back and forth.
When the target table contains keys, measure shows 15s.

Also (I know that you fixed it already yesterday but it shows something really interestig) - when inserting more amount of data into keyless table (200k rows for example) everything works fine and it still takes ~1s to end the stream.
But when inserting same data into table with keys, I get this error

this.chunks.push(...chunks)
                ^

RangeError: Maximum call stack size exceeded
    at CopyStreamQuery._writev (C:\Users\aleksander.blaszkie\Documents\GitRepos\ingestor\node_modules\pg-copy-streams\copy-from.js:43:17)

I know that you already fixed this but this error being thrown makes me think that for some reason when dealing with keyless table, data streams smoothly into the database but when dealing with table with keys it first waits for end() signal and only after receiving it, the database tries to ingest everything at once.

Do you have any ideas on why this might be happening?
Am I able to somehow monitor when the data is being sent into the database?

can I ask what you mean by "table with no keys" ? do you mean "empty table" vs "non empty table" ?

You know what. It was obviously problem with my implementation. I was opening multiple streams but only from one client and I suspect one had to wait for another to finish and this is why data couldn't "flow" through the system

ok thanks for closing this. you may still want to dig into why so much data gets buffered. I suspect you have a problem (or maybe this is a choice) with your sources not respecting the backpressure protocol (a source should stop writing into a sink when sink.write returns false)

I know what was causing this. I only had one stream (green) opened at the same time and it was causing yellow and red to be stuck in between:
image

Other streams had to wait until the green stream can end (and green had to wait for readStream to end) and only then other streams opened - that is why there was such a big delay between readStream.on('end') and writeStream.on('finish').

My previous implementation used same PoolClient to instantiate all 3 streams and this is why this happened. New implementation uses 3 different instances of PoolClient to instantiate 3 streams, so now data can flow freely without waiting for other streams to finish:
image

And the time difference between tables was caused by the order of instantiating streams. I was getting really low times when writing only to green and really high times when writing to others

ok. good job figuring that out !