ConduitIO/conduit-connector-sdk

Bug: Destination Stop might block for 1 minute

lovromazgon opened this issue

Bug description

I discovered this bug while reviewing and testing the reconnect logic of the grpc-client connector (conduitio-labs/conduit-connector-grpc-client#9). See the steps to reproduce below.

Steps to reproduce

  1. Have a grpc-client destination with the following config:
    reconnectDelay: 2s
    maxDowntime: 5s
  2. Start the test server, then start Conduit.
  3. Send 2 records into the pipeline; the server should receive one record.
  4. Without sending back an acknowledgment, stop the server.

After 5 seconds, the connector's Write function returns an error (max downtime reached), which causes the pipeline to stop, but the pipeline does not stop entirely. The connector keeps running for another minute: it is blocked in the SDK Stop function, waiting for the last position to equal the position of the second record.

The probable cause is that, from Conduit's side, both records appeared to be sent successfully, but they were actually only stored in the gRPC buffer, possibly even received by the server and buffered there. The connector itself received only one record, because it returned an error after that and stopped the stream. Conduit then calls Stop with the position of the second record, assuming the connector must have received it. The connector waits for the last position to change for 1 minute and then gives up.

We could cut this time short by detecting that Run has already stopped: in that case, the last position cannot change anymore.

Version

v0.6.0