IBM/sarama

KIP-368 implementation causes connection disconnects or OOM during re-auth


Versions

Sarama: 1.33.0

Configuration

Any sarama-based application connecting to Kafka brokers with SASL connection re-authentication enabled. The issue was detected while testing the Strimzi Canary 0.3.0 RC.

Logs
[Sarama] 2022/05/23 08:36:42 Completed pre-auth SASL handshake. Available mechanisms: [SCRAM-SHA-512 PLAIN]
[Sarama] 2022/05/23 08:36:42 Session expiration in 10000 ms and session re-authentication on or after 8680 ms
[Sarama] 2022/05/23 08:36:42 Connected to broker at localhost:9092 (registered as #0)
[Sarama] 2022/05/23 08:36:42 Closed connection to broker localhost:9092

Problem Description

The issue is with the KIP-368 implementation added by PR #2197. Because of this defect, re-authentication sometimes fails to work correctly. The client application may suffer unexpected disconnections or, worse, an OOM condition. The problem worsens under load.

The new implementation does not correctly account for sarama's I/O model. Its call to authenticateViaSASL reads from the same connection as the broker's receiver goroutine, so the two contend for input. One may steal bytes from the other, leaving the connection in an unexpected state. The OOM is a result of incorrect bytes being interpreted as payload lengths.

Apologies for the defect; I intend to open a PR soon.