apache/pulsar-client-go

retry producer creation upon error after successful topic lookup

zzzming opened this issue · 0 comments

Expected behavior

newPartitionProducer() should retry grabCnx() when it fails, similar to the grabCnx() retry logic in reconnectToBroker().

The Java producer already has this retry logic.

Actual behavior

When a producer is created, grabCnx() in producer_partition.go first performs a topic lookup. If the lookup succeeds but a network issue occurs before the command to create the producer is sent, grabCnx() exits without retrying.

We had frequent failures during initial producer creation.

Steps to reproduce

It's tricky to reproduce, but we observe the problem more frequently during the initialization stage of pods on Azure. After we implemented the grabCnx() retry in newPartitionProducer(), the problem went away. (I will open a PR.)

System configuration

Pulsar version: 2.10