wix/greyhound

Producer for retry topics does not use the associated consumer's SSL config.

JohnSColeman opened this issue · 8 comments

SSL-related exceptions are seen for the retry producer when using Java interop.

When consuming a message fails and the message is published to a retry topic, it looks like the SSL config we supply to the origin topic's consumer is not applied to that topic's retry producer.

Thank you @JohnSColeman for reporting this!

This does seem to be a bug: the internal producer is not given the extra properties, only "kafkaAuthProperties".
Maybe the filter should include more keys.

Can you please post which SSL-related keys you are using?

@natansil Here is the list of SSL-related keys we use when initializing GreyhoundConsumerBuilder:

mapOf(
  "security.protocol" to securityProtocol!!,
  "auto.offset.reset" to autoOffsetReset,
  "ssl.truststore.location" to sslTruststoreLocation!!,
  "ssl.truststore.password" to sslTruststorePassword!!,
  "ssl.keystore.location" to sslKeystoreLocation!!,
  "ssl.keystore.password" to sslKeystorePassword!!,
  "ssl.key.password" to sslKeyPassword!!)

First, we get the consumer config:

[LOG] {"level":"INFO","ts":"2021-03-25 13:39:36,301","msg":"ConsumerConfig values: 
        allow.auto.create.topics = true
        auto.commit.interval.ms = 5000
        auto.offset.reset = earliest
        ....
        ssl.cipher.suites = null
        ssl.enabled.protocols = [TLSv1.2, TLSv1.3]
        ssl.endpoint.identification.algorithm = https
        ssl.engine.factory.class = null
        ssl.key.password = [hidden]
        ssl.keymanager.algorithm = SunX509
        ssl.keystore.location = config/kafka/staging/user.p12
        ssl.keystore.password = [hidden]
        ssl.keystore.type = JKS
        ssl.protocol = TLSv1.3
        ssl.provider = null
        ssl.secure.random.implementation = null
        ssl.trustmanager.algorithm = PKIX
        ssl.truststore.location = config/kafka/staging/client.truststore.jks
        ssl.truststore.password = [hidden]
        ssl.truststore.type = JKS
        value.deserializer = class com.wixpress.dst.greyhound.core.consumer.Consumer$$anon$1
"}

Followed by the AdminClientConfig:

[LOG] {"level":"INFO","ts":"2021-03-25 13:39:36,586","msg":"AdminClientConfig values: 
        bootstrap.servers = ***
        client.dns.lookup = use_all_dns_ips
        client.id = 
        connections.max.idle.ms = 300000
        default.api.timeout.ms = 60000
        metadata.max.age.ms = 300000
        ...
        ssl.cipher.suites = null
        ssl.enabled.protocols = [TLSv1.2, TLSv1.3]
        ssl.endpoint.identification.algorithm = https
        ssl.engine.factory.class = null
        ssl.key.password = null
        ssl.keymanager.algorithm = SunX509
        ssl.keystore.location = null
        ssl.keystore.password = null
        ssl.keystore.type = JKS
        ssl.protocol = TLSv1.3
        ssl.provider = null
        ssl.secure.random.implementation = null
        ssl.trustmanager.algorithm = PKIX
        ssl.truststore.location = null
        ssl.truststore.password = null
        ssl.truststore.type = JKS
"}

The same applies to the Producer when we add a retry config to the consumer flow: the AdminClient and the Producer do not pick up the SSL-related properties from the ConsumerConfig.

The Consumer can connect to Kafka over SSL as intended, while the Producer and AdminClient fail the SSL handshake.

Hi @JohnSColeman and @Hunterza.
I'm deeply sorry for the late reply. I was on vacation this week (holiday in Israel).

I've hopefully fixed the issue and released a new version (0.1.7) that includes the fix.

Please let me know if the issue is completely resolved.

That's great, thank you. We will try the new release.

What happens when the retries are exceeded - do the unsuccessful messages remain in the final retry topic or is there a separate failed message topic?

Excellent. Let me know if the issue is resolved.


Regarding a failed-messages topic (a.k.a. dead-letter queue):

Currently there is no out-of-the-box code that does this.
However, the Java GreyhoundConsumer has ErrorHandler#onUserException.
You are using the Java API, right?

In the error handler, you can extract the current retry attempt similarly to this (this is from our "Wix" Greyhound layer):

private def extractRetryAttempt(record: CoreConsumerRecord[String, A]) =
    NonBlockingRetryHelper(consumerConfig.groupId, retryConfig).retryAttempt(record.topic, record.headers, initialSubscription)

Then compare it to RetryConfig#nonBlockingBackoffs(originalTopic).length; if it is indeed the last attempt, produce to a failed-messages topic of your choice.
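
The "compare, then produce" step can be sketched in plain Java as follows. This is only an illustration and not part of Greyhound: the class FailedMessageRouter, its method names, and the publish callback are all hypothetical. It also assumes that the retry attempt counts the retries already performed, so that a value equal to the number of configured backoffs means retries are exhausted; if the attempt turns out to be counted differently, the comparison needs adjusting.

```java
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical helper, not a Greyhound class.
final class FailedMessageRouter {

    // Assumption: retryAttempt is the number of retries already performed,
    // and backoffCount is the length of the configured non-blocking backoffs.
    static boolean retriesExhausted(int retryAttempt, int backoffCount) {
        return retryAttempt >= backoffCount;
    }

    // Publishes the payload to failedTopic when retries are exhausted.
    // `publish` stands in for whatever producer call you use: (topic, payload).
    // Returns true if the message was routed to the failed-messages topic.
    static boolean routeIfExhausted(int retryAttempt,
                                    List<Long> backoffsMillis,
                                    String failedTopic,
                                    String payload,
                                    BiConsumer<String, String> publish) {
        if (!retriesExhausted(retryAttempt, backoffsMillis.size())) {
            return false; // more retries remain; let the retry flow continue
        }
        publish.accept(failedTopic, payload);
        return true;
    }
}
```

In a real error handler, the publish callback would wrap your own producer, and the attempt and backoff values would come from the retry helper and RetryConfig mentioned above.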

If you can try to contribute this FailedMessageErrorHandler code to the Greyhound Java API, I would do my utmost to help you and would be very grateful indeed.

Now we are seeing this: "The configuration 'ssl.keystore.location' was supplied but isn't a known config."

Hi,
The fix passes all properties that start with ssl. to the consumer's internal retry producer setup.
(security. and sasl. were already included before the fix.)

Are you sure you pass all the needed configuration? Is it identical to the regular producer setup?
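
For readers following along, the prefix-based filtering described above can be sketched roughly like this. It is a simplified illustration in plain Java, not Greyhound's actual code: the class AuthPropertyFilter and the method authProperties are hypothetical names, and the prefix list is simply the three prefixes mentioned in this thread.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not a Greyhound class.
final class AuthPropertyFilter {

    // Prefixes named in this thread; the real filter may differ.
    static final List<String> AUTH_PREFIXES = List.of("security.", "sasl.", "ssl.");

    // Returns only the entries whose key starts with one of the auth prefixes,
    // preserving the original insertion order.
    static Map<String, String> authProperties(Map<String, String> consumerProps) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : consumerProps.entrySet()) {
            String key = entry.getKey();
            if (AUTH_PREFIXES.stream().anyMatch(key::startsWith)) {
                result.put(key, entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> consumerProps = new LinkedHashMap<>();
        consumerProps.put("security.protocol", "SSL");
        consumerProps.put("auto.offset.reset", "earliest");
        consumerProps.put("ssl.keystore.location", "config/kafka/staging/user.p12");
        // Only the security.* and ssl.* keys are kept for the internal producer.
        System.out.println(authProperties(consumerProps).keySet());
    }
}
```

With a filter like this, a consumer-only key such as auto.offset.reset is dropped, while the keystore and truststore settings are carried over to the retry producer.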

This issue is closed.