fluent/fluent-plugin-kafka

Kafka hostname resolution errors on first start after system reboot

sharmavijay1991 opened this issue · 2 comments

Describe the bug

We publish periodic logs to a Kafka cluster (3 ZooKeeper nodes, 5 Kafka brokers). When we reboot the whole system, fluentd hits hostname resolution errors for the brokers.
These errors cause fluentd to restart, and after the restart it works fine.

Error:

2021-07-26 13:33:50 +0000 [warn]: #0 Send exception occurred: Could not connect to any of the seed brokers:

  • kafka://xxxx-kafka-xxxxxxx:9093: getaddrinfo: hostname nor servname provided, or not known
(The full exception backtrace appears under "Your Error Log" below.)
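
For reference, a minimal ruby-kafka sketch that triggers the same failure when the broker hostname is not yet resolvable. This is an illustration only; the broker address is the masked placeholder from our config, and the client is created without the SSL options we actually use:

require "kafka"

# Placeholder seed broker, copied from the masked config below.
kafka = Kafka.new(["xxxx-kafka-xxxxxxx:9093"], client_id: "repro")

begin
  # Listing topics forces a metadata fetch against the seed brokers.
  kafka.topics
rescue Kafka::ConnectionError => e
  # Before DNS is up on the freshly booted host, this raises:
  # "Could not connect to any of the seed brokers:
  #  kafka://xxxx-kafka-xxxxxxx:9093: getaddrinfo: hostname nor servname provided, or not known"
  puts e.message
end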

To Reproduce

  1. Stop fluentd. This generates a backlog (log files in JSON format).
  2. Reboot the whole system. (Note: the Kafka brokers run on separate machines.)
  3. After the reboot we hit temporary getaddrinfo errors.
  4. These errors cause fluentd to restart, and after the restart they disappear (a startup gate like the sketch after this list would absorb them).
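
One possible workaround (a sketch, not tested on our setup) is to gate fluentd startup on DNS readiness with a small Ruby wrapper; the hostname and timings are placeholders:

require "socket"

BROKER_HOST = "xxxx-kafka-xxxxxxx" # placeholder; the real broker hostname

# Retry getaddrinfo for up to ~60 seconds before starting fluentd, so
# transient resolution failures right after boot are absorbed here
# instead of surfacing as send exceptions in the kafka2 output.
30.times do
  begin
    Socket.getaddrinfo(BROKER_HOST, nil)
    break
  rescue SocketError
    sleep 2
  end
end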

Extra info:
We are using fluent-plugin-cat-sweep to pick up and process the input files (JSON-formatted logs).

Expected behavior

Behavior should be consistent.
We should not see these errors at boot/startup, since they do not occur after a plain fluentd process restart.

Your Environment

- Fluentd version: 1.3.3
- TD Agent version: not used
- fluent-plugin-kafka version: 0.16.3
- ruby-kafka version: 1.3.0
- Operating system: FreeBSD 10.4
- Kernel version: 10.4-RELEASE

Your Configuration

<ROOT>
  <system>
    log_level debug
    process_name "xxxx_fluentd"
  </system>
  <source>
    @type cat_sweep
    file_path_with_glob "xxxxxxx/*.json"
    format json
    tag "xxxxxxx.kafka"
    waiting_seconds 40
    remove_after_processing true
    processing_file_suffix ".processing"
    error_file_suffix ".err"
    <parse>
      @type json
    </parse>
  </source>
  <match xxxxxx.kafka>
    default_topic "xxxx-log"
    brokers xxxx-kafka-xxxxxxx:9093
    ssl_ca_cert /xxxx/ca_list.pem
    ssl_client_cert "/xxx/client_cert.pem"
    ssl_ca_certs_from_system true
    ssl_client_cert_key "/xxxx/client_cert_key.pem"
    @type kafka2
    compression_codec "zstd"
    max_send_retries 1
    required_acks -1
    idempotent true
    default_message_key "XYZ"
    <format>
      @type "json"
    </format>
    <buffer>
      @type "file"
      flush_interval 10s
      retry_wait 2
      overflow_action block
      path "/xxxx/kafka_xxxx"
    </buffer>
  </match>
</ROOT>
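
If the goal is just to ride out the transient resolution failures instead of letting fluentd restart, the buffer retry parameters could be tuned. An illustrative, untested variant of our buffer section (retry_type, retry_max_interval, and retry_forever are standard Fluentd buffer parameters; the values are guesses):

<buffer>
  @type "file"
  path "/xxxx/kafka_xxxx"
  flush_interval 10s
  overflow_action block
  retry_wait 2
  retry_type exponential_backoff
  retry_max_interval 30
  # Keep retrying through boot-time DNS hiccups rather than giving up;
  # alternatively cap attempts with retry_max_times.
  retry_forever true
</buffer>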

Your Error Log

2021-07-26 13:33:49 +0000 [debug]: #0 enqueue_thread actually running
2021-07-26 13:33:49 +0000 [debug]: #0 flush_thread actually running
2021-07-26 13:33:50 +0000 [warn]: #0 Send exception occurred: Could not connect to any of the seed brokers:
- kafka://xxxx-kafka-xxxxxxx:9093: getaddrinfo: hostname nor servname provided, or not known
2021-07-26 13:33:50 +0000 [warn]: #0 Exception Backtrace : /usr/local/lib/ruby/gems/2.4/gems/ruby-kafka-1.3.0/lib/kafka/cluster.rb:448:in `fetch_cluster_info'
/usr/local/lib/ruby/gems/2.4/gems/ruby-kafka-1.3.0/lib/kafka/cluster.rb:402:in `cluster_info'
/usr/local/lib/ruby/gems/2.4/gems/ruby-kafka-1.3.0/lib/kafka/cluster.rb:102:in `refresh_metadata!'
/usr/local/lib/ruby/gems/2.4/gems/ruby-kafka-1.3.0/lib/kafka/cluster.rb:56:in `add_target_topics'
/usr/local/lib/ruby/gems/2.4/gems/fluent-plugin-kafka-0.16.3/lib/fluent/plugin/kafka_producer_ext.rb:93:in `initialize'
/usr/local/lib/ruby/gems/2.4/gems/fluent-plugin-kafka-0.16.3/lib/fluent/plugin/kafka_producer_ext.rb:60:in `new'
/usr/local/lib/ruby/gems/2.4/gems/fluent-plugin-kafka-0.16.3/lib/fluent/plugin/kafka_producer_ext.rb:60:in `topic_producer'
/usr/local/lib/ruby/gems/2.4/gems/fluent-plugin-kafka-0.16.3/lib/fluent/plugin/out_kafka2.rb:233:in `write'
/usr/local/lib/ruby/gems/2.4/gems/fluentd-1.3.3/lib/fluent/plugin/output.rb:1123:in `try_flush'
/usr/local/lib/ruby/gems/2.4/gems/fluentd-1.3.3/lib/fluent/plugin/output.rb:1423:in `flush_thread_run'
/usr/local/lib/ruby/gems/2.4/gems/fluentd-1.3.3/lib/fluent/plugin/output.rb:452:in `block (2 levels) in start'
/usr/local/lib/ruby/gems/2.4/gems/fluentd-1.3.3/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'

Additional context

fluent plugin info =>

fluent-config-regexp-type (1.0.0)
fluent-plugin-cat-sweep (0.2.0)
fluent-plugin-grepcounter (0.6.0)
fluent-plugin-kafka (0.16.3)
fluent-plugin-mail (0.3.0)
fluent-plugin-rewrite-tag-filter (2.4.0)
fluentd (1.3.3)

This issue has been automatically marked as stale because it has been open 90 days with no activity. Remove the stale label or comment, or this issue will be closed in 30 days.

This issue was automatically closed because it remained stale for 30 days.