ClusterLogForwarder not able to use multiple Elasticsearch nodes as outputs

Question

theodor2311 opened this issue 3 years ago · 7 comments

Describe the bug
A ClusterLogForwarder output of type Elasticsearch cannot be configured with multiple Elasticsearch nodes from the same Elasticsearch cluster.

Environment

OpenShift 4.8.15
cluster-logging.5.2.2-21

Rationale
The Elasticsearch cluster deployment does not include a load balancer or VIP, so the normal practice is to pass a list of Elasticsearch nodes and do the load balancing on the client side. For example, Kibana uses "elasticsearch.hosts" and Fluentd uses "hosts".

To Reproduce

Create a ClusterLogForwarder whose Elasticsearch output url is a comma-separated list of Elasticsearch node URLs.

apiVersion: "logging.openshift.io/v1"
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
spec:
  outputs:
   - name: elasticsearch-insecure
     type: "elasticsearch"
     url: http://elasticsearch.insecure.com:9200,http://elasticsearch2.insecure.com:9200
  pipelines:
   ...

Check the /etc/fluent/fluent.conf from the fluentd pods.

Expected behavior
When the output type is Elasticsearch, the operator should map spec.outputs.url to the "hosts" parameter.

...
<match **>
  @type copy
  <store>
    @type elasticsearch
    @id elasticsearch_insecure
    hosts http://elasticsearch.insecure.com:9200,http://elasticsearch2.insecure.com:9200
...
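The expected mapping can be sketched in a few lines of Python. This is only an illustration of the requested behavior, not the operator's actual code: the function names split_output_urls and render_hosts_line are hypothetical, and the sketch assumes the url field carries a comma-separated list of http or https URLs.

```python
from typing import List
from urllib.parse import urlsplit


def split_output_urls(raw: str) -> List[str]:
    """Split a comma-separated URL string and validate each entry.

    Hypothetical helper: the real operator may validate differently.
    """
    urls = [part.strip() for part in raw.split(",") if part.strip()]
    for url in urls:
        parts = urlsplit(url)
        # Each entry must be a complete URL on its own.
        if parts.scheme not in ("http", "https") or not parts.hostname:
            raise ValueError(f"not a valid Elasticsearch URL: {url!r}")
    return urls


def render_hosts_line(raw: str) -> str:
    """Render a fluentd 'hosts' line from the raw url value."""
    return "hosts " + ",".join(split_output_urls(raw))


if __name__ == "__main__":
    raw = "http://elasticsearch.insecure.com:9200,http://elasticsearch2.insecure.com:9200"
    print(render_hosts_line(raw))
```

Splitting on the comma before parsing means each entry is handled as its own URL, so an entry without a scheme or host is rejected instead of being folded into a neighbouring one.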

Actual behavior
The operator parses spec.outputs.url, alters it, and writes the result to the "host" parameter.

...
<match **>
  @type copy
  <store>
    @type elasticsearch
    @id elasticsearch_insecure
    host elasticsearch.insecure.com:9200,http
    port 9200
...

Additional context
Kibana using "elasticsearch.hosts":
https://www.elastic.co/guide/en/kibana/current/production.html#high-availability
Fluentd using "hosts":
https://docs.fluentd.org/output/elasticsearch#hosts-optional

Answer 1 · 2021-11-05T15:34:20.000Z

@alanconway care to convert to a JIRA and put some API design around this? Per @lukas-vlcek, this is a valid use case.

Answer 2 · 2021-11-05T21:04:31.000Z

Will do. Kafka has a similar feature that we do expose.

Answer 3 · 2021-11-22T20:49:33.000Z

@theodor2311 Please take a look at https://issues.redhat.com/browse/LOG-2016 and put a comment there to indicate if I've captured the issue correctly. One question - should we use the nodes in the order given, or randomize? If the main goal is load-balancing then randomized might spread the load better. But if there's a "preferred" node then we should take the first entry first.
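The two orderings in question can be compared with a small Python sketch. It is illustrative only: order_nodes is a hypothetical name, and it models nothing more than the order in which a client would try the nodes, not how fluentd or the operator implements selection.

```python
import random
from typing import List, Optional


def order_nodes(
    nodes: List[str],
    randomize: bool,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Return the order in which a client would try the given nodes.

    randomize=False keeps the configured order (first entry is "preferred").
    randomize=True shuffles a copy, which tends to spread load across nodes.
    """
    ordered = list(nodes)  # copy, so the caller's list is left untouched
    if randomize:
        # Use the supplied generator if any, else the module-level one.
        (rng or random).shuffle(ordered)
    return ordered


if __name__ == "__main__":
    nodes = [
        "http://elasticsearch.insecure.com:9200",
        "http://elasticsearch2.insecure.com:9200",
    ]
    print(order_nodes(nodes, randomize=False))
    print(order_nodes(nodes, randomize=True))
```

Keeping the given order supports a preferred node, while shuffling trades that preference for a more even spread.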

@jcantrill The JIRA is enough of a design by itself. This is just adding a "Nodes" field to the Elasticsearch struct, and it is the same pattern we already used for Kafka and its "Brokers" field.

Answer 4 · 2021-11-22T20:51:33.000Z

I'll close this issue when @theodor2311 approves https://issues.redhat.com/browse/LOG-2016. We'll track the JIRA from then on.

Answer 5 · 2021-11-23T01:41:41.000Z

@alanconway Done, with thanks. I think matching fluentd's randomized "hosts" behavior should be enough.

Answer 6 · 2021-11-26T19:25:58.000Z

/close
Continuing to track this issue at https://issues.redhat.com/browse/LOG-2016

Answer 7 · 2021-11-26T19:26:29.000Z

@alanconway: Closing this issue.