ClusterLogForwarder not able to use multiple Elasticsearch nodes as outputs
theodor2311 opened this issue · 7 comments
Describe the bug
A ClusterLogForwarder with an Elasticsearch output cannot be configured with multiple Elasticsearch nodes from the same Elasticsearch cluster.
Environment
- OpenShift 4.8.15
- cluster-logging.5.2.2-21
Rationale
The Elasticsearch cluster deployment does not include a load balancer or VIP, so the normal practice is to pass a list of Elasticsearch nodes and load-balance on the client side. For example, Kibana uses "elasticsearch.hosts" and Fluentd uses "hosts".
To Reproduce
- Create a ClusterLogForwarder whose output URL lists multiple Elasticsearch nodes.
apiVersion: "logging.openshift.io/v1"
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
spec:
  outputs:
    - name: elasticsearch-insecure
      type: "elasticsearch"
      url: http://elasticsearch.insecure.com:9200,http://elasticsearch2.insecure.com:9200
  pipelines:
    ...
- Check /etc/fluent/fluent.conf in the fluentd pods.
Expected behavior
The operator translates spec.outputs.url into the "hosts" parameter when the output type is Elasticsearch.
...
<match **>
  @type copy
  <store>
    @type elasticsearch
    @id elasticsearch_insecure
    hosts http://elasticsearch.insecure.com:9200,http://elasticsearch2.insecure.com:9200
    ...
Actual behavior
The operator parses spec.outputs.url, mangles it, and writes the result to the single-node "host" parameter.
...
<match **>
  @type copy
  <store>
    @type elasticsearch
    @id elasticsearch_insecure
    host elasticsearch.insecure.com:9200,http
    port 9200
    ...
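One way the expected "hosts" line could be produced is to split the spec.outputs.url value on commas and validate each entry separately before rendering. The Go sketch below is only an illustration under that assumption: renderHosts is a hypothetical helper, not the operator's actual code, and the real generator may be structured quite differently.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// renderHosts is a hypothetical helper (not part of the operator).
// It splits a comma-separated URL list, checks that every entry has a
// scheme and a host, and returns the fluentd "hosts" line.
func renderHosts(raw string) (string, error) {
	var hosts []string
	for _, part := range strings.Split(raw, ",") {
		part = strings.TrimSpace(part)
		if part == "" {
			continue // tolerate stray commas
		}
		u, err := url.Parse(part)
		if err != nil {
			return "", fmt.Errorf("invalid URL %q: %w", part, err)
		}
		if u.Scheme == "" || u.Host == "" {
			return "", fmt.Errorf("URL %q needs a scheme and a host", part)
		}
		hosts = append(hosts, part)
	}
	if len(hosts) == 0 {
		return "", fmt.Errorf("no Elasticsearch URLs given")
	}
	return "hosts " + strings.Join(hosts, ","), nil
}

func main() {
	line, err := renderHosts("http://elasticsearch.insecure.com:9200,http://elasticsearch2.insecure.com:9200")
	if err != nil {
		panic(err)
	}
	fmt.Println(line)
}
```

Validating each entry on its own keeps a comma from ever reaching the URL parser, which is where the mangled "host" value above appears to come from.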
Additional context
Kibana using "elasticsearch.hosts":
https://www.elastic.co/guide/en/kibana/current/production.html#high-availability
Fluentd using "hosts":
https://docs.fluentd.org/output/elasticsearch#hosts-optional
@alanconway care to convert to a JIRA and put some API design around this? Per @lukas-vlcek this is a valid use case.
Will do. Kafka has a similar feature that we do expose.
@theodor2311 Please take a look at https://issues.redhat.com/browse/LOG-2016 and put a comment there to indicate if I've captured the issue correctly. One question - should we use the nodes in the order given, or randomize? If the main goal is load-balancing then randomized might spread the load better. But if there's a "preferred" node then we should take the first entry first.
@jcantrill The JIRA is enough of a design by itself: this just adds a "Nodes" field to the Elasticsearch struct, the same pattern we already use for Kafka and its "Brokers" field.
I'll close this issue when @theodor2311 approves https://issues.redhat.com/browse/LOG-2016, we'll track the JIRA from then on.
@alanconway Done with thanks. I think matching fluentd's "hosts" randomized behavior should be enough.
/close
Continuing to track this issue at https://issues.redhat.com/browse/LOG-2016
@alanconway: Closing this issue.