Creating an Elastic cluster with TLS enabled
grizzlycode opened this issue · 12 comments
Problem description
So I'm trying to use your guide for creating an Elastic cluster here. However, I don't believe the code snippets plus the link to the official Docker page are enough to get this particular stack working. Are there plans to update this section?
In particular, I've run into the following issues.
- Changing the container names breaks setup. It is unable to resolve the elasticsearch name now.
- It appears you need to change the lib.sh file to the new container name elasticsearch_1.
- However, when I do that, I go from code 6, "unable to resolve name", to code 35, "failure to connect to Elasticsearch".
- I also had to change instances.yml to include the additional Elasticsearch nodes, as well as change/add bind mounts and certificate names such as elasticsearch_1, otherwise they weren't found.
I'm sure after setup is fixed there may be other issues, but I'm currently stuck at setup.
Extra information
- I did add vm.max_map_count to /etc/sysctl.conf
- I have each elasticsearch container on its own bind mount as recommended
- I created another container to troubleshoot and used same network namespace of the setup container and I was able to ping/resolve elasticsearch_1 so not sure why it can't connect to it
Stack configuration
I made the following changes to docker compose
- Added additional elasticsearch containers per syntax from cluster page
- Added bind mounts for elasticsearch data to each elastic container
- Added TLS bind mounts for each elastic container and updated name
I updated instances.yml
- Added additional elasticsearch containers to it along with DNS/IP info
I updated lib.sh
in setup
- Updated variable to look for elasticsearch_1 instead of elasticsearch as the name of the container is changed
services:
  tls:
    profiles:
      - setup
    build:
      context: tls/
      args:
        ELASTIC_VERSION: ${ELASTIC_VERSION}
    user: root # ensures we can write to the local tls/ directory.
    init: true
    volumes:
      - ./tls/entrypoint.sh:/entrypoint.sh:ro,Z
      - ./tls/instances.yml:/usr/share/elasticsearch/tls/instances.yml:ro,Z
      - ./tls/certs:/usr/share/elasticsearch/tls/certs:z
  setup:
    profiles:
      - setup
    build:
      context: setup/
      args:
        ELASTIC_VERSION: ${ELASTIC_VERSION}
    init: true
    volumes:
      - ./setup/entrypoint.sh:/entrypoint.sh:ro,Z
      - ./setup/lib.sh:/lib.sh:ro,Z
      - ./setup/roles:/roles:ro,Z
      # (!) CA certificate. Generate using the 'tls' service.
      - ./tls/certs/ca/ca.crt:/ca.crt:ro,z
      - ./elasticsearch/logs/:/usr/share/elasticsearch/logs/
    environment:
      ELASTIC_PASSWORD: ${ELASTIC_PASSWORD:-}
      LOGSTASH_INTERNAL_PASSWORD: ${LOGSTASH_INTERNAL_PASSWORD:-}
      KIBANA_SYSTEM_PASSWORD: ${KIBANA_SYSTEM_PASSWORD:-}
      METRICBEAT_INTERNAL_PASSWORD: ${METRICBEAT_INTERNAL_PASSWORD:-}
      FILEBEAT_INTERNAL_PASSWORD: ${FILEBEAT_INTERNAL_PASSWORD:-}
      HEARTBEAT_INTERNAL_PASSWORD: ${HEARTBEAT_INTERNAL_PASSWORD:-}
      MONITORING_INTERNAL_PASSWORD: ${MONITORING_INTERNAL_PASSWORD:-}
      BEATS_SYSTEM_PASSWORD: ${BEATS_SYSTEM_PASSWORD:-}
    networks:
      - elk
    depends_on:
      - elasticsearch_1
  elasticsearch_1:
    build:
      context: elasticsearch/
      args:
        ELASTIC_VERSION: ${ELASTIC_VERSION}
    volumes:
      - ./elasticsearch/config/elasticsearch_1.yml:/usr/share/elasticsearch/config/elasticsearch.yml:ro,Z
      # (!) TLS certificates. Generate using the 'tls' service.
      - ./tls/certs/ca/ca.crt:/usr/share/elasticsearch/config/ca.crt:ro,z
      - ./tls/certs/elasticsearch_1/elasticsearch_1.crt:/usr/share/elasticsearch/config/elasticsearch_1.crt:ro,z
      - ./tls/certs/elasticsearch_1/elasticsearch_1.key:/usr/share/elasticsearch/config/elasticsearch_1.key:ro,z
      - ./data/es01data:/usr/share/elasticsearch/data:Z
    ports:
      - 9200:9200
      - 9300:9300
    environment:
      node.name: elasticsearch_1
      ES_JAVA_OPTS: -Xms512m -Xmx512m
      # Bootstrap password.
      # Used to initialize the keystore during the initial startup of
      # Elasticsearch. Ignored on subsequent runs.
      ELASTIC_PASSWORD: ${ELASTIC_PASSWORD:-}
      # Use other cluster nodes for unicast discovery
      discovery.seed_hosts: elasticsearch_2,elasticsearch_3
      # Define initial masters, assuming a cluster size of at least 3
      cluster.initial_master_nodes: elasticsearch_1,elasticsearch_2,elasticsearch_3
    networks:
      - elk
    restart: unless-stopped
  elasticsearch_2:
    build:
      context: elasticsearch/
      args:
        ELASTIC_VERSION: ${ELASTIC_VERSION}
    volumes:
      - ./elasticsearch/config/elasticsearch_2.yml:/usr/share/elasticsearch/config/elasticsearch.yml:ro,z
      # (!) TLS certificates. Generate using the 'tls' service.
      - ./tls/certs/elasticsearch_2/elasticsearch_2.crt:/usr/share/elasticsearch/config/elasticsearch_2.crt:ro,z
      - ./tls/certs/elasticsearch_2/elasticsearch_2.key:/usr/share/elasticsearch/config/elasticsearch_2.key:ro,z
      - ./data/es02data:/usr/share/elasticsearch/data:Z
    environment:
      ES_JAVA_OPTS: -Xms512m -Xmx512m
      # Set a deterministic node name.
      node.name: elasticsearch_2
      # Use other cluster nodes for unicast discovery.
      discovery.seed_hosts: elasticsearch_1,elasticsearch_3
      # Define initial masters, assuming a cluster size of at least 3.
      cluster.initial_master_nodes: elasticsearch_1,elasticsearch_2,elasticsearch_3
    networks:
      - elk
  elasticsearch_3:
    build:
      context: elasticsearch/
      args:
        ELASTIC_VERSION: ${ELASTIC_VERSION}
    volumes:
      - ./elasticsearch/config/elasticsearch_3.yml:/usr/share/elasticsearch/config/elasticsearch.yml:ro,z
      # (!) TLS certificates. Generate using the 'tls' service.
      - ./tls/certs/elasticsearch_3/elasticsearch_3.crt:/usr/share/elasticsearch/config/elasticsearch_3.crt:ro,z
      - ./tls/certs/elasticsearch_3/elasticsearch_3.key:/usr/share/elasticsearch/config/elasticsearch_3.key:ro,z
      - ./data/es03data:/usr/share/elasticsearch/data:Z
    environment:
      ES_JAVA_OPTS: -Xms512m -Xmx512m
      # Set a deterministic node name.
      node.name: elasticsearch_3
      # Use other cluster nodes for unicast discovery.
      discovery.seed_hosts: elasticsearch_1,elasticsearch_2
      # Define initial masters, assuming a cluster size of at least 3.
      cluster.initial_master_nodes: elasticsearch_1,elasticsearch_2,elasticsearch_3
    networks:
      - elk
  logstash:
    build:
      context: logstash/
      args:
        ELASTIC_VERSION: ${ELASTIC_VERSION}
    volumes:
      - ./logstash/config/logstash.yml:/usr/share/logstash/config/logstash.yml:ro,Z
      - ./logstash/pipeline:/usr/share/logstash/pipeline:ro,Z
      # (!) CA certificate. Generate using the 'tls' service.
      - ./tls/certs/ca/ca.crt:/usr/share/logstash/config/ca.crt:ro,z
    ports:
      - 5044:5044
      - 50000:50000/tcp
      - 50000:50000/udp
      - 9600:9600
    environment:
      LS_JAVA_OPTS: -Xms256m -Xmx256m
      LOGSTASH_INTERNAL_PASSWORD: ${LOGSTASH_INTERNAL_PASSWORD:-}
    networks:
      - elk
    depends_on:
      - elasticsearch_1
    restart: unless-stopped
  kibana:
    build:
      context: kibana/
      args:
        ELASTIC_VERSION: ${ELASTIC_VERSION}
    volumes:
      - ./kibana/config/kibana.yml:/usr/share/kibana/config/kibana.yml:ro,Z
      # (!) TLS certificates. Generate using the 'tls' service.
      - ./tls/certs/ca/ca.crt:/usr/share/kibana/config/ca.crt:ro,z
      - ./tls/certs/kibana/kibana.crt:/usr/share/kibana/config/kibana.crt:ro,Z
      - ./tls/certs/kibana/kibana.key:/usr/share/kibana/config/kibana.key:ro,Z
    ports:
      - 5601:5601
    environment:
      KIBANA_SYSTEM_PASSWORD: ${KIBANA_SYSTEM_PASSWORD:-}
    networks:
      - elk
    depends_on:
      - elasticsearch_1
    restart: unless-stopped
networks:
  elk:
    driver: bridge
Docker setup
$ docker version
Client: Docker Engine - Community
Version: 24.0.2
API version: 1.43
Go version: go1.20.4
Git commit: cb74dfc
Built: Thu May 25 21:51:00 2023
OS/Arch: linux/amd64
Context: default
Server: Docker Engine - Community
Engine:
Version: 24.0.2
API version: 1.43 (minimum version 1.12)
Go version: go1.20.4
Git commit: 659604f
Built: Thu May 25 21:51:00 2023
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.6.21
GitCommit: 3dce8eb055cbb6872793272b4f20ed16117344f8
runc:
Version: 1.1.7
GitCommit: v1.1.7-0-g860f061
docker-init:
Version: 0.19.0
GitCommit: de40ad0
$ docker-compose version
Docker Compose version v2.18.1
Container logs
$ docker-compose logs
Elastic Setup
[+] Building 0.0s (0/0)
[+] Running 2/0
✔ Container docker-elk-elasticsearch_1-1 Running 0.0s
✔ Container docker-elk-setup-1 Created 0.0s
Attaching to docker-elk-setup-1
docker-elk-setup-1 | [+] Waiting for availability of Elasticsearch. This can take several minutes.
docker-elk-setup-1 | ⠍ Connection to Elasticsearch failed. Exit code: 35
docker-elk-setup-1 exited with code 35
I think you're very close. Does it work if you set this in the definition of the setup service?
environment:
  ELASTICSEARCH_HOST: elasticsearch_1
edit: my bad, you already did that
I added the environment variable to the setup service and I still get the code 35 error.
setup:
  profiles:
    - setup
  build:
    context: setup/
    args:
      ELASTIC_VERSION: ${ELASTIC_VERSION}
  init: true
  volumes:
    - ./setup/entrypoint.sh:/entrypoint.sh:ro,Z
    - ./setup/lib.sh:/lib.sh:ro,Z
    - ./setup/roles:/roles:ro,Z
    # (!) CA certificate. Generate using the 'tls' service.
    - ./tls/certs/ca/ca.crt:/ca.crt:ro,z
    - ./elasticsearch/logs/:/usr/share/elasticsearch/logs/
  environment:
    ELASTIC_PASSWORD: ${ELASTIC_PASSWORD:-}
    LOGSTASH_INTERNAL_PASSWORD: ${LOGSTASH_INTERNAL_PASSWORD:-}
    KIBANA_SYSTEM_PASSWORD: ${KIBANA_SYSTEM_PASSWORD:-}
    METRICBEAT_INTERNAL_PASSWORD: ${METRICBEAT_INTERNAL_PASSWORD:-}
    FILEBEAT_INTERNAL_PASSWORD: ${FILEBEAT_INTERNAL_PASSWORD:-}
    HEARTBEAT_INTERNAL_PASSWORD: ${HEARTBEAT_INTERNAL_PASSWORD:-}
    MONITORING_INTERNAL_PASSWORD: ${MONITORING_INTERNAL_PASSWORD:-}
    BEATS_SYSTEM_PASSWORD: ${BEATS_SYSTEM_PASSWORD:-}
    ELASTICSEARCH_HOST: elasticsearch_1
  networks:
    - elk
  depends_on:
    - elasticsearch_1
For reference, here is my modified lib.sh file.
- I only added a "_1" to all the areas that had just elasticsearch:9200
- When I did that, it went from code 6 to code 35 when I ran setup
lib.sh
#!/usr/bin/env bash
es_ca_cert="${BASH_SOURCE[0]%/*}"/ca.crt
# Log a message.
function log {
echo "[+] $1"
}
# Log a message at a sub-level.
function sublog {
echo " ⠿ $1"
}
# Log an error.
function err {
echo "[x] $1" >&2
}
# Log an error at a sub-level.
function suberr {
echo " ⠍ $1" >&2
}
# Poll the 'elasticsearch' service until it responds with HTTP code 200.
function wait_for_elasticsearch {
local elasticsearch_host="${ELASTICSEARCH_HOST:-elasticsearch_1}"
local -a args=( '-s' '-D-' '-m15' '-w' '%{http_code}' 'https://elasticsearch_1:9200/'
'--resolve' "elasticsearch_1:9200:${elasticsearch_host}" '--cacert' "$es_ca_cert"
)
if [[ -n "${ELASTIC_PASSWORD:-}" ]]; then
args+=( '-u' "elastic:${ELASTIC_PASSWORD}" )
fi
local -i result=1
local output
# retry for max 300s (60*5s)
for _ in $(seq 1 60); do
local -i exit_code=0
output="$(curl "${args[@]}")" || exit_code=$?
if ((exit_code)); then
result=$exit_code
fi
if [[ "${output: -3}" -eq 200 ]]; then
result=0
break
fi
sleep 5
done
if ((result)) && [[ "${output: -3}" -ne 000 ]]; then
echo -e "\n${output::-3}"
fi
return $result
}
# Poll the Elasticsearch users API until it returns users.
function wait_for_builtin_users {
local elasticsearch_host="${ELASTICSEARCH_HOST:-elasticsearch_1}"
local -a args=( '-s' '-D-' '-m15' 'https://elasticsearch_1:9200/_security/user?pretty'
'--resolve' "elasticsearch_1:9200:${elasticsearch_host}" '--cacert' "$es_ca_cert"
)
if [[ -n "${ELASTIC_PASSWORD:-}" ]]; then
args+=( '-u' "elastic:${ELASTIC_PASSWORD}" )
fi
local -i result=1
local line
local -i exit_code
local -i num_users
# retry for max 30s (30*1s)
for _ in $(seq 1 30); do
num_users=0
# read exits with a non-zero code if the last read input doesn't end
# with a newline character. The printf without newline that follows the
# curl command ensures that the final input not only contains curl's
# exit code, but causes read to fail so we can capture the return value.
# Ref. https://unix.stackexchange.com/a/176703/152409
while IFS= read -r line || ! exit_code="$line"; do
if [[ "$line" =~ _reserved.+true ]]; then
(( num_users++ ))
fi
done < <(curl "${args[@]}"; printf '%s' "$?")
if ((exit_code)); then
result=$exit_code
fi
# we expect more than just the 'elastic' user in the result
if (( num_users > 1 )); then
result=0
break
fi
sleep 1
done
return $result
}
# Verify that the given Elasticsearch user exists.
function check_user_exists {
local username=$1
local elasticsearch_host="${ELASTICSEARCH_HOST:-elasticsearch_1}"
local -a args=( '-s' '-D-' '-m15' '-w' '%{http_code}'
"https://elasticsearch_1:9200/_security/user/${username}"
'--resolve' "elasticsearch_1:9200:${elasticsearch_host}" '--cacert' "$es_ca_cert"
)
if [[ -n "${ELASTIC_PASSWORD:-}" ]]; then
args+=( '-u' "elastic:${ELASTIC_PASSWORD}" )
fi
local -i result=1
local -i exists=0
local output
output="$(curl "${args[@]}")"
if [[ "${output: -3}" -eq 200 || "${output: -3}" -eq 404 ]]; then
result=0
fi
if [[ "${output: -3}" -eq 200 ]]; then
exists=1
fi
if ((result)); then
echo -e "\n${output::-3}"
else
echo "$exists"
fi
return $result
}
# Set password of a given Elasticsearch user.
function set_user_password {
local username=$1
local password=$2
local elasticsearch_host="${ELASTICSEARCH_HOST:-elasticsearch_1}"
local -a args=( '-s' '-D-' '-m15' '-w' '%{http_code}'
"https://elasticsearch_1:9200/_security/user/${username}/_password"
'--resolve' "elasticsearch_1:9200:${elasticsearch_host}" '--cacert' "$es_ca_cert"
'-X' 'POST'
'-H' 'Content-Type: application/json'
'-d' "{\"password\" : \"${password}\"}"
)
if [[ -n "${ELASTIC_PASSWORD:-}" ]]; then
args+=( '-u' "elastic:${ELASTIC_PASSWORD}" )
fi
local -i result=1
local output
output="$(curl "${args[@]}")"
if [[ "${output: -3}" -eq 200 ]]; then
result=0
fi
if ((result)); then
echo -e "\n${output::-3}\n"
fi
return $result
}
# Create the given Elasticsearch user.
function create_user {
local username=$1
local password=$2
local role=$3
local elasticsearch_host="${ELASTICSEARCH_HOST:-elasticsearch_1}"
local -a args=( '-s' '-D-' '-m15' '-w' '%{http_code}'
"https://elasticsearch_1:9200/_security/user/${username}"
'--resolve' "elasticsearch_1:9200:${elasticsearch_host}" '--cacert' "$es_ca_cert"
'-X' 'POST'
'-H' 'Content-Type: application/json'
'-d' "{\"password\":\"${password}\",\"roles\":[\"${role}\"]}"
)
if [[ -n "${ELASTIC_PASSWORD:-}" ]]; then
args+=( '-u' "elastic:${ELASTIC_PASSWORD}" )
fi
local -i result=1
local output
output="$(curl "${args[@]}")"
if [[ "${output: -3}" -eq 200 ]]; then
result=0
fi
if ((result)); then
echo -e "\n${output::-3}\n"
fi
return $result
}
# Ensure that the given Elasticsearch role is up-to-date, create it if required.
function ensure_role {
local name=$1
local body=$2
local elasticsearch_host="${ELASTICSEARCH_HOST:-elasticsearch_1}"
local -a args=( '-s' '-D-' '-m15' '-w' '%{http_code}'
"https://elasticsearch_1:9200/_security/role/${name}"
'--resolve' "elasticsearch_1:9200:${elasticsearch_host}" '--cacert' "$es_ca_cert"
'-X' 'POST'
'-H' 'Content-Type: application/json'
'-d' "$body"
)
if [[ -n "${ELASTIC_PASSWORD:-}" ]]; then
args+=( '-u' "elastic:${ELASTIC_PASSWORD}" )
fi
local -i result=1
local output
output="$(curl "${args[@]}")"
if [[ "${output: -3}" -eq 200 ]]; then
result=0
fi
if ((result)); then
echo -e "\n${output::-3}\n"
fi
return $result
}
Code 35 is a TLS handshake error.
Each X.509 certificate you generate with up tls holds a list of host names and IP addresses which enumerate what a client should consider valid during the handshake.
To solve your issue, add entries for elasticsearch_1, elasticsearch_2, etc. to the block below:
Lines 7 to 10 in 369b682
Then, regenerate the certificates.
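For illustration only, the added instances.yml entries could look like the block below (a sketch, not a definitive config — the names must match your own Compose service names, and every dns/ip entry becomes a SAN in the generated certificate for that instance):

```yaml
instances:
  - name: elasticsearch_1
    dns:
      - elasticsearch_1 # Compose service name, resolved by Docker's embedded DNS
      - localhost
    ip:
      - 127.0.0.1
  - name: elasticsearch_2
    dns:
      - elasticsearch_2
      - localhost
    ip:
      - 127.0.0.1
  - name: elasticsearch_3
    dns:
      - elasticsearch_3
      - localhost
    ip:
      - 127.0.0.1
```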
You mention that you added the entries already, so maybe re-generating the certificates will be enough.
If not, either
- Elasticsearch wasn't restarted and didn't load the new certificate
- Setup isn't using the same CA certificate as Elasticsearch
Hard to tell because your config looks good to me.
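One way to check both of those possibilities is to inspect which names a certificate actually carries. A sketch using openssl — the demo generates a throwaway certificate so it is self-contained; point the same x509 command at your real files under tls/certs/ (all names below are illustrative, not the certutil output):

```bash
# Generate a throwaway cert with SAN entries, standing in for the
# certutil output (illustrative names only):
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout /tmp/demo.key -out /tmp/demo.crt \
  -subj "/CN=elasticsearch01" \
  -addext "subjectAltName=DNS:elasticsearch01,DNS:localhost,IP:127.0.0.1" \
  2>/dev/null

# List the SAN entries a client will consider valid during the handshake:
openssl x509 -in /tmp/demo.crt -noout -ext subjectAltName
```

If the host name you connect to is missing from that list, or the CA mounted in setup isn't the one that signed the node certificate, the handshake fails.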
@grizzlycode I got it working in #870. Check it out.
Result of automated tests: CI run 5210817355
I found three issues:
- Your setup depends only on elasticsearch_1, but Elasticsearch will refuse to serve requests until the cluster is bootstrapped. You need to depend on all three master nodes.
- tls/certs/ca/ca.crt wasn't mounted in elasticsearch_1 and elasticsearch_2.
- Java returns an "illegal server name" exception for SNI requests with an underscore in the host name (see below), so I had to rename the Elasticsearch services to elasticsearch01, etc.
I just updated the wiki page accordingly.
caught exception while handling client http traffic, closing connection Netty4HttpChannel{localAddress=/172.26.0.4:9200, remoteAddress=/172.26.0.5:44446}", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[elasticsearch_1][transport_worker][T#5]","log.logger":"org.elasticsearch.http.AbstractHttpServerTransport","elasticsearch.cluster.uuid":"kKfx7aY9SiWUAP2Lg65t6w","elasticsearch.node.id":"hZfFWqsDTX6MtYL7WOL1Pg","elasticsearch.node.name":"elasticsearch_1","elasticsearch.cluster.name":"docker-cluster","error.type":"io.netty.handler.codec.DecoderException","error.message":"javax.net.ssl.SSLProtocolException: Illegal server name, type=host_name(0), name=elasticsearch_1, value={656C61737469637365617263685F31}","error.stack_trace":"io.netty.handler.codec.DecoderException: javax.net.ssl.SSLProtocolException: Illegal server name, type=host_name(0), name=elasticsearch_1, value={656C61737469637365617263685F31} [...] Caused by: javax.net.ssl.SSLProtocolException: Illegal server name, type=host_name(0), name=elasticsearch_1, value={656C61737469637365617263685F31} [...] Caused by: java.lang.IllegalArgumentException: The encoded server name value is invalid [...] Caused by: java.lang.IllegalArgumentException: Contains non-LDH ASCII characters [...]
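The "Contains non-LDH ASCII characters" cause at the end of that trace is the key: SNI host names must be letter-digit-hyphen (LDH) labels, which is why names containing an underscore are rejected. A quick bash sketch of that rule — the regex is an approximation of the LDH check, not Java's exact implementation:

```bash
# Approximate LDH label check: letters, digits, and inner hyphens only.
is_ldh_label() {
  [[ "$1" =~ ^[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?$ ]]
}

for name in elasticsearch01 elasticsearch_1; do
  if is_ldh_label "$name"; then
    echo "$name: valid SNI host name"
  else
    echo "$name: rejected (non-LDH character)"
  fi
done
```

Hence renaming the services to elasticsearch01 etc. sidesteps the exception without touching anything else.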
Other, more insignificant differences with what you did are:
- Used the same elasticsearch.yml for all Elasticsearch nodes. Parameters which aren't the same are already injected via env vars, so it's safe to use a common base for all master nodes. It simplifies things like certificate management, since certs are always mounted in the same location.
- Used named volumes for data instead of bind mounts.
Another suggestion for you:
You could add a fourth elasticsearch service (no number) to your Compose file, assign it the master-eligible node role, and assign the numbered nodes the data node role.
That way, modifications to the stack can remain light because all connections keep being handled by a node named elasticsearch, but that node holds no data at all; it only dispatches requests to the data nodes.
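A rough, untested sketch of that layout — node roles passed as env vars like the rest of the Compose file in this thread, with illustrative service names (only the relevant keys shown):

```yaml
elasticsearch: # stable entry point; master-eligible, holds no data
  environment:
    node.name: elasticsearch
    node.roles: master
    discovery.seed_hosts: elasticsearch01,elasticsearch02,elasticsearch03

elasticsearch01: # data-only node
  environment:
    node.name: elasticsearch01
    node.roles: data
    discovery.seed_hosts: elasticsearch,elasticsearch02,elasticsearch03
```

Clients (Kibana, Logstash, setup) would keep pointing at elasticsearch, while the numbered nodes hold the shards.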
- Used the same elasticsearch.yml for all Elasticsearch nodes.
- Parameters which aren't the same are already injected in env vars, so it's safe to use a common base for all master nodes.
- It simplifies things like certificate management, since certs are always mounted in the same location.
The reason I made different Elastic configs is that when tls is run, it creates certificates based on the container names in the instances.yml file, and when I ran the setup, elasticsearch_01 previously couldn't find the cert because it had a different name than what is hard-coded in elasticsearch.yml, and it constantly restarted the container. Also, the config file specifically has each cert's name in it, so I didn't want to share that config in case of conflict, since cert names were different. So I created individual ones for each ES instance.
So that I understand: you are saying stick with this config across all three nodes even though tls will create different cert names?
- ./elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml:ro,z
I tried using the bind mount above; however, it had the same problem: the cert generated by tls is not elasticsearch.crt, which the container is looking for (because elasticsearch.yml is hard-coded to this value), so the file doesn't exist, the container fails, and it constantly restarts. When I change it to the cert name tls creates, it stays up.
- Use named volumes for data instead of bind mounts.
I'm using bind mounts in testing because I plan to use a different drive for data. I've used this successfully with the single-ES deployment, so I know that works.
For instances.yml:
If there are multiple DNS entries, is the first one used for name resolution and the others "alt names" for the certificate? Or is resolution dependent on the container name only, with all provided DNS entries being certificate "alt names"?
So I attempted to get around all these resolution issues for setup by just naming elasticsearch01 elasticsearch. Setup now seems to be moving forward, but now I get a code 1 error. I'm not sure why names are not being resolved or why setup is erroring out in the Elasticsearch container logs. However, I noticed the elasticsearch container logs show it can't resolve the names of the other Elasticsearch instances... do all three instances need to connect and form a cluster for setup to continue?
Setup logs
[+] Building 0.0s (0/0)
[+] Running 4/3
✔ Network docker-elk_default Created 0.1s
✔ Network docker-elk_elk Created 0.1s
✔ Container docker-elk-elasticsearch-1 Created 0.1s
✔ Container docker-elk-setup-1 Created 0.1s
Attaching to docker-elk-setup-1
docker-elk-setup-1 | [+] Waiting for availability of Elasticsearch. This can take several minutes.
docker-elk-setup-1 | ⠿ Elasticsearch is running
docker-elk-setup-1 | [+] Waiting for initialization of built-in users
docker-elk-setup-1 | ⠿ Built-in users were initialized
docker-elk-setup-1 | [+] Role 'heartbeat_writer'
docker-elk-setup-1 | \u283f Creating/updating
docker-elk-setup-1 |
docker-elk-setup-1 | HTTP/1.1 503 Service Unavailable
docker-elk-setup-1 | X-elastic-product: Elasticsearch
docker-elk-setup-1 | content-type: application/json
docker-elk-setup-1 | content-length: 265
docker-elk-setup-1 |
docker-elk-setup-1 | {"error":{"root_cause":[{"type":"status_exception","reason":"Cluster state has not been recovered yet, cannot write to the [null] index"}],"type":"status_exception","reason":"Cluster state has not been recovered yet, cannot write to the [null] index"},"status":503}
Here is a copy of my instances.yml in case I'm doing something wrong there. It seems pretty straightforward, though.
# This file is used by elasticsearch-certutil to generate X.509 certificates
# for stack components.
#
# Ref. https://www.elastic.co/guide/en/elasticsearch/reference/current/certutil.html#certutil-silent
instances:
  - name: elasticsearch
    dns:
      - elasticsearch # Compose service name, resolved by the embedded Docker DNS server
      - es01
      - localhost # local connections
      - es01.example.lan # hostname you're going to give the server
    ip:
      - 127.0.0.1 # local connections
      - ::1
      - 1.2.3.4 # server IP
  - name: es02
    dns:
      - es02 # Compose service name, resolved by the embedded Docker DNS server
      - localhost # local connections
      - es02.example.lan # hostname you're going to give the server
    ip:
      - 127.0.0.1 # local connections
      - ::1
      - 1.2.3.4 # server IP
  - name: es03
    dns:
      - es03 # Compose service name, resolved by the embedded Docker DNS server
      - localhost # local connections
      - es03.example.lan # hostname you're going to give the server
    ip:
      - 127.0.0.1 # local connections
      - ::1
      - 1.2.3.4 # server IP
  - name: kibana
    dns:
      - localhost
      - kibana.example.lan
    ip:
      - 127.0.0.1
      - ::1
      - 1.2.3.4
  - name: fleet-server
    dns:
      - fleet-server
      - localhost
      - fs.example.lan
    ip:
      - 127.0.0.1
      - ::1
      - 1.2.3.4
  - name: apm-server
    dns:
      - apm-server
      - localhost
      - apm.example.lan
    ip:
      - 127.0.0.1
      - ::1
      - 1.2.3.4
elasticsearch logs
- These "failed to resolve host" warnings are constant throughout the log
- Note I used different names for the other Elastic instances, as noted in the instances.yml I provided: elasticsearch, es02, es03
{"@timestamp":"2023-06-08T17:07:03.477Z", "log.level": "INFO", "message":"using discovery type [multi-node] and seed hosts providers [settings]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"main","log.logger":"org.elasticsearch.discovery.DiscoveryModule","elasticsearch.node.name":"elasticsearch","elasticsearch.cluster.name":"docker-cluster"}
{"@timestamp":"2023-06-08T17:07:04.884Z", "log.level": "INFO", "message":"initialized", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"main","log.logger":"org.elasticsearch.node.Node","elasticsearch.node.name":"elasticsearch","elasticsearch.cluster.name":"docker-cluster"}
{"@timestamp":"2023-06-08T17:07:04.885Z", "log.level": "INFO", "message":"starting ...", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"main","log.logger":"org.elasticsearch.node.Node","elasticsearch.node.name":"elasticsearch","elasticsearch.cluster.name":"docker-cluster"}
{"@timestamp":"2023-06-08T17:07:04.913Z", "log.level": "INFO", "message":"persistent cache index loaded", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"main","log.logger":"org.elasticsearch.xpack.searchablesnapshots.cache.full.PersistentCache","elasticsearch.node.name":"elasticsearch","elasticsearch.cluster.name":"docker-cluster"}
{"@timestamp":"2023-06-08T17:07:04.914Z", "log.level": "INFO", "message":"deprecation component started", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"main","log.logger":"org.elasticsearch.xpack.deprecation.logging.DeprecationIndexingComponent","elasticsearch.node.name":"elasticsearch","elasticsearch.cluster.name":"docker-cluster"}
{"@timestamp":"2023-06-08T17:07:05.016Z", "log.level": "INFO", "message":"publish_address {192.168.192.2:9300}, bound_addresses {0.0.0.0:9300}", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"main","log.logger":"org.elasticsearch.transport.TransportService","elasticsearch.node.name":"elasticsearch","elasticsearch.cluster.name":"docker-cluster"}
{"@timestamp":"2023-06-08T17:07:05.183Z", "log.level": "INFO", "message":"bound or publishing to a non-loopback address, enforcing bootstrap checks", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"main","log.logger":"org.elasticsearch.bootstrap.BootstrapChecks","elasticsearch.node.name":"elasticsearch","elasticsearch.cluster.name":"docker-cluster"}
{"@timestamp":"2023-06-08T17:07:05.187Z", "log.level": "INFO", "message":"this node has not joined a bootstrapped cluster yet; [cluster.initial_master_nodes] is set to [elasticsearch, es02, es03]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"main","log.logger":"org.elasticsearch.cluster.coordination.ClusterBootstrapService","elasticsearch.node.name":"elasticsearch","elasticsearch.cluster.name":"docker-cluster"}
{"@timestamp":"2023-06-08T17:20:03.485Z", "log.level": "WARN", "message":"failed to resolve host [es02]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[elasticsearch][cluster_coordination][T#1]","log.logger":"org.elasticsearch.discovery.SeedHostsResolver","elasticsearch.node.name":"elasticsearch","elasticsearch.cluster.name":"docker-cluster","error.type":"java.net.UnknownHostException","error.message":"es02","error.stack_trace":"java.net.UnknownHostException: es02\n\tat java.base/java.net.InetAddress$CachedAddresses.get(InetAddress.java:953)\n\tat java.base/java.net.InetAddress.getAllByName0(InetAddress.java:1673)\n\tat java.base/java.net.InetAddress.getAllByName(InetAddress.java:1533)\n\tat org.elasticsearch.server@8.8.0/org.elasticsearch.transport.TcpTransport.parse(TcpTransport.java:664)\n\tat org.elasticsearch.server@8.8.0/org.elasticsearch.transport.TcpTransport.addressesFromString(TcpTransport.java:606)\n\tat org.elasticsearch.server@8.8.0/org.elasticsearch.transport.TransportService.addressesFromString(TransportService.java:1066)\n\tat org.elasticsearch.server@8.8.0/org.elasticsearch.discovery.SeedHostsResolver.lambda$resolveHosts$0(SeedHostsResolver.java:92)\n\tat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)\n\tat org.elasticsearch.server@8.8.0/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:916)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)\n\tat java.base/java.lang.Thread.run(Thread.java:1623)\n"}
{"@timestamp":"2023-06-08T17:20:03.486Z", "log.level": "WARN", "message":"failed to resolve host [es03]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[elasticsearch][cluster_coordination][T#1]","log.logger":"org.elasticsearch.discovery.SeedHostsResolver","elasticsearch.node.name":"elasticsearch","elasticsearch.cluster.name":"docker-cluster","error.type":"java.net.UnknownHostException","error.message":"es03","error.stack_trace":"java.net.UnknownHostException: es03\n\tat java.base/java.net.InetAddress$CachedAddresses.get(InetAddress.java:953)\n\tat java.base/java.net.InetAddress.getAllByName0(InetAddress.java:1673)\n\tat java.base/java.net.InetAddress.getAllByName(InetAddress.java:1533)\n\tat org.elasticsearch.server@8.8.0/org.elasticsearch.transport.TcpTransport.parse(TcpTransport.java:664)\n\tat org.elasticsearch.server@8.8.0/org.elasticsearch.transport.TcpTransport.addressesFromString(TcpTransport.java:606)\n\tat org.elasticsearch.server@8.8.0/org.elasticsearch.transport.TransportService.addressesFromString(TransportService.java:1066)\n\tat org.elasticsearch.server@8.8.0/org.elasticsearch.discovery.SeedHostsResolver.lambda$resolveHosts$0(SeedHostsResolver.java:92)\n\tat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)\n\tat org.elasticsearch.server@8.8.0/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:916)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)\n\tat java.base/java.lang.Thread.run(Thread.java:1623)\n"}
Is there a solution to get these names to resolve?
- Names don't resolve because the es02 and es03 containers aren't started. See the first point in my previous message.
- The certificate names have nothing to do with elasticsearch.yml. The file es01.crt can be mounted as es.crt inside the container, without having to rename files on the host. Therefore it works without duplicating config files, since you can use consistent mount names across containers (again, without having to rename things on the host).
- The host names defined inside instances.yml are propagated to the SAN (subject alternative names) extension of the TLS certificates. All the names you put in there will be considered valid from a client perspective (the server itself doesn't care about those names). The label is only used to determine the names of the generated files, AFAIK.
- My comment about the bind mounts wasn't a suggestion; I just wanted to point out what I did differently in the example I shared. The difference between the two is insignificant here, and your approach is equally fine.
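The second point can be sketched as a volumes entry — the host file keeps the name certutil generated, while the in-container path is the constant one that a shared elasticsearch.yml expects (paths below follow this thread's layout and are illustrative):

```yaml
es01:
  volumes:
    # Host file keeps its generated name; the container path is constant.
    - ./tls/certs/es01/es01.crt:/usr/share/elasticsearch/config/elasticsearch.crt:ro,z
    - ./tls/certs/es01/es01.key:/usr/share/elasticsearch/config/elasticsearch.key:ro,z
```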
So... I have this working now. Not sure if it's the correct way, but I can access Kibana. However, I turned on Metricbeat, and when I go to Stack Monitoring there is no data there. I do, however, see Metricbeat data in Discover. Are any config changes required for Metricbeat with the addition of the other ES instances?
For reference, the Compose changes (I didn't change anything else in other configs):
- Added the CA and certs to the other ES instances, as they wouldn't start without that data. These aren't included in the code snippet provided on the wiki page, but appear to be needed.
- I had to do a docker compose up -d, then I did a docker compose up setup and the setup completed successfully...
- Then I brought it all down and back up again to make sure all settings were configured.
elasticsearch:
  build:
    context: elasticsearch/
    args:
      ELASTIC_VERSION: ${ELASTIC_VERSION}
  volumes:
    - ./elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml:ro,z
    # (!) TLS certificates. Generate using the 'tls' service.
    - ./tls/certs/ca/ca.crt:/usr/share/elasticsearch/config/ca.crt:ro,z
    - ./tls/certs/elasticsearch/elasticsearch.crt:/usr/share/elasticsearch/config/elasticsearch.crt:ro,z
    - ./tls/certs/elasticsearch/elasticsearch.key:/usr/share/elasticsearch/config/elasticsearch.key:ro,z
    - ./data/es01data:/usr/share/elasticsearch/data:Z
  ports:
    - 9200:9200
    - 9300:9300
  environment:
    node.name: elasticsearch
    ES_JAVA_OPTS: -Xms512m -Xmx512m
    # Bootstrap password.
    # Used to initialize the keystore during the initial startup of
    # Elasticsearch. Ignored on subsequent runs.
    ELASTIC_PASSWORD: ${ELASTIC_PASSWORD:-}
    # Use other cluster nodes for unicast discovery
    discovery.seed_hosts: es02,es03
    # Define initial masters, assuming a cluster size of at least 3
    cluster.initial_master_nodes: elasticsearch,es02,es03
  networks:
    - elk
  restart: unless-stopped
es02:
  build:
    context: elasticsearch/
    args:
      ELASTIC_VERSION: ${ELASTIC_VERSION}
  volumes:
    - ./elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml:ro,z
    # (!) TLS certificates. Generate using the 'tls' service.
    - ./tls/certs/ca/ca.crt:/usr/share/elasticsearch/config/ca.crt:ro,z
    - ./tls/certs/elasticsearch/elasticsearch.crt:/usr/share/elasticsearch/config/elasticsearch.crt:ro,z
    - ./tls/certs/elasticsearch/elasticsearch.key:/usr/share/elasticsearch/config/elasticsearch.key:ro,z
    - ./data/es02data:/usr/share/elasticsearch/data:Z
  environment:
    ES_JAVA_OPTS: -Xms512m -Xmx512m
    # Set a deterministic node name.
    node.name: es02
    # Use other cluster nodes for unicast discovery.
    discovery.seed_hosts: elasticsearch,es03
    # Define initial masters, assuming a cluster size of at least 3.
    cluster.initial_master_nodes: elasticsearch,es02,es03
  networks:
    - elk
es03:
  build:
    context: elasticsearch/
    args:
      ELASTIC_VERSION: ${ELASTIC_VERSION}
  volumes:
- ./elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml:ro,z
# (!) TLS certificates. Generate using the 'tls' service.
- ./tls/certs/ca/ca.crt:/usr/share/elasticsearch/config/ca.crt:ro,z
- ./tls/certs/elasticsearch/elasticsearch.crt:/usr/share/elasticsearch/config/elasticsearch.crt:ro,z
- ./tls/certs/elasticsearch/elasticsearch.key:/usr/share/elasticsearch/config/elasticsearch.key:ro,z
- ./data/es03data:/usr/share/elasticsearch/data:Z
environment:
ES_JAVA_OPTS: -Xms512m -Xmx512m
# Set a deterministic node name.
node.name: es03
# Use other cluster nodes for unicast discovery.
discovery.seed_hosts: elasticsearch,es02
# Define initial masters, assuming a cluster size of at least 3.
cluster.initial_master_nodes: elasticsearch,es02,es03
networks:
- elk
Yes, as mentioned in my previous message, mounting the CA certificate in every component is absolutely required. You didn't need to mount the same server certificate and key in every Elasticsearch instance, though; you could have continued generating one cert + key pair per ES instance. But glad it works for you.
Those details aren't in the wiki page because the wiki page doesn't cover setups with TLS, but I'm hoping that we will enable TLS by default in v9 when it lands.
In case of doubt, please refer to the PR I shared, it's all there really.
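For reference, the per-instance approach mentioned above can be sketched in `instances.yml` like this (node names and the IP are illustrative, following the Compose services in this thread; each entry yields its own cert + key pair, and the `dns`/`ip` values end up in that certificate's SAN extension):

```yaml
# Sketch of a per-instance instances.yml for elasticsearch-certutil.
# Each entry produces one certificate + key pair named after 'name'.
instances:
  - name: elasticsearch
    dns:
      - elasticsearch
      - localhost
    ip:
      - 127.0.0.1
  - name: es02
    dns:
      - es02
  - name: es03
    dns:
      - es03
```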
Regarding the monitoring stack, all you need is to enable the users mentioned in the README, then execute the setup again: https://github.com/deviantony/docker-elk/tree/tls/extensions/metricbeat#usage
(Lines 24 to 30, 32 to 36, and 38 to 42 in 369b682.)
To monitor the entire cluster instead of just the `elasticsearch` node, set `scope: cluster` as described at https://www.elastic.co/guide/en/elasticsearch/reference/current/configuring-metricbeat.html.
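With that setting, the Metricbeat module configuration might look roughly like this (a sketch only: the host name follows the Compose services in this thread, and the credentials and CA path are illustrative, matching the monitoring user pattern from the README):

```yaml
# Sketch of modules.d/elasticsearch-xpack.yml for Metricbeat.
- module: elasticsearch
  xpack.enabled: true
  period: 10s
  # With scope: cluster, the listed hosts are only used to discover the
  # cluster; metrics are then collected from every node, not just these.
  scope: cluster
  hosts:
    - https://elasticsearch:9200
  username: monitoring_internal          # illustrative user name
  password: ${MONITORING_INTERNAL_PASSWORD}
  ssl.certificate_authorities:
    - /usr/share/metricbeat/ca.crt       # illustrative CA mount path
```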
Ok, I made the changes to the keys and I updated Metricbeat to include the new scope. Everything is functioning correctly now!
Thanks for all the help! Going to close this issue as it's now resolved.
Glad I could help!