deviantony/docker-elk

Creating an Elastic cluster with TLS enabled

grizzlycode opened this issue · 12 comments

Problem description

So I'm trying to use your guide for creating an Elasticsearch cluster here. However, I don't believe the code snippets plus the link to the official Docker page are enough to get this particular stack working. Are there plans to update this section?

In particular, I've run into the following issues.

  • Changing the container names breaks setup. It is now unable to resolve the elasticsearch name.
    • It appears you need to change the lib.sh file to use the new container name elasticsearch_1
    • However, when I do that, I go from code 6, "Unable to resolve name", to code 35, "failure to connect to elasticsearch"
  • I also had to change instances.yml to include the additional elasticsearch nodes, as well as change/add bind mounts and certificate names such as elasticsearch_1, otherwise they weren't found

I'm sure after setup is fixed there may be other issues, but I'm currently stuck at setup.

Extra information

  • I did add the vm.max_map_count setting to /etc/sysctl.conf
  • I have each elasticsearch container on its own bind mount as recommended
  • I created another container to troubleshoot, attached it to the same network namespace as the setup container, and was able to ping/resolve elasticsearch_1, so I'm not sure why setup can't connect to it (see the sketch below)
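
Roughly, this is how I checked resolution from inside the setup container's network namespace (a sketch; the exact container name depends on your Compose project name):

# Join the network namespace of the running setup container with a throwaway Alpine container.
docker run --rm -it --network container:docker-elk-setup-1 alpine sh
# Inside it, DNS resolution and ping to the first Elasticsearch node work fine.
nslookup elasticsearch_1
ping -c 1 elasticsearch_1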

Stack configuration

I made the following changes to docker compose

  • Added additional elasticsearch containers per syntax from cluster page
  • Added bind mounts for elasticsearch data to each elastic container
  • Added TLS bind mounts for each elastic container and updated name

I updated instances.yml

  • Added additional elasticsearch containers to it along with DNS/IP info

I updated lib.sh in setup

  • Updated the variable to look for elasticsearch_1 instead of elasticsearch, since the container name changed
services:

  tls:
    profiles:
      - setup
    build:
      context: tls/
      args:
        ELASTIC_VERSION: ${ELASTIC_VERSION}
    user: root  # ensures we can write to the local tls/ directory.
    init: true
    volumes:
      - ./tls/entrypoint.sh:/entrypoint.sh:ro,Z
      - ./tls/instances.yml:/usr/share/elasticsearch/tls/instances.yml:ro,Z
      - ./tls/certs:/usr/share/elasticsearch/tls/certs:z

  setup:
    profiles:
      - setup
    build:
      context: setup/
      args:
        ELASTIC_VERSION: ${ELASTIC_VERSION}
    init: true
    volumes:
      - ./setup/entrypoint.sh:/entrypoint.sh:ro,Z
      - ./setup/lib.sh:/lib.sh:ro,Z
      - ./setup/roles:/roles:ro,Z
      # (!) CA certificate. Generate using the 'tls' service.
      - ./tls/certs/ca/ca.crt:/ca.crt:ro,z
      - ./elasticsearch/logs/:/usr/share/elasticsearch/logs/
    environment:
      ELASTIC_PASSWORD: ${ELASTIC_PASSWORD:-}
      LOGSTASH_INTERNAL_PASSWORD: ${LOGSTASH_INTERNAL_PASSWORD:-}
      KIBANA_SYSTEM_PASSWORD: ${KIBANA_SYSTEM_PASSWORD:-}
      METRICBEAT_INTERNAL_PASSWORD: ${METRICBEAT_INTERNAL_PASSWORD:-}
      FILEBEAT_INTERNAL_PASSWORD: ${FILEBEAT_INTERNAL_PASSWORD:-}
      HEARTBEAT_INTERNAL_PASSWORD: ${HEARTBEAT_INTERNAL_PASSWORD:-}
      MONITORING_INTERNAL_PASSWORD: ${MONITORING_INTERNAL_PASSWORD:-}
      BEATS_SYSTEM_PASSWORD: ${BEATS_SYSTEM_PASSWORD:-}
    networks:
      - elk
    depends_on:
      - elasticsearch_1

  elasticsearch_1:
    build:
      context: elasticsearch/
      args:
        ELASTIC_VERSION: ${ELASTIC_VERSION}
    volumes:
      - ./elasticsearch/config/elasticsearch_1.yml:/usr/share/elasticsearch/config/elasticsearch.yml:ro,Z
      # (!) TLS certificates. Generate using the 'tls' service.
      - ./tls/certs/ca/ca.crt:/usr/share/elasticsearch/config/ca.crt:ro,z
      - ./tls/certs/elasticsearch_1/elasticsearch_1.crt:/usr/share/elasticsearch/config/elasticsearch_1.crt:ro,z
      - ./tls/certs/elasticsearch_1/elasticsearch_1.key:/usr/share/elasticsearch/config/elasticsearch_1.key:ro,z
      - ./data/es01data:/usr/share/elasticsearch/data:Z
    ports:
      - 9200:9200
      - 9300:9300
    environment:
      node.name: elasticsearch_1
      ES_JAVA_OPTS: -Xms512m -Xmx512m
      # Bootstrap password.
      # Used to initialize the keystore during the initial startup of
      # Elasticsearch. Ignored on subsequent runs.
      ELASTIC_PASSWORD: ${ELASTIC_PASSWORD:-}
      # Use other cluster nodes for unicast discovery
      discovery.seed_hosts:  elasticsearch_2,elasticsearch_3
      # Define initial masters, assuming a cluster size of at least 3
      cluster.initial_master_nodes: elasticsearch_1,elasticsearch_2,elasticsearch_3
    networks:
      - elk
    restart: unless-stopped

  elasticsearch_2:
    build:
      context: elasticsearch/
      args:
        ELASTIC_VERSION: ${ELASTIC_VERSION}
    volumes:
      - ./elasticsearch/config/elasticsearch_2.yml:/usr/share/elasticsearch/config/elasticsearch.yml:ro,z
      # (!) TLS certificates. Generate using the 'tls' service.
      - ./tls/certs/elasticsearch_2/elasticsearch_2.crt:/usr/share/elasticsearch/config/elasticsearch_2.crt:ro,z
      - ./tls/certs/elasticsearch_2/elasticsearch_2.key:/usr/share/elasticsearch/config/elasticsearch_2.key:ro,z
      - ./data/es02data:/usr/share/elasticsearch/data:Z
    environment:
      ES_JAVA_OPTS: -Xms512m -Xmx512m
      # Set a deterministic node name.
      node.name: elasticsearch_2
      # Use other cluster nodes for unicast discovery.
      discovery.seed_hosts: elasticsearch_1,elasticsearch_3
      # Define initial masters, assuming a cluster size of at least 3.
      cluster.initial_master_nodes: elasticsearch_1,elasticsearch_2,elasticsearch_3
    networks:
      - elk

  elasticsearch_3:
    build:
      context: elasticsearch/
      args:
        ELASTIC_VERSION: ${ELASTIC_VERSION}
    volumes:
      - ./elasticsearch/config/elasticsearch_3.yml:/usr/share/elasticsearch/config/elasticsearch.yml:ro,z
      # (!) TLS certificates. Generate using the 'tls' service.
      - ./tls/certs/elasticsearch_3/elasticsearch_3.crt:/usr/share/elasticsearch/config/elasticsearch_3.crt:ro,z
      - ./tls/certs/elasticsearch_3/elasticsearch_3.key:/usr/share/elasticsearch/config/elasticsearch_3.key:ro,z
      - ./data/es03data:/usr/share/elasticsearch/data:Z
    environment:
      ES_JAVA_OPTS: -Xms512m -Xmx512m
      # Set a deterministic node name.
      node.name: elasticsearch_3
      # Use other cluster nodes for unicast discovery.
      discovery.seed_hosts: elasticsearch_1,elasticsearch_2
      # Define initial masters, assuming a cluster size of at least 3.
      cluster.initial_master_nodes: elasticsearch_1,elasticsearch_2,elasticsearch_3
    networks:
      - elk

  logstash:
    build:
      context: logstash/
      args:
        ELASTIC_VERSION: ${ELASTIC_VERSION}
    volumes:
      - ./logstash/config/logstash.yml:/usr/share/logstash/config/logstash.yml:ro,Z
      - ./logstash/pipeline:/usr/share/logstash/pipeline:ro,Z
      # (!) CA certificate. Generate using the 'tls' service.
      - ./tls/certs/ca/ca.crt:/usr/share/logstash/config/ca.crt:ro,z
    ports:
      - 5044:5044
      - 50000:50000/tcp
      - 50000:50000/udp
      - 9600:9600
    environment:
      LS_JAVA_OPTS: -Xms256m -Xmx256m
      LOGSTASH_INTERNAL_PASSWORD: ${LOGSTASH_INTERNAL_PASSWORD:-}
    networks:
      - elk
    depends_on:
      - elasticsearch_1
    restart: unless-stopped

  kibana:
    build:
      context: kibana/
      args:
        ELASTIC_VERSION: ${ELASTIC_VERSION}
    volumes:
      - ./kibana/config/kibana.yml:/usr/share/kibana/config/kibana.yml:ro,Z
      # (!) TLS certificates. Generate using the 'tls' service.
      - ./tls/certs/ca/ca.crt:/usr/share/kibana/config/ca.crt:ro,z
      - ./tls/certs/kibana/kibana.crt:/usr/share/kibana/config/kibana.crt:ro,Z
      - ./tls/certs/kibana/kibana.key:/usr/share/kibana/config/kibana.key:ro,Z
    ports:
      - 5601:5601
    environment:
      KIBANA_SYSTEM_PASSWORD: ${KIBANA_SYSTEM_PASSWORD:-}
    networks:
      - elk
    depends_on:
      - elasticsearch_1
    restart: unless-stopped

networks:
  elk:
    driver: bridge

Docker setup

$ docker version

[Client: Docker Engine - Community
 Version:           24.0.2
 API version:       1.43
 Go version:        go1.20.4
 Git commit:        cb74dfc
 Built:             Thu May 25 21:51:00 2023
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          24.0.2
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.4
  Git commit:       659604f
  Built:            Thu May 25 21:51:00 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.21
  GitCommit:        3dce8eb055cbb6872793272b4f20ed16117344f8
 runc:
  Version:          1.1.7
  GitCommit:        v1.1.7-0-g860f061
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
]
$ docker-compose version

[Docker Compose version v2.18.1]

Container logs

$ docker-compose logs

[Elastic Setup

[+] Building 0.0s (0/0)                                                                                                                                                                     
[+] Running 2/0
 ✔ Container docker-elk-elasticsearch_1-1  Running    0.0s
 ✔ Container docker-elk-setup-1            Created    0.0s
Attaching to docker-elk-setup-1
docker-elk-setup-1  | [+] Waiting for availability of Elasticsearch. This can take several minutes.
docker-elk-setup-1  |    ⠍ Connection to Elasticsearch failed. Exit code: 35
docker-elk-setup-1 exited with code 35]

I think you're very close. Does it work if you set this in the definition of the setup service?

environment:
  ELASTICSEARCH_HOST: elasticsearch_1

edit: my bad, you already did that

I added the environment variable to the setup service and I still get the code 35 error.

setup:
    profiles:
      - setup
    build:
      context: setup/
      args:
        ELASTIC_VERSION: ${ELASTIC_VERSION}
    init: true
    volumes:
      - ./setup/entrypoint.sh:/entrypoint.sh:ro,Z
      - ./setup/lib.sh:/lib.sh:ro,Z
      - ./setup/roles:/roles:ro,Z
      # (!) CA certificate. Generate using the 'tls' service.
      - ./tls/certs/ca/ca.crt:/ca.crt:ro,z
      - ./elasticsearch/logs/:/usr/share/elasticsearch/logs/
    environment:
      ELASTIC_PASSWORD: ${ELASTIC_PASSWORD:-}
      LOGSTASH_INTERNAL_PASSWORD: ${LOGSTASH_INTERNAL_PASSWORD:-}
      KIBANA_SYSTEM_PASSWORD: ${KIBANA_SYSTEM_PASSWORD:-}
      METRICBEAT_INTERNAL_PASSWORD: ${METRICBEAT_INTERNAL_PASSWORD:-}
      FILEBEAT_INTERNAL_PASSWORD: ${FILEBEAT_INTERNAL_PASSWORD:-}
      HEARTBEAT_INTERNAL_PASSWORD: ${HEARTBEAT_INTERNAL_PASSWORD:-}
      MONITORING_INTERNAL_PASSWORD: ${MONITORING_INTERNAL_PASSWORD:-}
      BEATS_SYSTEM_PASSWORD: ${BEATS_SYSTEM_PASSWORD:-}
      ELASTICSEARCH_HOST: elasticsearch_1
    networks:
      - elk
    depends_on:
      - elasticsearch_1

For reference, here is my modified lib.sh file.

  • I only added a "_1" to all the areas that had just elasticsearch:9200
  • When I did that, it went from code 6 to code 35 when I run setup
lib.sh

#!/usr/bin/env bash

es_ca_cert="${BASH_SOURCE[0]%/*}"/ca.crt

# Log a message.
function log {
	echo "[+] $1"
}

# Log a message at a sub-level.
function sublog {
	echo "   ⠿ $1"
}

# Log an error.
function err {
	echo "[x] $1" >&2
}

# Log an error at a sub-level.
function suberr {
	echo "   ⠍ $1" >&2
}

# Poll the 'elasticsearch' service until it responds with HTTP code 200.
function wait_for_elasticsearch {
	local elasticsearch_host="${ELASTICSEARCH_HOST:-elasticsearch_1}"

	local -a args=( '-s' '-D-' '-m15' '-w' '%{http_code}' 'https://elasticsearch_1:9200/'
		'--resolve' "elasticsearch_1:9200:${elasticsearch_host}" '--cacert' "$es_ca_cert"
		)

	if [[ -n "${ELASTIC_PASSWORD:-}" ]]; then
		args+=( '-u' "elastic:${ELASTIC_PASSWORD}" )
	fi

	local -i result=1
	local output

	# retry for max 300s (60*5s)
	for _ in $(seq 1 60); do
		local -i exit_code=0
		output="$(curl "${args[@]}")" || exit_code=$?

		if ((exit_code)); then
			result=$exit_code
		fi

		if [[ "${output: -3}" -eq 200 ]]; then
			result=0
			break
		fi

		sleep 5
	done

	if ((result)) && [[ "${output: -3}" -ne 000 ]]; then
		echo -e "\n${output::-3}"
	fi

	return $result
}

# Poll the Elasticsearch users API until it returns users.
function wait_for_builtin_users {
	local elasticsearch_host="${ELASTICSEARCH_HOST:-elasticsearch_1}"

	local -a args=( '-s' '-D-' '-m15' 'https://elasticsearch_1:9200/_security/user?pretty'
		'--resolve' "elasticsearch_1:9200:${elasticsearch_host}" '--cacert' "$es_ca_cert"
		)

	if [[ -n "${ELASTIC_PASSWORD:-}" ]]; then
		args+=( '-u' "elastic:${ELASTIC_PASSWORD}" )
	fi

	local -i result=1

	local line
	local -i exit_code
	local -i num_users

	# retry for max 30s (30*1s)
	for _ in $(seq 1 30); do
		num_users=0

		# read exits with a non-zero code if the last read input doesn't end
		# with a newline character. The printf without newline that follows the
		# curl command ensures that the final input not only contains curl's
		# exit code, but causes read to fail so we can capture the return value.
		# Ref. https://unix.stackexchange.com/a/176703/152409
		while IFS= read -r line || ! exit_code="$line"; do
			if [[ "$line" =~ _reserved.+true ]]; then
				(( num_users++ ))
			fi
		done < <(curl "${args[@]}"; printf '%s' "$?")

		if ((exit_code)); then
			result=$exit_code
		fi

		# we expect more than just the 'elastic' user in the result
		if (( num_users > 1 )); then
			result=0
			break
		fi

		sleep 1
	done

	return $result
}

# Verify that the given Elasticsearch user exists.
function check_user_exists {
	local username=$1

	local elasticsearch_host="${ELASTICSEARCH_HOST:-elasticsearch_1}"

	local -a args=( '-s' '-D-' '-m15' '-w' '%{http_code}'
		"https://elasticsearch_1:9200/_security/user/${username}"
		'--resolve' "elasticsearch_1:9200:${elasticsearch_host}" '--cacert' "$es_ca_cert"
		)

	if [[ -n "${ELASTIC_PASSWORD:-}" ]]; then
		args+=( '-u' "elastic:${ELASTIC_PASSWORD}" )
	fi

	local -i result=1
	local -i exists=0
	local output

	output="$(curl "${args[@]}")"
	if [[ "${output: -3}" -eq 200 || "${output: -3}" -eq 404 ]]; then
		result=0
	fi
	if [[ "${output: -3}" -eq 200 ]]; then
		exists=1
	fi

	if ((result)); then
		echo -e "\n${output::-3}"
	else
		echo "$exists"
	fi

	return $result
}

# Set password of a given Elasticsearch user.
function set_user_password {
	local username=$1
	local password=$2

	local elasticsearch_host="${ELASTICSEARCH_HOST:-elasticsearch_1}"

	local -a args=( '-s' '-D-' '-m15' '-w' '%{http_code}'
		"https://elasticsearch_1:9200/_security/user/${username}/_password"
		'--resolve' "elasticsearch_1:9200:${elasticsearch_host}" '--cacert' "$es_ca_cert"
		'-X' 'POST'
		'-H' 'Content-Type: application/json'
		'-d' "{\"password\" : \"${password}\"}"
		)

	if [[ -n "${ELASTIC_PASSWORD:-}" ]]; then
		args+=( '-u' "elastic:${ELASTIC_PASSWORD}" )
	fi

	local -i result=1
	local output

	output="$(curl "${args[@]}")"
	if [[ "${output: -3}" -eq 200 ]]; then
		result=0
	fi

	if ((result)); then
		echo -e "\n${output::-3}\n"
	fi

	return $result
}

# Create the given Elasticsearch user.
function create_user {
	local username=$1
	local password=$2
	local role=$3

	local elasticsearch_host="${ELASTICSEARCH_HOST:-elasticsearch_1}"

	local -a args=( '-s' '-D-' '-m15' '-w' '%{http_code}'
		"https://elasticsearch_1:9200/_security/user/${username}"
		'--resolve' "elasticsearch_1:9200:${elasticsearch_host}" '--cacert' "$es_ca_cert"
		'-X' 'POST'
		'-H' 'Content-Type: application/json'
		'-d' "{\"password\":\"${password}\",\"roles\":[\"${role}\"]}"
		)

	if [[ -n "${ELASTIC_PASSWORD:-}" ]]; then
		args+=( '-u' "elastic:${ELASTIC_PASSWORD}" )
	fi

	local -i result=1
	local output

	output="$(curl "${args[@]}")"
	if [[ "${output: -3}" -eq 200 ]]; then
		result=0
	fi

	if ((result)); then
		echo -e "\n${output::-3}\n"
	fi

	return $result
}

# Ensure that the given Elasticsearch role is up-to-date, create it if required.
function ensure_role {
	local name=$1
	local body=$2

	local elasticsearch_host="${ELASTICSEARCH_HOST:-elasticsearch_1}"

	local -a args=( '-s' '-D-' '-m15' '-w' '%{http_code}'
		"https://elasticsearch_1:9200/_security/role/${name}"
		'--resolve' "elasticsearch_1:9200:${elasticsearch_host}" '--cacert' "$es_ca_cert"
		'-X' 'POST'
		'-H' 'Content-Type: application/json'
		'-d' "$body"
		)

	if [[ -n "${ELASTIC_PASSWORD:-}" ]]; then
		args+=( '-u' "elastic:${ELASTIC_PASSWORD}" )
	fi

	local -i result=1
	local output

	output="$(curl "${args[@]}")"
	if [[ "${output: -3}" -eq 200 ]]; then
		result=0
	fi

	if ((result)); then
		echo -e "\n${output::-3}\n"
	fi

	return $result
}

Code 35 is a TLS handshake error.

Each X.509 certificate you generate with the tls service (docker compose up tls) holds a list of hostnames and IP addresses that enumerate what a client should consider valid during the handshake.
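
To double-check which names actually ended up in a generated certificate, you can inspect its SAN extension with openssl, for example (path shown for the elasticsearch certificate, adjust as needed):

openssl x509 -in tls/certs/elasticsearch/elasticsearch.crt -noout -text | grep -A1 'Subject Alternative Name'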

To solve your issue, add entries for elasticsearch_1, elasticsearch_2, etc. to the block below:

- name: elasticsearch
  dns:
  - elasticsearch  # Compose service, resolved by the embedded Docker DNS server name
  - localhost      # local connections
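
For example, expanded for a three-node cluster, the Elasticsearch entries could look roughly like this (a sketch; one entry per Compose service, names must match your services, same pattern for elasticsearch_3):

- name: elasticsearch_1
  dns:
  - elasticsearch_1  # Compose service name
  - localhost        # local connections
  ip:
  - 127.0.0.1
  - ::1

- name: elasticsearch_2
  dns:
  - elasticsearch_2
  - localhost
  ip:
  - 127.0.0.1
  - ::1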

Then, regenerate the certificates.

You mention that you added the entries already, so maybe re-generating the certificates will be enough.
If not, either

  • Elasticsearch wasn't restarted and didn't load the new certificate
  • Setup isn't using the same CA certificate as Elasticsearch

Hard to tell because your config looks good to me.

@grizzlycode I got it working in #870. Check it out.
Result of automated tests: CI run 5210817355

I found three issues:

  • Your setup depends only on elasticsearch_1, but Elasticsearch will refuse to serve requests until the cluster is bootstrapped. You need to depend on all three master nodes.

  • tls/certs/ca/ca.crt wasn't mounted in elasticsearch_1 and elasticsearch_2.

  • Java returns an "illegal server name" exception for SNI requests with an underscore in the host name (see below), so I had to rename Elasticsearch services to elasticsearch01, etc.
    I just updated the wiki page accordingly.

    caught exception while handling client http traffic, closing connection Netty4HttpChannel{localAddress=/172.26.0.4:9200, remoteAddress=/172.26.0.5:44446}", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[elasticsearch_1][transport_worker][T#5]","log.logger":"org.elasticsearch.http.AbstractHttpServerTransport","elasticsearch.cluster.uuid":"kKfx7aY9SiWUAP2Lg65t6w","elasticsearch.node.id":"hZfFWqsDTX6MtYL7WOL1Pg","elasticsearch.node.name":"elasticsearch_1","elasticsearch.cluster.name":"docker-cluster","error.type":"io.netty.handler.codec.DecoderException","error.message":"javax.net.ssl.SSLProtocolException: Illegal server name, type=host_name(0), name=elasticsearch_1, value={656C61737469637365617263685F31}","error.stack_trace":"io.netty.handler.codec.DecoderException: javax.net.ssl.SSLProtocolException: Illegal server name, type=host_name(0), name=elasticsearch_1, value={656C61737469637365617263685F31}
      [...]
    Caused by: javax.net.ssl.SSLProtocolException: Illegal server name, type=host_name(0), name=elasticsearch_1, value={656C61737469637365617263685F31}
      [...]
    Caused by: java.lang.IllegalArgumentException: The encoded server name value is invalid
      [...]
    Caused by: java.lang.IllegalArgumentException: Contains non-LDH ASCII characters
      [...]
    

Other, less significant differences from what you did:

  • Used the same elasticsearch.yml for all Elasticsearch nodes.
    Parameters which aren't the same are already injected in env vars, so it's safe to use a common base for all master nodes.
    It simplifies things like certificate management, since certs are always mounted in the same location.

  • Use named volumes for data instead of bind mounts.

Another suggestion for you:

You could add a fourth elasticsearch service (no number) to your Compose file, assign it the master-eligible node role, and assign the numbered nodes the data node role.

That way, modifications to the stack stay light because all connections keep being handled by a node named elasticsearch, but that node holds no data at all; it only dispatches requests to the data nodes.
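
A rough sketch of that split, showing only the relevant environment entries (settings are passed as environment variables, exactly like node.name in your current file; everything else stays unchanged):

  elasticsearch:
    environment:
      node.name: elasticsearch
      # Master-eligible node: accepts client connections and routes requests, holds no data.
      node.roles: master
      cluster.initial_master_nodes: elasticsearch
      discovery.seed_hosts: elasticsearch_2,elasticsearch_3

  elasticsearch_2:
    environment:
      node.name: elasticsearch_2
      # Data-only node (same pattern for elasticsearch_3).
      node.roles: data
      discovery.seed_hosts: elasticsearch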

  • Used the same elasticsearch.yml for all Elasticsearch nodes.
  • Parameters which aren't the same are already injected in env vars, so it's safe to use a common base for all master nodes.
  • It simplifies things like certificate management, since certs are always mounted in the same location.

The reason I made different elastic configs is that when tls runs, it creates certificates based on the container names in the instances.yml file, and when I ran setup, elasticsearch_1 previously couldn't find the cert because it had a different name than what is hard-coded in elasticsearch.yml, so the container constantly restarted. Also, the config file specifically has each cert's name in it, so I didn't want to share that config in case of conflict since the cert names were different. So I created individual configs for each ES instance.

So that I understand: you are saying to stick with this config across all three nodes even though TLS will create different cert names?

- ./elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml:ro,z

I tried using the bind mount above; however, it had the same problem: the cert generated by TLS is not the elasticsearch.crt the container is looking for (because elasticsearch.yml is hard-coded to this value), so the file doesn't exist and the container fails and constantly restarts. When I change it to the cert name tls creates, it stays up.
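
For context, the TLS section of elasticsearch.yml points at fixed file names, roughly like this (paraphrasing, not the exact file contents), which is why the generated cert names matter:

xpack.security.http.ssl:
  enabled: true
  certificate: elasticsearch.crt
  key: elasticsearch.key
  certificate_authorities: [ ca.crt ]

xpack.security.transport.ssl:
  enabled: true
  certificate: elasticsearch.crt
  key: elasticsearch.key
  certificate_authorities: [ ca.crt ]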

  • Use named volumes for data instead of bind mounts.

I'm using bind mounts in testing because I plan to use a different drive for data. I've used this successfully with the single ES deployment, so I know that works.

For instances.yml:

If you have multiple DNS entries in there, is the first one used for name resolution and the others "alt names" for the certificate? Or is resolution dependent on the container name only, with all the provided DNS entries being cert "alt names" only?

So I attempted to get around all these resolution issues for setup by just renaming elasticsearch01 to elasticsearch. Setup now seems to be moving forward, but now I get a code 1 error. I'm not sure why names are not being resolved or why setup is erroring out in the elasticsearch container logs. However, I noticed the elasticsearch container logs show it can't resolve the names of the other elasticsearch instances... do all three instances need to connect and form a cluster for setup to continue?

Setup logs

[+] Building 0.0s (0/0)                                                                                                                                                                     
[+] Running 4/3
 ✔ Network docker-elk_default            Created    0.1s
 ✔ Network docker-elk_elk                Created    0.1s
 ✔ Container docker-elk-elasticsearch-1  Created    0.1s
 ✔ Container docker-elk-setup-1          Created    0.1s
Attaching to docker-elk-setup-1
docker-elk-setup-1  | [+] Waiting for availability of Elasticsearch. This can take several minutes.
docker-elk-setup-1  |    ⠿ Elasticsearch is running
docker-elk-setup-1  | [+] Waiting for initialization of built-in users
docker-elk-setup-1  |    ⠿ Built-in users were initialized
docker-elk-setup-1  | [+] Role 'heartbeat_writer'
docker-elk-setup-1  |    ⠿ Creating/updating
docker-elk-setup-1  | 
docker-elk-setup-1  | HTTP/1.1 503 Service Unavailable
docker-elk-setup-1  | X-elastic-product: Elasticsearch
docker-elk-setup-1  | content-type: application/json
docker-elk-setup-1  | content-length: 265
docker-elk-setup-1  | 
docker-elk-setup-1  | {"error":{"root_cause":[{"type":"status_exception","reason":"Cluster state has not been recovered yet, cannot write to the [null] index"}],"type":"status_exception","reason":"Cluster state has not been recovered yet, cannot write to the [null] index"},"status":503}

Here is a copy of my instances.yml in case I'm doing something wrong there, though it seems pretty straightforward.

# This file is used by elasticsearch-certutil to generate X.509 certificates
# for stack components.
#
# Ref. https://www.elastic.co/guide/en/elasticsearch/reference/current/certutil.html#certutil-silent
instances:

- name: elasticsearch
  dns:
  - elasticsearch  # Compose service, resolved by the embedded Docker DNS server name
  - es01 
  - localhost      # local connections
  - es01.example.lan  # Hostname you're going to give the server
  ip:
  - 127.0.0.1      # local connections
  - ::1
  - 1.2.3.4       # Server IP

- name: es02
  dns:
  - es02  # Compose service, resolved by the embedded Docker DNS server name
  - localhost      # local connections
  - es02.example.lan  # Hostname you're going to give the server
  ip:
  - 127.0.0.1      # local connections
  - ::1
  - 1.2.3.4       # Server IP

- name: es03
  dns:
  - es03  # Compose service, resolved by the embedded Docker DNS server name
  - localhost      # local connections
  - es03.example.lan  # Hostname you're going to give the server
  ip:
  - 127.0.0.1      # local connections
  - ::1
  - 1.2.3.4      # Server IP

- name: kibana
  dns:
  - localhost
  - kibana.example.lan
  ip:
  - 127.0.0.1
  - ::1
  - 1.2.3.4

- name: fleet-server
  dns:
  - fleet-server
  - localhost
  - fs.example.lan
  ip:
  - 127.0.0.1
  - ::1
  - 1.2.3.4

- name: apm-server
  dns:
  - apm-server
  - localhost
  - apm.example.lan
  ip:
  - 127.0.0.1
  - ::1
  - 1.2.3.4

elasticsearch logs

  • These "failed to resolve host" warnings are constant throughout the log
  • Note I used different names for the other elastic instances, as noted in the instances.yml I provided.
    • elasticsearch, es02, es03
{"@timestamp":"2023-06-08T17:07:03.477Z", "log.level": "INFO", "message":"using discovery type [multi-node] and seed hosts providers [settings]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"main","log.logger":"org.elasticsearch.discovery.DiscoveryModule","elasticsearch.node.name":"elasticsearch","elasticsearch.cluster.name":"docker-cluster"}
{"@timestamp":"2023-06-08T17:07:04.884Z", "log.level": "INFO", "message":"initialized", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"main","log.logger":"org.elasticsearch.node.Node","elasticsearch.node.name":"elasticsearch","elasticsearch.cluster.name":"docker-cluster"}
{"@timestamp":"2023-06-08T17:07:04.885Z", "log.level": "INFO", "message":"starting ...", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"main","log.logger":"org.elasticsearch.node.Node","elasticsearch.node.name":"elasticsearch","elasticsearch.cluster.name":"docker-cluster"}
{"@timestamp":"2023-06-08T17:07:04.913Z", "log.level": "INFO", "message":"persistent cache index loaded", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"main","log.logger":"org.elasticsearch.xpack.searchablesnapshots.cache.full.PersistentCache","elasticsearch.node.name":"elasticsearch","elasticsearch.cluster.name":"docker-cluster"}
{"@timestamp":"2023-06-08T17:07:04.914Z", "log.level": "INFO", "message":"deprecation component started", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"main","log.logger":"org.elasticsearch.xpack.deprecation.logging.DeprecationIndexingComponent","elasticsearch.node.name":"elasticsearch","elasticsearch.cluster.name":"docker-cluster"}
{"@timestamp":"2023-06-08T17:07:05.016Z", "log.level": "INFO", "message":"publish_address {192.168.192.2:9300}, bound_addresses {0.0.0.0:9300}", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"main","log.logger":"org.elasticsearch.transport.TransportService","elasticsearch.node.name":"elasticsearch","elasticsearch.cluster.name":"docker-cluster"}
{"@timestamp":"2023-06-08T17:07:05.183Z", "log.level": "INFO", "message":"bound or publishing to a non-loopback address, enforcing bootstrap checks", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"main","log.logger":"org.elasticsearch.bootstrap.BootstrapChecks","elasticsearch.node.name":"elasticsearch","elasticsearch.cluster.name":"docker-cluster"}
{"@timestamp":"2023-06-08T17:07:05.187Z", "log.level": "INFO", "message":"this node has not joined a bootstrapped cluster yet; [cluster.initial_master_nodes] is set to [elasticsearch, es02, es03]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"main","log.logger":"org.elasticsearch.cluster.coordination.ClusterBootstrapService","elasticsearch.node.name":"elasticsearch","elasticsearch.cluster.name":"docker-cluster"}
{"@timestamp":"2023-06-08T17:20:03.485Z", "log.level": "WARN", "message":"failed to resolve host [es02]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[elasticsearch][cluster_coordination][T#1]","log.logger":"org.elasticsearch.discovery.SeedHostsResolver","elasticsearch.node.name":"elasticsearch","elasticsearch.cluster.name":"docker-cluster","error.type":"java.net.UnknownHostException","error.message":"es02","error.stack_trace":"java.net.UnknownHostException: es02\n\tat java.base/java.net.InetAddress$CachedAddresses.get(InetAddress.java:953)\n\tat java.base/java.net.InetAddress.getAllByName0(InetAddress.java:1673)\n\tat java.base/java.net.InetAddress.getAllByName(InetAddress.java:1533)\n\tat org.elasticsearch.server@8.8.0/org.elasticsearch.transport.TcpTransport.parse(TcpTransport.java:664)\n\tat org.elasticsearch.server@8.8.0/org.elasticsearch.transport.TcpTransport.addressesFromString(TcpTransport.java:606)\n\tat org.elasticsearch.server@8.8.0/org.elasticsearch.transport.TransportService.addressesFromString(TransportService.java:1066)\n\tat org.elasticsearch.server@8.8.0/org.elasticsearch.discovery.SeedHostsResolver.lambda$resolveHosts$0(SeedHostsResolver.java:92)\n\tat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)\n\tat org.elasticsearch.server@8.8.0/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:916)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)\n\tat java.base/java.lang.Thread.run(Thread.java:1623)\n"}
{"@timestamp":"2023-06-08T17:20:03.486Z", "log.level": "WARN", "message":"failed to resolve host [es03]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[elasticsearch][cluster_coordination][T#1]","log.logger":"org.elasticsearch.discovery.SeedHostsResolver","elasticsearch.node.name":"elasticsearch","elasticsearch.cluster.name":"docker-cluster","error.type":"java.net.UnknownHostException","error.message":"es03","error.stack_trace":"java.net.UnknownHostException: es03\n\tat java.base/java.net.InetAddress$CachedAddresses.get(InetAddress.java:953)\n\tat java.base/java.net.InetAddress.getAllByName0(InetAddress.java:1673)\n\tat java.base/java.net.InetAddress.getAllByName(InetAddress.java:1533)\n\tat org.elasticsearch.server@8.8.0/org.elasticsearch.transport.TcpTransport.parse(TcpTransport.java:664)\n\tat org.elasticsearch.server@8.8.0/org.elasticsearch.transport.TcpTransport.addressesFromString(TcpTransport.java:606)\n\tat org.elasticsearch.server@8.8.0/org.elasticsearch.transport.TransportService.addressesFromString(TransportService.java:1066)\n\tat org.elasticsearch.server@8.8.0/org.elasticsearch.discovery.SeedHostsResolver.lambda$resolveHosts$0(SeedHostsResolver.java:92)\n\tat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)\n\tat org.elasticsearch.server@8.8.0/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:916)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)\n\tat java.base/java.lang.Thread.run(Thread.java:1623)\n"}

Is there a solution to get these names to resolve?

  • Names don't resolve because the es02 and es03 containers aren't started. See the first point in my previous message.

  • The certificate names have nothing to do with elasticsearch.yml. The file es01.crt can be mounted as es.crt inside the container, without having to rename files on the host. Therefore it works without duplicating config files, since you can use consistent mount names across containers (see the sketch after this list).

  • The host names defined inside instances.yml are propagated to the SAN (subject alternative names) extension of the TLS certificates. All the names you put in there will be considered valid from a client perspective (the server itself doesn't care about those names). The name label is only used to determine the names of the generated files, AFAIK.

  • My comment about the bind mounts wasn't a suggestion, I just wanted to point out what I did differently in the example I shared. The difference between the two is insignificant here and your approach is equally fine.
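
For instance, keeping one certificate per node on the host but mounting it under a generic name could look like this (a sketch reusing the paths from your Compose file):

  elasticsearch_2:
    volumes:
      - ./elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml:ro,z
      - ./tls/certs/ca/ca.crt:/usr/share/elasticsearch/config/ca.crt:ro,z
      # Per-node cert and key, mounted under the generic names elasticsearch.yml expects.
      - ./tls/certs/elasticsearch_2/elasticsearch_2.crt:/usr/share/elasticsearch/config/elasticsearch.crt:ro,z
      - ./tls/certs/elasticsearch_2/elasticsearch_2.key:/usr/share/elasticsearch/config/elasticsearch.key:ro,z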

So... I have this working now. Not sure if it's the correct way, but I can access Kibana. However, I turned on Metricbeat, and when I go to Stack Monitoring there is no data there. I do, however, see Metricbeat data in Discover. Are there any config changes required for Metricbeat with the addition of the other ES instances?

For reference, here are my Compose changes (I didn't change anything else in the other configs):

  • Added CA and certs to other ES instances as they wouldn't start without that data
    • These aren't included in the code snippet provided on the wiki page, but appear to be needed
  • I had to do a docker compose up -d, then a docker compose up setup, and the setup completed successfully...
  • Then I brought it all down and back up again to make sure all settings were configured
  elasticsearch:
    build:
      context: elasticsearch/
      args:
        ELASTIC_VERSION: ${ELASTIC_VERSION}
    volumes:
      - ./elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml:ro,z
      # (!) TLS certificates. Generate using the 'tls' service.
      - ./tls/certs/ca/ca.crt:/usr/share/elasticsearch/config/ca.crt:ro,z
      - ./tls/certs/elasticsearch/elasticsearch.crt:/usr/share/elasticsearch/config/elasticsearch.crt:ro,z
      - ./tls/certs/elasticsearch/elasticsearch.key:/usr/share/elasticsearch/config/elasticsearch.key:ro,z
      - ./data/es01data:/usr/share/elasticsearch/data:Z
    ports:
      - 9200:9200
      - 9300:9300
    environment:
      node.name: elasticsearch
      ES_JAVA_OPTS: -Xms512m -Xmx512m
      # Bootstrap password.
      # Used to initialize the keystore during the initial startup of
      # Elasticsearch. Ignored on subsequent runs.
      ELASTIC_PASSWORD: ${ELASTIC_PASSWORD:-}
      # Use other cluster nodes for unicast discovery
      discovery.seed_hosts:  es02,es03
      # Define initial masters, assuming a cluster size of at least 3
      cluster.initial_master_nodes: elasticsearch,es02,es03
    networks:
      - elk
    restart: unless-stopped

  es02:
    build:
      context: elasticsearch/
      args:
        ELASTIC_VERSION: ${ELASTIC_VERSION}
    volumes:
      - ./elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml:ro,z
      # (!) TLS certificates. Generate using the 'tls' service.
      - ./tls/certs/ca/ca.crt:/usr/share/elasticsearch/config/ca.crt:ro,z
      - ./tls/certs/elasticsearch/elasticsearch.crt:/usr/share/elasticsearch/config/elasticsearch.crt:ro,z
      - ./tls/certs/elasticsearch/elasticsearch.key:/usr/share/elasticsearch/config/elasticsearch.key:ro,z
      - ./data/es02data:/usr/share/elasticsearch/data:Z
    environment:
      ES_JAVA_OPTS: -Xms512m -Xmx512m
      # Set a deterministic node name.
      node.name: es02
      # Use other cluster nodes for unicast discovery.
      discovery.seed_hosts: elasticsearch,es03
      # Define initial masters, assuming a cluster size of at least 3.
      cluster.initial_master_nodes: elasticsearch,es02,es03
    networks:
      - elk

  es03:
    build:
      context: elasticsearch/
      args:
        ELASTIC_VERSION: ${ELASTIC_VERSION}
    volumes:
      - ./elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml:ro,z
      # (!) TLS certificates. Generate using the 'tls' service.
      - ./tls/certs/ca/ca.crt:/usr/share/elasticsearch/config/ca.crt:ro,z
      - ./tls/certs/elasticsearch/elasticsearch.crt:/usr/share/elasticsearch/config/elasticsearch.crt:ro,z
      - ./tls/certs/elasticsearch/elasticsearch.key:/usr/share/elasticsearch/config/elasticsearch.key:ro,z
      - ./data/es03data:/usr/share/elasticsearch/data:Z
    environment:
      ES_JAVA_OPTS: -Xms512m -Xmx512m
      # Set a deterministic node name.
      node.name: es03
      # Use other cluster nodes for unicast discovery.
      discovery.seed_hosts: elasticsearch,es02
      # Define initial masters, assuming a cluster size of at least 3.
      cluster.initial_master_nodes: elasticsearch,es02,es03
    networks:
      - elk

Yes, as mentioned in my previous message, mounting the CA certificate in every component is absolutely required. You didn't need to mount the same server certificate and key in every Elasticsearch instance though; you could have continued generating one cert + key pair per ES instance, but I'm glad it works for you.

Those details aren't in the wiki page because the wiki page doesn't cover setups with TLS, but I'm hoping that we will enable TLS by default in v9 when it lands.

In case of doubt, please refer to the PR I shared, it's all there really.


Regarding the monitoring stack, all you need to do is enable the users mentioned in the README, then execute the setup again: https://github.com/deviantony/docker-elk/tree/tls/extensions/metricbeat#usage

docker-elk/.env

Lines 24 to 30 in 369b682

# Users 'metricbeat_internal', 'filebeat_internal' and 'heartbeat_internal' (custom)
#
# The users Beats use to connect and send data to Elasticsearch.
# https://www.elastic.co/guide/en/beats/metricbeat/current/feature-roles.html
METRICBEAT_INTERNAL_PASSWORD=''
FILEBEAT_INTERNAL_PASSWORD=''
HEARTBEAT_INTERNAL_PASSWORD=''

docker-elk/.env

Lines 32 to 36 in 369b682

# User 'monitoring_internal' (custom)
#
# The user Metricbeat uses to collect monitoring data from stack components.
# https://www.elastic.co/guide/en/elasticsearch/reference/current/how-monitoring-works.html
MONITORING_INTERNAL_PASSWORD=''

docker-elk/.env

Lines 38 to 42 in 369b682

# User 'beats_system' (built-in)
#
# The user the Beats use when storing monitoring information in Elasticsearch.
# https://www.elastic.co/guide/en/elasticsearch/reference/current/built-in-users.html
BEATS_SYSTEM_PASSWORD=''

To monitor the entire cluster instead of just the elasticsearch node, set scope: cluster as described at https://www.elastic.co/guide/en/elasticsearch/reference/current/configuring-metricbeat.html.
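
In the Metricbeat config, that is essentially a one-line change to the elasticsearch module, roughly like this (hosts, credentials and CA path are illustrative and must match your stack):

- module: elasticsearch
  xpack.enabled: true
  # Collect monitoring data for the whole cluster through this endpoint,
  # instead of only for the node Metricbeat is pointed at.
  scope: cluster
  period: 10s
  hosts: [ "https://elasticsearch:9200" ]
  username: monitoring_internal
  password: ${MONITORING_INTERNAL_PASSWORD}
  ssl.certificate_authorities: [ /usr/share/metricbeat/ca.crt ]

After setting the passwords in .env, run the setup service again so the users are created/updated.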

Ok, I made the changes to the keys and I updated metricbeat to include the new scope. Everything is functioning correctly now!

Thanks for all the help! Going to close this issue as it's now resolved.

Glad I could help!