pires/kubernetes-elasticsearch-cluster

Master node errors

Opened this issue · 11 comments

rewt commented

[Screenshot, 2018-06-15 at 4:16:55 pm]

rewt commented

The issue is not with my environment, as I'm able to build a cluster with different configs. Is there an issue in the new 6.3.0 image?

rocketraman commented

@pires I believe 6.3.0 is broken. On startup, ES appears to try to run /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/controller and fails. It does this even though xpack.ml.enabled is false.

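Running ldd on this binary shows missing shared libraries and unresolved symbols. Several of the missing symbols (__sprintf_chk, __memmove_chk and friends) are glibc-specific, so this may be a glibc-linked binary on a musl-based image (Alpine?):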

bash-4.4# ldd /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/controller
        /lib64/ld-linux-x86-64.so.2 (0x7f6e4e478000)
        libpthread.so.0 => /lib64/ld-linux-x86-64.so.2 (0x7f6e4e478000)
        libdl.so.2 => /lib64/ld-linux-x86-64.so.2 (0x7f6e4e478000)
        librt.so.1 => /lib64/ld-linux-x86-64.so.2 (0x7f6e4e478000)
        liblog4cxx.so.10 => /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/liblog4cxx.so.10 (0x7f6e4de7c000)
        libboost_program_options-gcc62-mt-1_65_1.so.1.65.1 => /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/libboost_program_options-gcc62-mt-1_65_1.so.1.65.1 (0x7f6e4dbfb000)
        libMlCore.so => /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/libMlCore.so (0x7f6e4d8ca000)
        libstdc++.so.6 => /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/libstdc++.so.6 (0x7f6e4d520000)
        libm.so.6 => /lib64/ld-linux-x86-64.so.2 (0x7f6e4e478000)
        libgcc_s.so.1 => /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/libgcc_s.so.1 (0x7f6e4d309000)
        libc.so.6 => /lib64/ld-linux-x86-64.so.2 (0x7f6e4e478000)
        libaprutil-1.so.0 => /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/./libaprutil-1.so.0 (0x7f6e4d0e2000)
        libexpat.so.0 => /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/./libexpat.so.0 (0x7f6e4ceb7000)
        libapr-1.so.0 => /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/./libapr-1.so.0 (0x7f6e4cc81000)
Error loading shared library libcrypt.so.1: No such file or directory (needed by /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/liblog4cxx.so.10)
        libxml2.so.2 => /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/./libxml2.so.2 (0x7f6e4c907000)
        libz.so.1 => /lib/libz.so.1 (0x7f6e4c6f0000)
        libboost_regex-gcc62-mt-1_65_1.so.1.65.1 => /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/./libboost_regex-gcc62-mt-1_65_1.so.1.65.1 (0x7f6e4c3f8000)
        libboost_iostreams-gcc62-mt-1_65_1.so.1.65.1 => /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/./libboost_iostreams-gcc62-mt-1_65_1.so.1.65.1 (0x7f6e4c1e3000)
        libboost_filesystem-gcc62-mt-1_65_1.so.1.65.1 => /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/./libboost_filesystem-gcc62-mt-1_65_1.so.1.65.1 (0x7f6e4bfc8000)
        libboost_system-gcc62-mt-1_65_1.so.1.65.1 => /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/./libboost_system-gcc62-mt-1_65_1.so.1.65.1 (0x7f6e4bdc4000)
Error loading shared library ld-linux-x86-64.so.2: No such file or directory (needed by /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/libstdc++.so.6)
Error loading shared library libcrypt.so.1: No such file or directory (needed by /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/./libaprutil-1.so.0)
Error loading shared library libcrypt.so.1: No such file or directory (needed by /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/./libapr-1.so.0)
Error relocating /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/libMlCore.so: __open_2: symbol not found
Error relocating /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/libMlCore.so: __sprintf_chk: symbol not found
Error relocating /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/libMlCore.so: __vsnprintf_chk: symbol not found
Error relocating /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/libMlCore.so: __memmove_chk: symbol not found
Error relocating /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/libMlCore.so: __fprintf_chk: symbol not found
Error relocating /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/libgcc_s.so.1: __cpu_indicator_init: symbol not found
Error relocating /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/libgcc_s.so.1: __cpu_model: symbol not found
Error relocating /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/./libapr-1.so.0: pthread_mutex_consistent_np: symbol not found
Error relocating /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/./libapr-1.so.0: __rawmemchr: symbol not found
Error relocating /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/./libapr-1.so.0: pthread_mutexattr_setrobust_np: symbol not found
Error relocating /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/./libapr-1.so.0: __isnan: symbol not found
Error relocating /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/./libapr-1.so.0: __isinf: symbol not found
Error relocating /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/./libapr-1.so.0: pthread_yield: symbol not found
Error relocating /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/./libapr-1.so.0: sys_siglist: symbol not found
Error relocating /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/./libxml2.so.2: __isnan: symbol not found
Error relocating /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/./libxml2.so.2: __isinf: symbol not found
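
To repeat this check on other binaries or images, the failure lines in the ldd output can be counted instead of read by eye. The sketch below is a hypothetical helper (count_ldd_errors is not part of the image or of Elasticsearch). It assumes the "Error loading shared library" and "Error relocating" messages shown above, plus the "=> not found" form that glibc's ldd prints.

```shell
# Hypothetical helper: count unresolved libraries and symbols in ldd output.
# Reads ldd output on stdin and prints the number of failure lines.
count_ldd_errors() {
  # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
  grep -c -E '^Error (loading shared library|relocating) |=> not found' || true
}

# Usage inside the container (errors may go to stderr, so merge the streams):
#   ldd /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/controller 2>&1 | count_ldd_errors
```

A result of 0 only means ldd reported no failures. It does not prove the binary will run.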

The relevant ES code seems to be here:

https://github.com/elastic/elasticsearch/blob/72f57c8e72500cb27dc69e902edf5cd270249c14/server/src/main/java/org/elasticsearch/bootstrap/Spawner.java#L61

Spawner appears to spawn a native controller for each module that has one on the filesystem, and to silently skip modules that don't. Adding the following to run.sh therefore appears to fix the problem:

# Work around the x-pack ML incompatibility
echo "Deleting x-pack-ml platform files to prevent native controller spawning..."
rm -rf "$BASE/modules/x-pack/x-pack-ml/platform/linux-x86_64"
# Probably not necessary, but let's delete the plugin native libs for good measure
rm -rf "$BASE/plugins/x-pack/x-pack-ml/platform/linux-x86_64"
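
The workaround above can also be written a little more defensively. This is only a sketch: the function name is made up, and it assumes its argument is the Elasticsearch home directory (BASE in run.sh). It refuses to run when that value is empty, so an unset variable cannot expand to a path directly under /.

```shell
# Hypothetical, more defensive variant of the workaround above.
# Assumes $1 is the Elasticsearch home directory (BASE in run.sh).
remove_ml_native_controller() {
  base="$1"
  if [ -z "$base" ]; then
    echo "BASE is empty; refusing to delete anything" >&2
    return 1
  fi
  for dir in \
    "$base/modules/x-pack/x-pack-ml/platform/linux-x86_64" \
    "$base/plugins/x-pack/x-pack-ml/platform/linux-x86_64"; do
    # Only touch directories that actually exist, and say what is removed.
    if [ -d "$dir" ]; then
      echo "Deleting $dir to prevent native controller spawning..."
      rm -rf -- "$dir"
    fi
  done
}

# Usage in run.sh:
#   remove_ml_native_controller "$BASE"
```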

rewt commented

When running on k8s, how can run.sh be modified? Is the quay.io/pires/elasticsearch:6.2.4 image still compatible?

pires commented

@rocketraman I think you may have run one image that was pushed erroneously. Can you make sure you are downloading the current 6.3.0 tag? I just brought a new Kubernetes cluster up and was able to run everything as expected.

rocketraman commented

@pires I believe I have the latest image. Here is the docker history:

$ docker history quay.io/pires/docker-elasticsearch-kubernetes:6.3.0 | head -5
IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT
3813d73cfe49        2 days ago          /bin/sh -c #(nop)  ENV MEMORY_LOCK=false        0 B                 
<missing>           2 days ago          /bin/sh -c #(nop)  ENV DISCOVERY_SERVICE=e...   0 B                 
<missing>           2 days ago          /bin/sh -c #(nop) ADD dir:a37b50c691132deb...   904 B               
<missing>           2 days ago          /bin/sh -c #(nop)  MAINTAINER pjpires@gmai...   0 B

Without the workaround I posted above, this is the error I get on startup; it sends the container into a crash loop:

[2018-06-17T03:58:42,113][WARN ][o.e.b.ElasticsearchUncaughtExceptionHandler] [es-master-0] uncaught exception in thread [main]
org.elasticsearch.bootstrap.StartupException: org.elasticsearch.bootstrap.BootstrapException: java.io.IOException: Cannot run program "/elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/controller": error=2, No such file or directory
	at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:140) ~[elasticsearch-6.3.0.jar:6.3.0]
	at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:127) ~[elasticsearch-6.3.0.jar:6.3.0]
	at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86) ~[elasticsearch-6.3.0.jar:6.3.0]
	at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:124) ~[elasticsearch-cli-6.3.0.jar:6.3.0]
	at org.elasticsearch.cli.Command.main(Command.java:90) ~[elasticsearch-cli-6.3.0.jar:6.3.0]
	at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:93) ~[elasticsearch-6.3.0.jar:6.3.0]
	at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:86) ~[elasticsearch-6.3.0.jar:6.3.0]
Caused by: org.elasticsearch.bootstrap.BootstrapException: java.io.IOException: Cannot run program "/elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/controller": error=2, No such file or directory
	at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:168) ~[elasticsearch-6.3.0.jar:6.3.0]
	at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:326) ~[elasticsearch-6.3.0.jar:6.3.0]
	at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:136) ~[elasticsearch-6.3.0.jar:6.3.0]
	... 6 more
Caused by: java.io.IOException: Cannot run program "/elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/controller": error=2, No such file or directory
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048) ~[?:1.8.0_151]
	at org.elasticsearch.bootstrap.Spawner.spawnNativeController(Spawner.java:118) ~[elasticsearch-6.3.0.jar:6.3.0]
	at org.elasticsearch.bootstrap.Spawner.spawnNativeControllers(Spawner.java:86) ~[elasticsearch-6.3.0.jar:6.3.0]
	at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:166) ~[elasticsearch-6.3.0.jar:6.3.0]
	at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:326) ~[elasticsearch-6.3.0.jar:6.3.0]
	at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:136) ~[elasticsearch-6.3.0.jar:6.3.0]
	... 6 more
Caused by: java.io.IOException: error=2, No such file or directory
	at java.lang.UNIXProcess.forkAndExec(Native Method) ~[?:1.8.0_151]
	at java.lang.UNIXProcess.<init>(UNIXProcess.java:247) ~[?:1.8.0_151]
	at java.lang.ProcessImpl.start(ProcessImpl.java:134) ~[?:1.8.0_151]
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029) ~[?:1.8.0_151]
	at org.elasticsearch.bootstrap.Spawner.spawnNativeController(Spawner.java:118) ~[elasticsearch-6.3.0.jar:6.3.0]
	at org.elasticsearch.bootstrap.Spawner.spawnNativeControllers(Spawner.java:86) ~[elasticsearch-6.3.0.jar:6.3.0]
	at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:166) ~[elasticsearch-6.3.0.jar:6.3.0]
	at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:326) ~[elasticsearch-6.3.0.jar:6.3.0]
	at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:136) ~[elasticsearch-6.3.0.jar:6.3.0]
	... 6 more
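
To confirm that a crash-looping pod is hitting this particular failure rather than something else, its log can be checked for the signature in the trace above. A minimal sketch, assuming the log text matches that trace; the function name is hypothetical:

```shell
# Hypothetical check: does a log (read from stdin) contain the native
# controller spawn failure shown in the stack trace above?
has_controller_spawn_failure() {
  grep -q -E 'Cannot run program ".*/x-pack-ml/platform/[^"]*/bin/controller"'
}

# Usage (pod name taken from the log line above):
#   kubectl logs es-master-0 | has_controller_spawn_failure && echo "x-pack ML controller failed to start"
```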

rocketraman commented

@rewt If you're getting the same error as me, you can build an image that extends quay.io/pires/docker-elasticsearch-kubernetes:6.3.0, with a modified run.sh (you can grab the upstream version from here). Your Dockerfile might look something like this:

FROM quay.io/pires/docker-elasticsearch-kubernetes:6.3.0
# Copy your modified run.sh into the image
COPY run.sh /

Then build and push it:

docker build -t myregistry/myorg/elasticsearch:latest -t myregistry/myorg/elasticsearch:6.3.0 .
docker push myregistry/myorg/elasticsearch:latest
docker push myregistry/myorg/elasticsearch:6.3.0
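
The build and push steps can be wrapped in a small function so the repository and version are set in one place. This is a sketch, not something from the upstream repo: build_and_push and the DOCKER override are hypothetical, and myregistry/myorg/elasticsearch is the same placeholder as above. Setting DOCKER=echo prints the commands instead of running them.

```shell
# Hypothetical wrapper around the build and push commands above.
# $1 = image repository, $2 = version tag.
# DOCKER can be overridden (e.g. DOCKER=echo) for a dry run.
build_and_push() {
  repo="$1"
  version="$2"
  docker_cmd="${DOCKER:-docker}"
  # Stop at the first failing step so a broken build is never pushed.
  "$docker_cmd" build -t "${repo}:latest" -t "${repo}:${version}" . || return 1
  "$docker_cmd" push "${repo}:latest" || return 1
  "$docker_cmd" push "${repo}:${version}"
}

# Usage:
#   build_and_push myregistry/myorg/elasticsearch 6.3.0
#   DOCKER=echo build_and_push myregistry/myorg/elasticsearch 6.3.0   # dry run
```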

pires commented

The right tag is b16d5e2a8db4.

pires commented

Please delete the image from your system and pull again.

rocketraman commented

@pires It's the right one - the sha256 digest starts with b16d5e2a8db4:

$ docker pull quay.io/pires/docker-elasticsearch-kubernetes:6.3.0
Trying to pull repository quay.io/pires/docker-elasticsearch-kubernetes ...
sha256:b16d5e2a8db4c5d969c3068ef5c60f9921c25566c39063a2994a0beaa6865cb1: Pulling from quay.io/pires/docker-elasticsearch-kubernetes
ff3a5c916c92: Already exists 
2636de92c26b: Already exists 
ff8e864950b6: Already exists 
f30ad320ffb8: Already exists 
6f564cc2a8e4: Already exists 
2b9a9ed5e7b7: Already exists 
f12083bb7793: Already exists 
Digest: sha256:b16d5e2a8db4c5d969c3068ef5c60f9921c25566c39063a2994a0beaa6865cb1
Status: Image is up to date for quay.io/pires/docker-elasticsearch-kubernetes:6.3.0
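
Comparing the digest can also be scripted rather than done by eye. The sketch below only does the string comparison; digest_has_prefix is a hypothetical name, and the docker inspect call in the usage comment assumes the image was pulled from a registry so that RepoDigests is populated.

```shell
# Hypothetical helper: succeed if a "repo@sha256:<hex>" or "sha256:<hex>"
# reference has a digest that starts with the given prefix.
digest_has_prefix() {
  ref="$1"
  prefix="$2"
  hex="${ref##*sha256:}"   # strip everything up to and including "sha256:"
  case "$hex" in
    "$prefix"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   ref=$(docker inspect --format '{{index .RepoDigests 0}}' quay.io/pires/docker-elasticsearch-kubernetes:6.3.0)
#   digest_has_prefix "$ref" b16d5e2a8db4 && echo "digest matches"
```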

rewt commented

Digging deeper into the errors, I found my issue documented in #64.

Setting NETWORK_HOST to _eth0_ in es-master.yml resolved the issue; the nodes had not been obtaining connectivity.
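
Before setting NETWORK_HOST to an interface placeholder such as _eth0_, it may be worth confirming that the interface exists inside the container. A rough sketch, assuming a Linux container where interfaces are listed under /sys/class/net; iface_exists is a hypothetical helper, and its second argument exists only so the check can be pointed at another directory.

```shell
# Hypothetical helper: succeed if a network interface with the given name
# is listed under the sysfs network directory.
# $1 = interface name, $2 = optional sysfs path (defaults to /sys/class/net).
iface_exists() {
  name="$1"
  sysfs="${2:-/sys/class/net}"
  [ -e "$sysfs/$name" ]
}

# Usage inside the container:
#   iface_exists eth0 && echo "eth0 is present"
```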

rocketraman commented

@rewt Thanks, I'll create another issue for what I was seeing, as it looks to be a different problem.