Master node errors
The issue is not with my environment, as I'm able to build a cluster with different configs. Is there an issue with the new 6.3.0 image?
@pires I believe 6.3.0 is broken. ES appears to be attempting to run /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/controller on startup, and fails. It is running this despite xpack.ml.enabled being false.
Running ldd on this binary shows missing shared libraries and missing symbols (Alpine?):
bash-4.4# ldd /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/controller
/lib64/ld-linux-x86-64.so.2 (0x7f6e4e478000)
libpthread.so.0 => /lib64/ld-linux-x86-64.so.2 (0x7f6e4e478000)
libdl.so.2 => /lib64/ld-linux-x86-64.so.2 (0x7f6e4e478000)
librt.so.1 => /lib64/ld-linux-x86-64.so.2 (0x7f6e4e478000)
liblog4cxx.so.10 => /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/liblog4cxx.so.10 (0x7f6e4de7c000)
libboost_program_options-gcc62-mt-1_65_1.so.1.65.1 => /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/libboost_program_options-gcc62-mt-1_65_1.so.1.65.1 (0x7f6e4dbfb000)
libMlCore.so => /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/libMlCore.so (0x7f6e4d8ca000)
libstdc++.so.6 => /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/libstdc++.so.6 (0x7f6e4d520000)
libm.so.6 => /lib64/ld-linux-x86-64.so.2 (0x7f6e4e478000)
libgcc_s.so.1 => /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/libgcc_s.so.1 (0x7f6e4d309000)
libc.so.6 => /lib64/ld-linux-x86-64.so.2 (0x7f6e4e478000)
libaprutil-1.so.0 => /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/./libaprutil-1.so.0 (0x7f6e4d0e2000)
libexpat.so.0 => /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/./libexpat.so.0 (0x7f6e4ceb7000)
libapr-1.so.0 => /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/./libapr-1.so.0 (0x7f6e4cc81000)
Error loading shared library libcrypt.so.1: No such file or directory (needed by /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/liblog4cxx.so.10)
libxml2.so.2 => /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/./libxml2.so.2 (0x7f6e4c907000)
libz.so.1 => /lib/libz.so.1 (0x7f6e4c6f0000)
libboost_regex-gcc62-mt-1_65_1.so.1.65.1 => /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/./libboost_regex-gcc62-mt-1_65_1.so.1.65.1 (0x7f6e4c3f8000)
libboost_iostreams-gcc62-mt-1_65_1.so.1.65.1 => /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/./libboost_iostreams-gcc62-mt-1_65_1.so.1.65.1 (0x7f6e4c1e3000)
libboost_filesystem-gcc62-mt-1_65_1.so.1.65.1 => /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/./libboost_filesystem-gcc62-mt-1_65_1.so.1.65.1 (0x7f6e4bfc8000)
libboost_system-gcc62-mt-1_65_1.so.1.65.1 => /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/./libboost_system-gcc62-mt-1_65_1.so.1.65.1 (0x7f6e4bdc4000)
Error loading shared library ld-linux-x86-64.so.2: No such file or directory (needed by /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/libstdc++.so.6)
Error loading shared library libcrypt.so.1: No such file or directory (needed by /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/./libaprutil-1.so.0)
Error loading shared library libcrypt.so.1: No such file or directory (needed by /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/./libapr-1.so.0)
Error relocating /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/libMlCore.so: __open_2: symbol not found
Error relocating /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/libMlCore.so: __sprintf_chk: symbol not found
Error relocating /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/libMlCore.so: __vsnprintf_chk: symbol not found
Error relocating /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/libMlCore.so: __memmove_chk: symbol not found
Error relocating /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/libMlCore.so: __fprintf_chk: symbol not found
Error relocating /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/libgcc_s.so.1: __cpu_indicator_init: symbol not found
Error relocating /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/libgcc_s.so.1: __cpu_model: symbol not found
Error relocating /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/./libapr-1.so.0: pthread_mutex_consistent_np: symbol not found
Error relocating /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/./libapr-1.so.0: __rawmemchr: symbol not found
Error relocating /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/./libapr-1.so.0: pthread_mutexattr_setrobust_np: symbol not found
Error relocating /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/./libapr-1.so.0: __isnan: symbol not found
Error relocating /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/./libapr-1.so.0: __isinf: symbol not found
Error relocating /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/./libapr-1.so.0: pthread_yield: symbol not found
Error relocating /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/./libapr-1.so.0: sys_siglist: symbol not found
Error relocating /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/./libxml2.so.2: __isnan: symbol not found
Error relocating /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/../lib/./libxml2.so.2: __isinf: symbol not found
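To summarise output like this, the two failure classes can be counted. This is only a rough sketch: the function name summarize_ldd is made up for illustration, and it only recognises the musl-style messages seen in the output above.

```shell
# Summarise problems in `ldd` output read from stdin.
# Assumption: messages use the wording shown above
# ("Error loading shared library ..." and "... symbol not found").
summarize_ldd() {
  awk '
    /Error loading shared library/ { missing++ }
    /symbol not found/             { unresolved++ }
    END { printf "missing_libs=%d unresolved_symbols=%d\n", missing, unresolved }
  '
}
```

It could be used as `ldd /elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/controller 2>&1 | summarize_ldd`.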
The relevant ES code seems to be here:
It appears to spawn a native controller for each module whose controller binary exists in the filesystem. Adding the following to run.sh appears to fix it, because ES silently ignores modules for which the native controller doesn't exist:
# Workaround for x-pack ML incompatibility
echo "Deleting x-pack-ml platform files to prevent native controller spawning..."
rm -rf "$BASE/modules/x-pack/x-pack-ml/platform/linux-x86_64"
# Probably not necessary, but let's delete the plugin native libs for good measure
rm -rf "$BASE/plugins/x-pack/x-pack-ml/platform/linux-x86_64"
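To check whether the workaround took effect, one option is to list the controller binaries that are still on disk. A minimal sketch, assuming the spawner only looks at whether the file exists (as described above); list_native_controllers is a hypothetical helper, not part of ES or of the image:

```shell
# List native controller binaries found under a modules directory.
# Assumption: this mirrors the existence check described above;
# it is not the actual Elasticsearch Spawner logic.
list_native_controllers() {
  modules_dir=$1
  find "$modules_dir" -type f -path '*/platform/linux-x86_64/bin/controller' 2>/dev/null
}
```

Running `list_native_controllers "$BASE/modules"` after the rm lines should print nothing if the x-pack-ml controller was the only one present.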
If running on k8s, how can run.sh be modified? Is the quay.io/pires/elasticsearch:6.2.4 image still compatible?
@rocketraman I think you may have run one image that was pushed erroneously. Can you make sure you are downloading the current 6.3.0 tag? I just brought a new Kubernetes cluster up and was able to run everything as expected.
@pires I believe I have the latest image. Here is the docker history:
$ docker history quay.io/pires/docker-elasticsearch-kubernetes:6.3.0 | head -5
IMAGE CREATED CREATED BY SIZE COMMENT
3813d73cfe49 2 days ago /bin/sh -c #(nop) ENV MEMORY_LOCK=false 0 B
<missing> 2 days ago /bin/sh -c #(nop) ENV DISCOVERY_SERVICE=e... 0 B
<missing> 2 days ago /bin/sh -c #(nop) ADD dir:a37b50c691132deb... 904 B
<missing> 2 days ago /bin/sh -c #(nop) MAINTAINER pjpires@gmai... 0 B
And this is the error I get on startup, without the workaround I posted above, which causes the image to crash loop:
[2018-06-17T03:58:42,113][WARN ][o.e.b.ElasticsearchUncaughtExceptionHandler] [es-master-0] uncaught exception in thread [main]
org.elasticsearch.bootstrap.StartupException: org.elasticsearch.bootstrap.BootstrapException: java.io.IOException: Cannot run program "/elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/controller": error=2, No such file or directory
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:140) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:127) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:124) ~[elasticsearch-cli-6.3.0.jar:6.3.0]
at org.elasticsearch.cli.Command.main(Command.java:90) ~[elasticsearch-cli-6.3.0.jar:6.3.0]
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:93) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:86) ~[elasticsearch-6.3.0.jar:6.3.0]
Caused by: org.elasticsearch.bootstrap.BootstrapException: java.io.IOException: Cannot run program "/elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/controller": error=2, No such file or directory
at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:168) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:326) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:136) ~[elasticsearch-6.3.0.jar:6.3.0]
... 6 more
Caused by: java.io.IOException: Cannot run program "/elasticsearch/modules/x-pack/x-pack-ml/platform/linux-x86_64/bin/controller": error=2, No such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048) ~[?:1.8.0_151]
at org.elasticsearch.bootstrap.Spawner.spawnNativeController(Spawner.java:118) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.bootstrap.Spawner.spawnNativeControllers(Spawner.java:86) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:166) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:326) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:136) ~[elasticsearch-6.3.0.jar:6.3.0]
... 6 more
Caused by: java.io.IOException: error=2, No such file or directory
at java.lang.UNIXProcess.forkAndExec(Native Method) ~[?:1.8.0_151]
at java.lang.UNIXProcess.<init>(UNIXProcess.java:247) ~[?:1.8.0_151]
at java.lang.ProcessImpl.start(ProcessImpl.java:134) ~[?:1.8.0_151]
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029) ~[?:1.8.0_151]
at org.elasticsearch.bootstrap.Spawner.spawnNativeController(Spawner.java:118) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.bootstrap.Spawner.spawnNativeControllers(Spawner.java:86) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:166) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:326) ~[elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:136) ~[elasticsearch-6.3.0.jar:6.3.0]
... 6 more
@rewt If you're getting the same error as me, you can build an image that extends quay.io/pires/docker-elasticsearch-kubernetes:6.3.0, with a modified run.sh (you can grab the upstream version from here). Your Dockerfile might look something like this:
FROM quay.io/pires/docker-elasticsearch-kubernetes:6.3.0
# Copy your modified run.sh into the image
COPY run.sh /
Then build and push it:
docker build -t myregistry/myorg/elasticsearch:latest -t myregistry/myorg/elasticsearch:6.3.0 .
docker push myregistry/myorg/elasticsearch:latest
docker push myregistry/myorg/elasticsearch:6.3.0
The right tag is b16d5e2a8db4.
Please delete the image from your system and pull again.
@pires It's the right one - the sha256 digest starts with b16d5e2a8db4:
$ docker pull quay.io/pires/docker-elasticsearch-kubernetes:6.3.0
Trying to pull repository quay.io/pires/docker-elasticsearch-kubernetes ...
sha256:b16d5e2a8db4c5d969c3068ef5c60f9921c25566c39063a2994a0beaa6865cb1: Pulling from quay.io/pires/docker-elasticsearch-kubernetes
ff3a5c916c92: Already exists
2636de92c26b: Already exists
ff8e864950b6: Already exists
f30ad320ffb8: Already exists
6f564cc2a8e4: Already exists
2b9a9ed5e7b7: Already exists
f12083bb7793: Already exists
Digest: sha256:b16d5e2a8db4c5d969c3068ef5c60f9921c25566c39063a2994a0beaa6865cb1
Status: Image is up to date for quay.io/pires/docker-elasticsearch-kubernetes:6.3.0
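This comparison can also be scripted rather than eyeballed. A small sketch, assuming the pull output contains a "Digest: sha256:..." line like the one above; digest_matches is a made-up helper name:

```shell
# Succeed if the "Digest: sha256:<hex>" line on stdin starts with the given prefix.
# Assumption: input is `docker pull` output in the format shown above.
digest_matches() {
  prefix=$1
  digest=$(sed -n 's/^Digest: sha256:\([0-9a-f]*\).*$/\1/p' | head -n 1)
  case $digest in
    "$prefix"*) [ -n "$prefix" ] ;;  # an empty prefix never counts as a match
    *) return 1 ;;
  esac
}
```

For example: `docker pull quay.io/pires/docker-elasticsearch-kubernetes:6.3.0 | digest_matches b16d5e2a8db4 && echo "digest ok"`.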
Digging deeper into the errors, I found my issue documented in #64. Adding NETWORK_HOST with the value _eth0_ to es-master.yml resolved the issue, as the nodes were not obtaining connectivity.
@rewt Thanks, I'll create another issue for what I was seeing as it looks to be a different problem.