Could not find log4j
brooksgarrett opened this issue · 2 comments
On startup, Exhibitor fails to start ZooKeeper with the following error:
INFO: Initiating Jersey application, version 'Jersey: 1.18.3 12/01/2014 08:23 AM'
INFO org.mortbay.log Started SocketConnector@0.0.0.0:8080 [main]
INFO com.netflix.exhibitor.core.activity.ActivityLog State: down [ActivityQueue-0]
INFO com.netflix.exhibitor.core.activity.ActivityLog Restart of ZooKeeper skipped due to control panel setting [ActivityQueue-0]
INFO com.netflix.exhibitor.core.activity.ActivityLog Attempting to start instance [ActivityQueue-0]
ERROR com.netflix.exhibitor.core.activity.ActivityLog Trying to kill start instance [ActivityQueue-0]
java.io.IOException: Could not find (.*log4j.*)|(.*slf4j.*) jar
at com.netflix.exhibitor.core.processes.Details.findJar(Details.java:145)
at com.netflix.exhibitor.core.processes.Details.<init>(Details.java:57)
at com.netflix.exhibitor.core.processes.StandardProcessOperations.startInstance(StandardProcessOperations.java:105)
at com.netflix.exhibitor.core.state.StartInstance.call(StartInstance.java:46)
at com.netflix.exhibitor.core.state.StartInstance.call(StartInstance.java:23)
at com.netflix.exhibitor.core.activity.ActivityQueue$1.run(ActivityQueue.java:126)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
INFO com.netflix.exhibitor.core.activity.ActivityLog ZooKeeper down/not-serving waiting 30161 of 40000 ms before restarting [ActivityQueue-0]
I'm using shared configuration backed by S3, and my defaults look like this:
defaults.conf
zookeeper-install-directory=/opt/zookeeper
zookeeper-data-directory=/var/lib/zookeeper/data
zookeeper-log-directory=/var/lib/zookeeper/datalog
log-index-directory=/var/lib/zookeeper/datalog
client-port=2181
connect-port=2888
election-port=3888
zoo-cfg-extra=tickTime\=2000&initLimit\=10&syncLimit\=5&quorumListenOnAllIPs\=true
auto-manage-instances=true
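For reference, the escaped \= sequences in zoo-cfg-extra suggest the file follows Java properties escaping rules. That is an assumption; I haven't confirmed exactly how Exhibitor loads it. Under that assumption, here is a minimal sketch of how the value unpacks into individual zoo.cfg settings. The class and method names are hypothetical:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

public class ZooCfgExtraParser {
    // Hypothetical helper: splits "tickTime=2000&initLimit=10" into ordered key/value pairs.
    static Map<String, String> parseExtra(String extra) {
        Map<String, String> result = new LinkedHashMap<>();
        if (extra == null || extra.isEmpty()) {
            return result;
        }
        for (String pair : extra.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0) { // skip fragments with no key
                result.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Properties.load() turns each escaped "\=" in the value back into a plain "=".
        String conf = "zoo-cfg-extra=tickTime\\=2000&initLimit\\=10&syncLimit\\=5&quorumListenOnAllIPs\\=true\n";
        Properties props = new Properties();
        props.load(new StringReader(conf));
        Map<String, String> extra = parseExtra(props.getProperty("zoo-cfg-extra"));
        extra.forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Run as-is, it prints one line per setting, in order: tickTime, initLimit, syncLimit and quorumListenOnAllIPs.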
I've checked 100 times: the log4j jars are in the ZooKeeper lib directory and are also symlinked into the Exhibitor install. I've tried 1.5.5 as well as 1.6.0. The build is done with Maven against master (as well as the last stable release tag). I'm truly at my wits' end here. Has anyone seen this behavior, and does anyone have an idea of where to start digging?
Hi, from looking at the code, Exhibitor assumes that the log4j jar files are located in a lib folder under the ZooKeeper install path. In your case that would be /opt/zookeeper/lib.
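To make that concrete, here is a rough sketch of the kind of lookup the error implies: list <install dir>/lib and look for a file whose name matches the pattern from the stack trace. This is not Exhibitor's actual Details.findJar. LibJarFinder and its findJar helper are hypothetical, and the null check and absolute path in the message are additions meant to make a misconfigured install directory obvious:

```java
import java.io.File;
import java.io.IOException;
import java.util.regex.Pattern;

public class LibJarFinder {
    // Same pattern text as in the error message from this issue.
    static final Pattern LOGGING_JAR = Pattern.compile("(.*log4j.*)|(.*slf4j.*)");

    // Hypothetical helper: returns the first jar (in listing order) in <installDir>/lib whose name matches.
    static File findJar(File installDir, Pattern pattern) throws IOException {
        if (installDir == null) {
            throw new IOException("ZooKeeper install directory is not set");
        }
        File libDir = new File(installDir, "lib");
        File[] files = libDir.listFiles(); // null if libDir is missing or cannot be listed
        if (files != null) {
            for (File f : files) {
                // isFile() follows symlinks, so a link to a real jar counts
                if (f.isFile() && f.getName().endsWith(".jar") && pattern.matcher(f.getName()).matches()) {
                    return f;
                }
            }
        }
        throw new IOException("Could not find " + pattern.pattern() + " jar in " + libDir.getAbsolutePath());
    }

    public static void main(String[] args) throws IOException {
        File jar = findJar(new File(args.length > 0 ? args[0] : "/opt/zookeeper"), LOGGING_JAR);
        System.out.println("Found: " + jar.getAbsolutePath());
    }
}
```

Including the absolute path in the message means a null or wrong install directory shows up directly in the log instead of just the pattern.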
Can you also make sure that the user who launches Exhibitor has read permission on that directory and its contents?
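A quick way to check this from the JVM's point of view, rather than from your shell, is something like the following hypothetical helper. It reports the directory itself if it is missing or can't be listed. Otherwise it reports any entry the current user can't read, including a symlinked jar whose target is gone. Run it as the same user that launches Exhibitor:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class LibPermissionCheck {
    // Hypothetical helper: returns the paths under dir that the current user cannot read.
    // If the directory itself cannot be listed, it is reported as the only problem.
    static List<Path> unreadableEntries(Path dir) {
        List<Path> problems = new ArrayList<>();
        if (!Files.isDirectory(dir) || !Files.isReadable(dir)) {
            problems.add(dir);
            return problems;
        }
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path p : entries) {
                // isReadable follows symlinks, so a dangling link is reported here too
                if (!Files.isReadable(p)) {
                    problems.add(p);
                }
            }
        } catch (IOException e) {
            problems.add(dir);
        }
        return problems;
    }

    public static void main(String[] args) {
        Path lib = Paths.get(args.length > 0 ? args[0] : "/opt/zookeeper/lib");
        System.out.println("Running as: " + System.getProperty("user.name"));
        List<Path> bad = unreadableEntries(lib);
        if (bad.isEmpty()) {
            System.out.println("All entries in " + lib.toAbsolutePath() + " are readable");
        } else {
            bad.forEach(p -> System.out.println("Not readable: " + p.toAbsolutePath()));
        }
    }
}
```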
For example, this is what I see for a CDH distribution of Zookeeper:
~$ ll /usr/lib/zookeeper/lib/
total 2016
-rw-r--r-- 1 root root 208781 Aug 24 16:35 jline-2.11.jar
-rw-r--r-- 1 root root 481535 Aug 24 16:35 log4j-1.2.16.jar
-rw-r--r-- 1 root root 1330394 Aug 24 16:35 netty-3.10.5.Final.jar
-rw-r--r-- 1 root root 26084 Aug 24 16:35 slf4j-api-1.7.5.jar
-rw-r--r-- 1 root root 8869 Aug 24 16:35 slf4j-log4j12-1.7.5.jar
lrwxrwxrwx 1 root root 23 Oct 5 17:21 slf4j-log4j12.jar -> slf4j-log4j12-1.7.5.jar
~$ ll /usr/lib/zookeeper/
total 1408
drwxr-xr-x 2 root root 4096 Oct 5 17:21 bin
drwxr-xr-x 2 root root 4096 Oct 5 17:21 cloudera
lrwxrwxrwx 1 root root 19 Oct 5 17:21 conf -> /etc/zookeeper/conf
drwxr-xr-x 2 root root 4096 Oct 5 17:21 lib
-rw-r--r-- 1 root root 11358 Aug 24 16:35 LICENSE.txt
-rw-r--r-- 1 root root 170 Aug 24 16:35 NOTICE.txt
-rw-r--r-- 1 root root 1410862 Aug 24 16:35 zookeeper-3.4.5-cdh5.12.1.jar
lrwxrwxrwx 1 root root 29 Oct 5 17:21 zookeeper.jar -> zookeeper-3.4.5-cdh5.12.1.jar
In version 1.7.0, I also made that exception message include the absolute path (dir.getAbsolutePath()). It may help to try 1.7.0 as well.
I actually traced this down to the shared state on S3 being corrupt: the defaults weren't being used, so the path was null. The extra logging would have helped me spot the issue much faster.
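For anyone hitting the same thing, one way to guard against it is to overlay the shared values on the local defaults, treat blank values as unset, and fail fast if a required key such as zookeeper-install-directory is still missing. This is a hypothetical sketch, not something Exhibitor is known to do; EffectiveConfig, merge and requireKeys are illustrative names:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

public class EffectiveConfig {
    // Hypothetical: start from local defaults, then apply only non-blank shared values.
    static Properties merge(Properties defaults, Properties shared) {
        Properties result = new Properties();
        result.putAll(defaults);
        for (String key : shared.stringPropertyNames()) {
            String value = shared.getProperty(key);
            if (value != null && !value.trim().isEmpty()) {
                result.setProperty(key, value);
            }
        }
        return result;
    }

    // Fails with a clear message instead of letting a null path surface later as a missing jar.
    static void requireKeys(Properties config, List<String> keys) {
        for (String key : keys) {
            String value = config.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                throw new IllegalStateException("Missing required setting: " + key);
            }
        }
    }

    public static void main(String[] args) {
        Properties defaults = new Properties();
        defaults.setProperty("zookeeper-install-directory", "/opt/zookeeper");
        Properties shared = new Properties(); // stands in for a corrupt shared state
        shared.setProperty("zookeeper-install-directory", "");
        Properties effective = merge(defaults, shared);
        requireKeys(effective, Arrays.asList("zookeeper-install-directory"));
        System.out.println(effective.getProperty("zookeeper-install-directory"));
    }
}
```

The design choice here is to treat an empty shared value the same as an absent one, so a damaged shared config falls back to defaults.conf rather than overriding it with nothing.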
I did try 1.7.0 to get the logging you mentioned, but oddly my version string still reads 1.6.0. Since I've resolved the error, I haven't looked into the 1.7.0 issue further. I'll close this issue.