LDAP server connection leak causing spurious high CPU consumption.
I am running Azkaban 3.0 with azkaban-ldap-usermanager. I had to bounce the Azkaban web server once it consumed all the CPU available on the box and slowed everything down. There is a bug that causes a gradual increase in CPU consumption over time. I enabled JMX on the Azkaban web server, connected jvisualvm, and found the following:
Over time the number of NioProcessor threads grows from 0 to about 10. These threads are constantly consuming CPU, eventually consuming all of it and requiring a restart of the Azkaban Web server.
The thread dump for these threads looks something like this:
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.IOUtil.drain(Native Method)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:90)
locked <3e5de8a8> (a java.lang.Object)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
locked <1b60bc70> (a sun.nio.ch.Util$2)
locked <3c55e39e> (a java.util.Collections$UnmodifiableSet)
locked <1f30b1fa> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
at org.apache.mina.transport.socket.nio.NioProcessor.select(NioProcessor.java:97)
at org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:1074)
at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Locked ownable synchronizers:
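For anyone who wants to watch the thread count grow without attaching jvisualvm, here is a minimal sketch that counts live threads by name prefix. It only sees threads in the JVM it runs in, so it would have to be called from inside the web server process (for example from a debug hook). The class name `NioThreadCounter` and the assumption that the thread names start with "NioProcessor" are mine, not something taken from Azkaban or the plugin.

```java
class NioThreadCounter {
    // Counts live threads in the current JVM whose name starts with the given prefix.
    static long countThreads(String prefix) {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.getName().startsWith(prefix))
                .count();
    }

    public static void main(String[] args) {
        // Assumption: the MINA I/O threads are named "NioProcessor-<n>".
        System.out.println("NioProcessor threads: " + countThreads("NioProcessor"));
    }
}
```

Logging this number periodically should show whether it keeps climbing as users log in.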
As people log in to Azkaban, NioProcessor threads are created, but in the current implementation they never go away. This plugin uses org.apache.directory, which in turn uses org.apache.mina.
It appears that the LDAP user manager plugin opens an LDAP connection inside a try block that has no matching finally block. The block in question is NOT a "try with resources" block, so even though LdapConnection is AutoCloseable, that doesn't do us any good in this case. Essentially, every time an exception happens in the getUser() or validateUser() methods, an LDAP connection is leaked. This probably explains why the machine burns so much CPU running org.apache.mina code that is invoked by the LDAP user manager via org.apache.directory.
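To illustrate the difference, here is a small self-contained sketch, not the plugin's actual code. `FakeConnection`, `leakyLogin`, and `safeLogin` are hypothetical stand-ins I made up so the example runs without the org.apache.directory library. The counter tracks connections that were opened but never closed.

```java
import java.util.concurrent.atomic.AtomicInteger;

class LdapLeakSketch {
    // Number of connections that have been opened and not yet closed.
    static final AtomicInteger openConnections = new AtomicInteger();

    // Hypothetical stand-in for an AutoCloseable LDAP connection.
    static class FakeConnection implements AutoCloseable {
        FakeConnection() { openConnections.incrementAndGet(); }

        // Simulates a bind that fails for an empty user name.
        void bind(String user) {
            if (user.isEmpty()) throw new IllegalStateException("bind failed");
        }

        @Override
        public void close() { openConnections.decrementAndGet(); }
    }

    // Leaky pattern: close() is skipped whenever bind() throws.
    static boolean leakyLogin(String user) {
        try {
            FakeConnection c = new FakeConnection();
            c.bind(user);
            c.close();
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    // try-with-resources: close() runs on both the success and failure paths.
    static boolean safeLogin(String user) {
        try (FakeConnection c = new FakeConnection()) {
            c.bind(user);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        leakyLogin("");
        System.out.println("open after leaky failure: " + openConnections.get());
        safeLogin("");
        System.out.println("open after safe failure: " + openConnections.get());
    }
}
```

In this sketch a failed `leakyLogin` leaves one connection open, while a failed `safeLogin` does not add to the count. A plain try/finally that calls close() would work just as well as try-with-resources.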
The abandon() calls are described in the LDAP documentation as a "courtesy". I removed this call because, in my testing, trying to close the connection after calling abandon() raised an exception that was not handled, thus triggering the connection leak even on successful logins.