Slow certificate creation and SSL handshake causing blocked proxy executor threads.
whitingjr opened this issue · 1 comments
JVM threads block on SSLServerSocket.accept to downstream server when delayed establishing SSL handshake or certificates.
The request processing thread named [pool-n-thread-1] blocks indefinitely every time the SSL certificate creation and handshake takes more than 15 seconds to process. This was discovered during testing. Happening on master branch and tag indy-parent-1.9.9
The ServerSocket and Socket cleanup code is executed only once by the ProxyResponseWriter.doHandleEvent response handler. Exactly 15 seconds after the client initiates an upstream request to the proxy server.
After 15 seconds the event handler on a separate thread named [PROXY-CONNECT repository.jboss.org:443 HTTP/1.1] executes the socket close. But at that time the thread [pool-n-thread-1] is not finished yet doing certificate creation and/or SSL handshake.
The handler event attempts to close ProxyMITMSSLServer.sslServerSocket which is null. Resulting in a NullPointerException and any further cleanup is abandoned. Seen here:
[PROXY-CONNECT repository.jboss.org:443 HTTP/1.1] ERROR o.c.i.h.handler.ProxyResponseWriter - HTTProx request failed: null
java.lang.NullPointerException: null
at org.commonjava.indy.httprox.handler.ProxyMITMSSLServer.stop(ProxyMITMSSLServer.java:328)
at org.commonjava.indy.httprox.handler.ProxyResponseWriter.doHandleEvent(ProxyResponseWriter.java:359)
at org.commonjava.indy.httprox.handler.ProxyResponseWriter.handleEvent(ProxyResponseWriter.java:177)
at org.commonjava.indy.httprox.handler.ProxyResponseWriter.handleEvent(ProxyResponseWriter.java:77)
at org.xnio.ChannelListeners.invokeChannelListener(ChannelListeners.java:92)
at org.xnio.conduits.WriteReadyHandler$ChannelListenerHandler.writeReady(WriteReadyHandler.java:65)
at org.xnio.nio.NioSocketConduit.handleReady(NioSocketConduit.java:93)
at org.xnio.nio.WorkerThread.run(WorkerThread.java:539)
[PROXY-CONNECT repository.jboss.org:443 HTTP/1.1] DEBUG o.c.i.h.handler.ProxyResponseWriter - Response error complete.
The handler then returns from the method ProxyMITMSSLServer.stop()
While on a separate thread the proxy server continues processing the request and ProxyMITMSSLServer.execute method continues with the original downstream request. Assigning a ServerSocket object to ProxyMITMSSLServer.sslServerSocket successfully. But then blocks on this line https://github.com/Commonjava/indy/blob/master/addons/httprox/common/src/main/java/org/commonjava/indy/httprox/handler/ProxyMITMSSLServer.java#L202 calling accept method. The method call never returns.
"pool-2-thread-1" #69 prio=5 os_prio=0 tid=0x00007f53485e3000 nid=0x4a9b runnable [0x00007f53e4599000]
java.lang.Thread.State: RUNNABLE
at java.net.PlainSocketImpl.socketAccept(Native Method)
at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
at java.net.ServerSocket.implAccept(ServerSocket.java:560)
at sun.security.ssl.SSLServerSocketImpl.accept(SSLServerSocketImpl.java:348)
at org.commonjava.indy.httprox.handler.ProxyMITMSSLServer.execute(ProxyMITMSSLServer.java:202)
at org.commonjava.indy.httprox.handler.ProxyMITMSSLServer.run(ProxyMITMSSLServer.java:123)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
The effect on the system is the blocked java thread is causing a single core allocated to the thread to run at 100% for ever. Subsequent requests then block at the same code path but, only one core is pinned at 100%.
ProxyMITMSSLServer never recovers any of the faulty threads named [pool-n-thread-1]. This will occur for each unique host.
top - 21:52:52 up 13:08, 1 user, load average: 2.01, 1.82, 1.50
Tasks: 311 total, 2 running, 308 sleeping, 0 stopped, 1 zombie
%Cpu(s): 7.9 us, 2.3 sy, 0.0 ni, 89.5 id, 0.0 wa, 0.3 hi, 0.0 si, 0.0 st
MiB Mem : 11965.0 total, 2544.6 free, 4856.9 used, 4563.6 buff/cache
MiB Swap: 6024.0 total, 6011.7 free, 12.2 used. 6429.8 avail MemPID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
90574 whiting+ 20 0 9842684 1.0g 24632 S 102.7 8.8 18:07.76 java
85884 root 20 0 0 0 0 I 7.0 0.0 0:21.31 kworker/u32:1-events_unbound
1458 whiting+ 20 0 460708 200624 176300 S 3.7 1.6 7:14.24 Xorg
5475 whiting+ 20 0 649656 81804 45208 S 3.3 0.7 0:57.78 terminator
1689 whiting+ 20 0 3423308 214076 100380 S 3.0 1.7 3:32.03 gnome-shell
91674 whiting+ 20 0 115720 5576 3200 S 1.7 0.0 0:10.36 htop
1 root 20 0 172028 15464 9696 S 0.7 0.1 0:17.17 systemd
36036 whiting+ 20 0 4210896 788736 428380 S 0.7 6.4 4:58.08 firefox
36322 whiting+ 20 0 2758580 272336 167228 S 0.7 2.2 1:03.64 Web Content
90904 whiting+ 20 0 83.1g 155220 111824 S 0.7 1.3 0:06.73 WebKitWebProces
This issue is most likely to occur when ProxyMITMSSLServer proxy is experiencing external network delays or high load when creating signed certificate for downstream servers.
To recreate this issue:
a) Start the Indy server JVM process with man in the middle proxy enabled. The JVM should be configured to allow a debugger to be attached.
b) Attatch a debugger. Add 2 breakpoints
c) ProxyMITMSSLServer line 175
d) Any j.l.NullPointerException
Then execute a request to the proxy server
e) curl https://repository.jboss.org/nexus/content/groups/public/org/jboss/developer/stacks-client/1.1.0.CR1/stacks-client-1.1.0.CR1.jar --proxy http://localhost:8081 --cacert /home/whitingjr/thebounty/work/redhat/java/indy-jdk8/addons/httprox/ftests/src/main/resources/ssl/ca.crt --output /tmp/download.jar --connect-timeout 60
f) Wait 20 seconds
The debugger will have caught 2 threads at this point. Identified above by the breakpoints.
Allow the caught threads to run. The proxy sever will now demonstrate the cpu core pinned at 100% utilization. Seen in jstack dump and top output.
I suggest this be solved by changing the behavior of the request and response threads.
Currently each thread is stamping on each others fields and making assumptions about progress. Hence the NullPointerException.
Instead I propose to make them both work in co-ordination.
The event response handler can be changed to avoid releasing resources. That responsibility can be nominated to the request thread. To co-ordinate all further request/response processing be cancelled the response handler changes a shared boolean field named "isCancelled".
The request thread will check periodically during processing the status of the shared field. Continuing progress or freeing up any resources.