OpenTSDB/asynchbase

OpenTSDB 2.3 not working with Kerberized CDH 5.9

sagonfor7 opened this issue · 5 comments

I am unable to get OpenTSDB 2.3 working against a fully Kerberized CDH 5.9 cluster. Specifically, if I set RPC protection to 'privacy', I see the error below. Backing off to 'authentication' makes the error go away, but I can't use that workaround because the data is then not encrypted on the wire, which is a requirement for us. The CDH cluster is behaving normally in all other respects.

Looking at the code, it seems there might be a reader/writer index snafu of some kind in RegionClient.encode, which then causes the problem when the buffer is prepared for wrapping in the helper.

11:38:37.285 INFO [SecureRpcHelper96.handleResponse] - SASL client context established. Negotiated QoP: auth-conf on for: RegionClient@1058994674(chan=null, #pending_rpcs=1, #batched=0, #rpcs_inflight=0)
11:38:37.286 ERROR [RegionClient.exceptionCaught] - Unexpected exception from downstream on [id: 0xc949aed4, /10.0.1.53:39631 => /10.0.1.67:60020]
java.lang.IndexOutOfBoundsException: Not enough readable bytes - Need 191, maximum is 179
at org.jboss.netty.buffer.AbstractChannelBuffer.checkReadableBytes(AbstractChannelBuffer.java:668) ~[netty-3.9.4.Final.jar:na]
at org.jboss.netty.buffer.AbstractChannelBuffer.readBytes(AbstractChannelBuffer.java:338) ~[netty-3.9.4.Final.jar:na]
at org.jboss.netty.buffer.AbstractChannelBuffer.readBytes(AbstractChannelBuffer.java:344) ~[netty-3.9.4.Final.jar:na]
at org.hbase.async.SecureRpcHelper.wrap(SecureRpcHelper.java:235) ~[asynchbase-1.7.2.jar:na]
at org.hbase.async.RegionClient.encode(RegionClient.java:1385) ~[asynchbase-1.7.2.jar:na]

OK, having looked into it a bit further, I'm pretty sure the buffer handling code in SecureRpcHelper is suspect. We're asking the ChannelBuffer to fill a byte array whose size is taken from the buffer's writer index, but that index bears no real relation to the number of readable bytes, which is (writerIndex - readerIndex). Unfortunately for us, netty explicitly checks that it can read enough data to fill whatever array you give it, and throws the above exception otherwise. I think the solution is to use readableBytes() as the array size - unless we deliberately want a larger array for reasons I don't understand, in which case netty's readBytes can't be used to fill it.
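To make the sizing problem concrete, here is a minimal sketch that reproduces the numbers from the trace without needing netty on the classpath. IndexedBuffer is a hypothetical stand-in that only mimics the reader/writer index behaviour described above; it is not netty's ChannelBuffer and not the actual SecureRpcHelper code. The reader index of 12 is an assumption chosen so that 191 bytes written leaves 179 readable, matching the exception message.

```java
/** Hypothetical stand-in for a buffer with separate reader and writer indexes. */
final class IndexedBuffer {
    private final byte[] data;
    private int readerIndex;
    private int writerIndex;

    IndexedBuffer(int capacity) {
        data = new byte[capacity];
    }

    void writeBytes(byte[] src) {
        System.arraycopy(src, 0, data, writerIndex, src.length);
        writerIndex += src.length;
    }

    void skipBytes(int n) {
        if (n > readableBytes()) {
            throw new IndexOutOfBoundsException("Cannot skip " + n + " bytes");
        }
        readerIndex += n;
    }

    int writerIndex() {
        return writerIndex;
    }

    int readableBytes() {
        return writerIndex - readerIndex;
    }

    /** Fills dst completely, or throws if fewer than dst.length bytes are readable. */
    void readBytes(byte[] dst) {
        if (dst.length > readableBytes()) {
            throw new IndexOutOfBoundsException("Not enough readable bytes - Need "
                + dst.length + ", maximum is " + readableBytes());
        }
        System.arraycopy(data, readerIndex, dst, 0, dst.length);
        readerIndex += dst.length;
    }
}

final class WrapSizingSketch {
    /** 191 bytes written, 12 already consumed (assumed), so 179 remain readable. */
    static IndexedBuffer sampleBuffer() {
        IndexedBuffer buf = new IndexedBuffer(256);
        buf.writeBytes(new byte[191]);
        buf.skipBytes(12);
        return buf;
    }

    /** The suspect pattern: array sized by the writer index. */
    static byte[] copySizedByWriterIndex(IndexedBuffer buf) {
        byte[] payload = new byte[buf.writerIndex()];
        buf.readBytes(payload); // throws when the reader index is not 0
        return payload;
    }

    /** The proposed fix: array sized by the number of readable bytes. */
    static byte[] copySizedByReadableBytes(IndexedBuffer buf) {
        byte[] payload = new byte[buf.readableBytes()];
        buf.readBytes(payload);
        return payload;
    }

    public static void main(String[] args) {
        try {
            copySizedByWriterIndex(sampleBuffer());
        } catch (IndexOutOfBoundsException e) {
            System.out.println("writer-index sizing failed: " + e.getMessage());
        }
        byte[] ok = copySizedByReadableBytes(sampleBuffer());
        System.out.println("readable-bytes sizing copied " + ok.length + " bytes");
    }
}
```

In the real helper, the array filled this way would then be handed to the SASL client for wrapping. Note that the two sizings only differ when something has already been read from the buffer, so the writer-index version would go unnoticed whenever the reader index is still 0.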

I've patched this on my cluster and now I'm SASLing properly. Happy to do a pull request if you like.

However, I'm now on to the next problem, which several other people have also raised: a Broken Pipe error later in the process. So no OpenTSDB on secure Hadoop for me yet :-(

Hmm, yeah I think you definitely found the issue in the SecureRpcHelper. If you have a patch please do send it along. The netty replays are a bear to work with.

And do you have some example broken pipes and errors that appear before them? Thanks!

Hey, I am also facing the "Broken Pipe" issue after performing the fix for SecureRpcHelper.
The env I am working on is HDP 2.5.3

The HBase Region Server drops the TSDB connection, and here is what I see in the Region Server logs:

2017-03-28 15:24:34,332 DEBUG [RpcServer.reader=1,bindAddress=xxxxx.xxxx.xxx.com,port=16020] ipc.RpcServer: RpcServer.listener,port=16020: Caught exception while reading:Problems unwrapping SASL buffer
2017-03-28 15:24:34,332 DEBUG [RpcServer.reader=1,bindAddress=xxxxxxx.xxxx.xxx.com,port=16020] ipc.RpcServer: RpcServer.listener,port=16020: DISCONNECTING client x.x.x.x:58603 because read count=-1. Number of active connections: 1

Hi, I'm trying to run OpenTSDB 2.4.0RC2 with asynchbase 1.8.0, with SecureRpcHelper.java patched as described by @nameeshambardar, and I also run into a "Broken pipe" exception as mentioned by @sagonfor7 and @nameeshambardar. The stack trace is roughly the same as the one shared by @nameeshambardar.

Our HBase version is HBase 1.2.0-cdh5.13.0, with Kerberos enabled and hbase.rpc.protection security level set to privacy (auth-conf). Downgrading protection to authentication only is not an option.

Has anyone found a workaround or fix for using OpenTSDB with a secured HBase? Is this issue already being worked on?

Hi folks, try the "master" branch now please and let me know how this goes for you.