100% CPU spin loop forever when `writeAndFlush` gets run from `channelActive`
Closed this issue · 2 comments
weissi commented
NIOSSL has the following loop
bufferedActionsLoop: while bufferedActions.hasMark { // JW: <<<----- loop
let element = bufferedActions.first!
switch element {
case .write(let bufferedWrite):
var data = bufferedWrite.data
let writeSuccessful = try self._encodeSingleWrite(buf: &data)
if writeSuccessful { // JW: <<<----- `writeSuccessful` will be `false`
didWrite = true
if let promise = bufferedWrite.promise { promises.append(promise) }
_ = bufferedActions.removeFirst()
}
case .closeOutput:
[...]
} // JW: <<<----- this will spin around again and again and again
Precisely what happens is this:
- We're in
NIOSSLHandler.channelActive
WebsocketClientHandshakeHander.channelActive
issues awriteAndFlush
- so we enter (in the same frame) into
NIOSSLHandler.flush
- that calls
SSLConnection.writeDataToNetwork
- that tries
SSL_write
(namespaced toCNIOBoringSSL_SSL_write
) - which asks
ssl_can_write
which says false - which triggers the implicit (BoringSSL comment
// If necessary, complete the handshake implicitly.
) handshake throughSSL_do_handshake
which returns-1
(andSSL_ERROR_WANT_READ
) - this makes
writeDataToNetwork
return.incomplete
which makes_encodeSingleWrite
returnfalse
- that makes the above loop loop again, forever
weissi commented
Thanks @glbrntt , this could be a reentrancy bug
public func channelActive(context: ChannelHandlerContext) {
// We fire this a bit early, entirely on purpose. This is because
// in doHandshakeStep we may end up closing the channel again, and
// if we do we want to make sure that the channelInactive message received
// by later channel handlers makes sense.
context.fireChannelActive()
doHandshakeStep(context: context)
}
So we're calling out before calling doHandshakeStep
. This intern then triggers the implicit handshake in BoringSSL. Maybe we think that we'd never trigger the implicit handshake.
Possible that reversing those two lines will fix it because we never do the implicit handshake in BoringSSL.
weissi commented
NIOSSLHandler
sees the following channel events:
handlerAdded
close
(triggered from aChannel.close()
)channelInactive
(triggered fromBaseSocketChannel.close0
) [BUG HERE (channelInactive
withoutchannelActive
)]channelActive
(triggered fromBaseSocketChannel.writable
->BaseSocketChannel.finishConnect
->BaseSocketChannel.becomeActive()
) [BUG HERE (channelActive
afterchannelInactive
)]
Obviously those channel events are wrong. That's a bug in NIO too.