apple/swift-nio-ssl

100% CPU spin loop forever when `writeAndFlush` gets run from `channelActive`

Closed this issue · 2 comments

NIOSSL has the following loop

            bufferedActionsLoop: while bufferedActions.hasMark {         // JW: <<<----- loop
                let element = bufferedActions.first!
                switch element {
                case .write(let bufferedWrite):
                    var data = bufferedWrite.data
                    let writeSuccessful = try self._encodeSingleWrite(buf: &data)
                    if writeSuccessful {      // JW: <<<----- `writeSuccessful` will be `false`
                        didWrite = true
                        if let promise = bufferedWrite.promise { promises.append(promise) }
                        _ = bufferedActions.removeFirst()
                    }
                case .closeOutput:
                    [...]
            }  // JW: <<<----- this will spin around again and again and again

Precisely what happens is this:

  • We're in NIOSSLHandler.channelActive
  • WebsocketClientHandshakeHander.channelActive issues a writeAndFlush
  • so we enter (in the same frame) into NIOSSLHandler.flush
  • that calls SSLConnection.writeDataToNetwork
  • that tries SSL_write (namespaced to CNIOBoringSSL_SSL_write)
  • which asks ssl_can_write which says false
  • which triggers the implicit (BoringSSL comment // If necessary, complete the handshake implicitly.) handshake through SSL_do_handshake which returns -1 (and SSL_ERROR_WANT_READ)
  • this makes writeDataToNetwork return .incomplete which makes _encodeSingleWrite return false
  • that makes the above loop loop again, forever

Thanks @glbrntt , this could be a reentrancy bug

    public func channelActive(context: ChannelHandlerContext) {
        // We fire this a bit early, entirely on purpose. This is because
        // in doHandshakeStep we may end up closing the channel again, and
        // if we do we want to make sure that the channelInactive message received
        // by later channel handlers makes sense.
        context.fireChannelActive()
        doHandshakeStep(context: context)
    }

So we're calling out before calling doHandshakeStep. This intern then triggers the implicit handshake in BoringSSL. Maybe we think that we'd never trigger the implicit handshake.

Possible that reversing those two lines will fix it because we never do the implicit handshake in BoringSSL.

NIOSSLHandler sees the following channel events:

  • handlerAdded
  • close (triggered from a Channel.close())
  • channelInactive (triggered from BaseSocketChannel.close0) [BUG HERE (channelInactive without channelActive)]
  • channelActive (triggered from BaseSocketChannel.writable -> BaseSocketChannel.finishConnect -> BaseSocketChannel.becomeActive()) [BUG HERE (channelActive after channelInactive)]

Obviously those channel events are wrong. That's a bug in NIO too.