dotnet/aspnetcore

Improve HTTP/2 performance by using Channels

davidfowl opened this issue · 6 comments

Today Kestrel locks and copies buffers into the connection pipe when multiplexing output from multiple streams. We could improve this by using a channel to queue writes to the pipe instead of locking (see the example linked below).

This sample shows the difference between a SemaphoreSlim and a Channel<byte[]>; the performance difference is almost 3x, with zero allocations: https://gist.github.com/davidfowl/8af7b6fa21df0fe1f150fb5cfeafa8f7.

There was still some lock contention inside Channels itself, but it is well optimized.
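
For reference, here is a minimal sketch of the two patterns that gist compares (the field names and the WriteToPipeAsync helper below are illustrative, not taken from the gist or from Kestrel):

using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// Lock-based: every producer waits its turn before touching the pipe.
private readonly SemaphoreSlim _semaphore = new SemaphoreSlim(1, 1);

public async ValueTask WriteWithSemaphoreAsync(byte[] data)
{
    await _semaphore.WaitAsync();
    try
    {
        await WriteToPipeAsync(data);
    }
    finally
    {
        _semaphore.Release();
    }
}

// Channel-based: producers enqueue and return; a single consumer drains the queue.
private readonly Channel<byte[]> _channel = Channel.CreateUnbounded<byte[]>(
    new UnboundedChannelOptions { SingleReader = true, SingleWriter = false });

public bool WriteWithChannel(byte[] data) => _channel.Writer.TryWrite(data);

private async Task ConsumeAsync()
{
    // ReadAllAsync completes once the writer is marked complete.
    await foreach (var data in _channel.Reader.ReadAllAsync())
    {
        await WriteToPipeAsync(data);
    }
}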

We should prototype and measure this change.

cc @JamesNK @halter73 @Tratcher

Thanks for contacting us.
We're moving this issue to the Next sprint planning milestone for future evaluation / consideration. We will evaluate the request when we are planning the work for the next milestone. To learn more about what to expect next and how this issue will be handled you can read more about our triage process here.

Copied from #30829

The HTTP/2 frame writer uses a lock to ensure that only one stream can write to the connection at a time; see Http2FrameWriter.WriteDataAsync, for example.

When a connection has many streams with frequent writes, there is a lot of contention on the lock. Profiling shows high CPU usage from threads fighting over the lock.

A potential improvement would be to change the writer to a producer/consumer queue backed by Channel<T>: streams add write operations to the queue, and a single consumer loop is responsible for writing frames to the connection.

Today:

public ValueTask<FlushResult> WriteDataAsync(byte[] data)
{
    lock (_writeLock)
    {
        _writer.Write(data);
        return _writer.FlushAsync();
    }
}

Future:

public ValueTask WriteDataAsync(byte[] data)
{
    var operation = SetupWriteData(data);
    _channelWriter.TryWrite(operation);
    return operation.WaitForCompletedAsync();
}

// Consumer loop that reads operations from channel
private async ValueTask WriterConsumer()
{
    while (true)
    {
        if (!await _channelReader.WaitToReadAsync())
        {
            return;
        }

        if (_channelReader.TryRead(out var item))
        {
            switch (item.OperationType)
            {
                case OperationType.WriteData:
                    _writer.Write(item.Data);
                    await _writer.FlushAsync();
                    item.Complete();
                    break;
                case OperationType.WriteHeaders:
                    // etc..
                    break;
            }
        }
    }
}
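
The sketch above leaves the shape of the queued operation open. One way to model it (purely an assumption for illustration; the real change would likely pool these objects and use IValueTaskSource to stay allocation-free) is a small class backed by a TaskCompletionSource:

internal enum OperationType { WriteData, WriteHeaders }

internal sealed class Http2WriteOperation
{
    // RunContinuationsAsynchronously keeps the consumer loop from running the
    // producing stream's continuation inline while it still owns the writer.
    private readonly TaskCompletionSource<bool> _tcs =
        new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);

    public Http2WriteOperation(OperationType operationType, byte[] data)
    {
        OperationType = operationType;
        Data = data;
    }

    public OperationType OperationType { get; }
    public byte[] Data { get; }

    // Awaited by the producing stream in WriteDataAsync above.
    public ValueTask WaitForCompletedAsync() => new ValueTask(_tcs.Task);

    // Called by the consumer loop after the frame has been written and flushed.
    public void Complete() => _tcs.TrySetResult(true);
}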

Note: The HTTP/2 FlowControl logic relies on being synchronized by the _writeLock.

public bool TryUpdateConnectionWindow(int bytes)
{
    lock (_writeLock)
    {
        return _connectionOutputFlowControl.TryUpdateWindow(bytes);
    }
}

public bool TryUpdateStreamWindow(StreamOutputFlowControl flowControl, int bytes)
{
    lock (_writeLock)
    {
        return flowControl.TryUpdateWindow(bytes);
    }
}

We can instead move this logic into WriterConsumer() for synchronization, but that means TryUpdateStreamWindow will need to become an operation that is written to the channel. It also means that what looks like a single write to WriteDataAsync() could take several iterations of the WriterConsumer() loop to complete, as writes wait for flow-control availability.
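
As a rough illustration of what that could look like (the UpdateStreamWindow operation type, the FlowControl/WindowIncrement fields, and CreateUpdateWindowOperation/RequeueStalledWrites are all assumptions, not existing Kestrel members), the producer side would just enqueue, and the consumer's switch would gain another case:

public void UpdateStreamWindow(StreamOutputFlowControl flowControl, int bytes)
{
    // No lock: the consumer loop is the only code that touches flow-control state.
    // If the caller needs the bool that TryUpdateStreamWindow returns today,
    // it would await the operation's completion instead of fire-and-forget.
    _channelWriter.TryWrite(CreateUpdateWindowOperation(flowControl, bytes));
}

// Inside WriterConsumer()'s switch:
case OperationType.UpdateStreamWindow:
    if (item.FlowControl.TryUpdateWindow(item.WindowIncrement))
    {
        // Data writes that stalled on an empty window are re-queued here, which
        // is why a single WriteDataAsync call can span several loop iterations.
        RequeueStalledWrites(item.FlowControl);
    }
    break;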

I don't think this causes a real problem for this approach. It's just a slight complication. I really like using Channels for this.

We've moved this issue to the Backlog milestone. This means that it is not going to be worked on for the coming release. We will reassess the backlog following the current release and consider this item at that time. To learn more about our issue management process and to have better expectation regarding different types of issues you can read our Triage Process.

Evidence of this issue from gRPC benchmarks. They show:

  • HTTP/3 is fast.
  • H2C is slow.
  • H2 (HTTP/2 + TLS) is even worse. I'm guessing TLS causes more time to be spent inside the lock, which further increases thread contention. CPU usage on both the server and the client is low.

[Benchmark chart: h2 RPS is the yellow dot at the bottom.]

Closing this as fixed.