`DiscordGatewayClient` allocation and method improvements
VelvetToroyashi opened this issue · 8 comments
Description
As it stands right now, the gateway client is not only the single biggest single-purpose class* in the library, but it is also the top allocator as shown by a profiler. It's unlikely that these allocations pose a memory leak, but many of them are unnecessary, given how easily they can be reduced.
Allocations were initially my main goal when it came to sprucing up the client, as described below, but as I continued to look for room for improvement I realized just how much of a mess the client really is (and, I don't fault Jax here, as the client is somewhat non-trivial in its operation).
However, there is a lot of code that could be simplified by being broken out, as I'll describe later in this issue.
* This is excluding ____API classes, which I would personally consider "multi-purpose".
Why This is Needed
As described above, the gateway client is responsible for the top five allocations across the entire span of the application lifetime, even in comparison to EF Core and the available Redis caching, which marshals objects when deserializing.
The screenshot below shows measurements over a six-hour period, and while the overall allocations are rather trivial for most users, and for the most part will be GC'd, they can still be reduced by trivial means.
Alternatives Considered
Short of rewriting the entire client (e.g. #176), there aren't many alternatives to be had here.
Additional Details
Here is the meat of the document, and I'll point out everything that I think can be reasonably fixed.
Passing around heartbeat_interval
Call this a nitpick, but I feel this one is just large enough to matter; not necessarily for maintainability or functionality, but for ease of writing code. There's also potential defensive copying at play here, though that's a bit of a micro-optimization.
GatewaySenderAsync issues
This is a bundle of issues, but I'll put them under one main section.
Heartbeating should be its own loop
As shown below, the gateway prioritizes heartbeats over user-submitted payloads, which is fine, but I think this can be slightly improved by adjusting how payload rate-limiting is accounted for.
Instead of having heartbeating and user payloads both count against the rate limit, we could instead reserve a single slot for heartbeats (the heartbeat interval is ~45s, and the limit is 120 events/min, so there's only ever one heartbeat per ratelimit window).
That single slot ensures that heartbeats are always sent (and even if they aren't, this is technically OK because Discord doesn't immediately disconnect you from the socket). The only issue would be that the client automatically disconnects if the expected OP11 (HEARTBEAT_ACK) isn't received in time, but otherwise, this is a fine solution.
As for prioritization, a heartbeatLock field could be introduced; it'd be something along the lines of a SemaphoreSlim or even a ReaderWriterLockSlim, and sending payloads would wait on heartbeating to complete for consistency.
I'm pretty sure @carlst99 (Falcon) addressed this issue in the common gateway PR, but I'm not entirely certain.
Remora.Discord/Backend/Remora.Discord.Gateway/DiscordGatewayClient.cs
Lines 924 to 963 in 7935da1
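The reserved-slot and lock ideas above could look something like the sketch below. All member names here (`_heartbeatLock`, `_transportService.SendPayloadAsync`, the constants) are illustrative assumptions, not Remora's actual members:

```csharp
// Illustrative sketch only; names are hypothetical, not Remora's actual members.
private readonly SemaphoreSlim _heartbeatLock = new(1, 1);

// One of the 120 events/min is reserved for the (at most one) heartbeat
// per ratelimit window, so user payloads may only consume the remainder.
private const int MaxUserPayloadsPerWindow = 120 - 1;

private async Task SendHeartbeatAsync(IPayload heartbeat, CancellationToken ct)
{
    // Heartbeats take the lock first, so user sends can never starve them.
    await _heartbeatLock.WaitAsync(ct);
    try
    {
        await _transportService.SendPayloadAsync(heartbeat, ct);
    }
    finally
    {
        _heartbeatLock.Release();
    }
}

private async Task SendUserPayloadAsync(IPayload payload, CancellationToken ct)
{
    // User payloads wait for any in-flight heartbeat to finish, for consistency.
    await _heartbeatLock.WaitAsync(ct);
    try
    {
        await _transportService.SendPayloadAsync(payload, ct);
    }
    finally
    {
        _heartbeatLock.Release();
    }
}
```

A `SemaphoreSlim` is likely the simpler choice of the two, since `ReaderWriterLockSlim` is a synchronous primitive and cannot be held across `await` boundaries.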
Sending payloads shouldn't rely on sleep time
As an addendum to the above, this code only exists because of the issue above, and can be safely fixed with a concrete delay. For reasons I'll get into below, Task.Delay isn't the best choice for this either.
Remora.Discord/Backend/Remora.Discord.Gateway/DiscordGatewayClient.cs
Lines 967 to 972 in 7935da1
Remora.Discord/Backend/Remora.Discord.Gateway/DiscordGatewayClient.cs
Lines 990 to 1001 in 7935da1
Edit: November 29th, 2022:
Another solution would be omitting the delay entirely and looking into a different mechanism, such as a Channel<T>, for receiving payloads. Indeed, one of Falcon's improvements to the gateway client in his aforementioned PR was using a channel for this purpose.
If heartbeating were separated entirely, then instead of the synchronous, quick-returning TryRead, we could use WaitToReadAsync.
From there, we'd make a for loop that looks something along the lines of:
var canRead = await _payloadQueue.WaitToReadAsync(ct);
if (!canRead)
{
    return;
}

// The ordering of the loop doesn't matter here; I just think counting down makes more sense for a queue
for (int i = _maxAllotableSlots; i > 0; i--)
{
    if (!_payloadQueue.TryRead(out var payload))
    {
        break;
    }

    await _transportService.SendPayloadAsync(payload, ct);
}
Stop using Task.Delay (please)
This isn't a blanket statement; it's a suggestion to use the alternative, given it's appropriate for the context.
With Remora targeting .NET 6/7 now, we're given access to the new APIs introduced in those versions, which allow for simplifying code and reducing allocations with just a few lines of code being changed.
PeriodicTimer is a new timer class introduced in .NET 6, and is an alternative to previous timer APIs, which were unideal for one reason or another. PeriodicTimer's WaitForNextTickAsync method returns a ValueTask<bool>, which is already rather ideal.
Its return value indicates whether the timer will continue to tick, so much like a ValueTask<Result>, gateway termination can be handled gracefully: disposing the timer makes WaitForNextTickAsync return false. (Note that cancelling the token passed to WaitForNextTickAsync still throws an exception which must be caught, just as Task.Delay does; it's disposal that gives the graceful false return. Exception-based cancellation is also typically measurably slower, though async tends to smooth these nuances.)
This was what I initially set out to do, because one of, if not the, top allocations throughout the lifetime of the gateway are DelayPromise and TimerQueueTimer (Task.Delay allocates every time you call it!). For this reason, it's very unideal to use Task.Delay in a tight/semi-tight loop, especially when its delay is static, as it'll cause lots of avoidable heap allocations.
There are two notable spots where this is an issue.
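A minimal sketch of the swap, assuming a static interval and a hypothetical drain method:

```csharp
// Allocates once, instead of a DelayPromise + TimerQueueTimer per iteration.
using var timer = new PeriodicTimer(TimeSpan.FromMilliseconds(100));

// WaitForNextTickAsync returns false once the timer is disposed, letting the
// loop end gracefully; cancelling `ct` surfaces as an OperationCanceledException.
while (await timer.WaitForNextTickAsync(ct))
{
    // Hypothetical method: drain the payload queue and send what's pending.
    await DrainAndSendPendingPayloadsAsync(ct);
}
```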
Inconsistencies on which token is passed
This edges the line between nitpick and major issue, but it has the potential to cause some absolute headaches for anyone who has to refactor or modify the gateway.
As it stands right now, two tokens are held: stopRequested and _disconnectRequestedSource; the latter is actually a CancellationTokenSource, but that's somewhat moot.
The issue is that these two are seemingly arbitrarily passed to methods, and the latter is also cancelled for various reasons strewn throughout the entire class. It is held as a field and seems intended to be passed to external classes (e.g. the payload transport), but sometimes it isn't, and is instead passed as a parameter to methods within the gateway, even though those methods can access the same field.
For example:
And then just three lines down:
This also technically has the effect of potentially holding the caller captive; while a lot of methods throw when their cancellation token is cancelled, which will bubble up to the caller (RunAsync), if they don't, the method doesn't return until all the work has been completed.
I suggest using a linked cancellation token and consolidating on the usage of the field vs the parameter.
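A sketch of the linked-token approach; the method being called is hypothetical:

```csharp
// One linked source combines the externally requested stop with the internal
// disconnect request, so every method can observe a single, consistent token.
using var linked = CancellationTokenSource.CreateLinkedTokenSource
(
    stopRequested,
    _disconnectRequestedSource.Token
);

// Pass linked.Token uniformly instead of mixing the two originals per call site.
await SendEventsAsync(linked.Token); // hypothetical gateway method
```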
Nitpicks
recommended_shards is not just a recommendation
My issue here is that the client treats recommended_shards > set_shards as a warning, when the proper log level for this situation is LogLevel.Error (and it should potentially even return a preemptive gateway error). The reason is that if the bot is in more than 2500 * shards guilds, connecting to the gateway will ALWAYS fail with a 4011 (SHARDING_REQUIRED) close code. See Discord's docs.
Remora.Discord/Backend/Remora.Discord.Gateway/DiscordGatewayClient.cs
Lines 462 to 473 in 7935da1
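A hypothetical sketch of the suggested severity bump; the member names are illustrative, with `gatewayInformation.Shards` mirroring the `shards` field of Discord's Get Gateway Bot response:

```csharp
// Illustrative only; names do not match Remora's actual members.
if (gatewayInformation.Shards > configuredShardCount)
{
    _logger.LogError
    (
        "Discord recommends {Recommended} shards but {Configured} are configured; "
        + "identifying will fail with close code 4011 (SHARDING_REQUIRED) once the "
        + "bot passes {Limit} guilds",
        gatewayInformation.Shards,
        configuredShardCount,
        2500 * configuredShardCount
    );

    // Optionally: return a preemptive gateway error here instead of proceeding.
}
```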
Re-calculating heartbeat windows
This is a very small issue, but it's something I picked up on regardless. Despite this only being done once per (re-)connection, I find it strange that this is recalculated. Then again, shifting it up to the constructor could be considered bad practice, even if it's technically static-ish data.
Remora.Discord/Backend/Remora.Discord.Gateway/DiscordGatewayClient.cs
Lines 894 to 898 in 7935da1
new-ing timeout policies all over the place
I'm not entirely familiar with Polly, so I may be wrong here, but I find it somewhat daft that there are three instances of a timeout being new'd up, all with the same arbitrary(?) time limit, only to be used once. Are Polly timeout policies not thread-safe, or not intended to be reused? Otherwise, these should be merged into a field, IMO.
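For what it's worth, Polly policies are thread-safe and designed to be reused, so a single shared field would work. A sketch (the five-second value is a placeholder for whatever the current call sites use, and the send call is hypothetical):

```csharp
// One reusable timeout policy can back all three call sites; Polly policies
// are thread-safe and intended to be shared.
private static readonly AsyncTimeoutPolicy _transportTimeout =
    Policy.TimeoutAsync(TimeSpan.FromSeconds(5));

// At a former call site:
await _transportTimeout.ExecuteAsync
(
    c => _transportService.SendPayloadAsync(payload, c), // hypothetical send
    ct
);
```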
There should be a common DisconnectAsync method
As it stands right now, there's no semblance of a disconnection method that handles the termination of the gateway (graceful or otherwise). Instead, inconsistent logic is sprinkled across call sites, varying with context, which can cause bugs.
Remora.Discord/Backend/Remora.Discord.Gateway/DiscordGatewayClient.cs
Lines 235 to 250 in 7935da1
Remora.Discord/Backend/Remora.Discord.Gateway/DiscordGatewayClient.cs
Lines 637 to 664 in 7935da1
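A hypothetical shape for such a shared disconnect path; all member names here are illustrative, not Remora's actual ones:

```csharp
// Illustrative sketch of a single, shared disconnect path.
private async Task<Result> DisconnectAsync(bool reconnectionIntended, CancellationToken ct)
{
    // Signal the send/receive loops to stop, and wait for them to wind down
    // before touching the socket.
    _disconnectRequestedSource.Cancel();
    await Task.WhenAll(_sendTask, _receiveTask);

    // Close the underlying socket, distinguishing a resume from a full shutdown.
    return await _transportService.DisconnectAsync(reconnectionIntended, ct);
}
```

Every current call site (graceful stop, reconnect, fatal error) would then funnel through this one method instead of duplicating the teardown logic.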
A good writeup.
For the PeriodicTimer it would be important to provide a fallback for TFM netstandard2.1.
A good writeup.
For the PeriodicTimer it would be important to provide a fallback for TFM netstandard2.1.
Thanks! I was under the impression that Remora would explicitly drop netstandard support in favor of .NET 6/7, since .NET Core 3.1 becomes EOL this year. If not, this would definitely need to be taken into consideration.
Nice writeup!
I'm pretty sure @carlst99 addressed this issue in the common gateway PR, but I'm not entirely certain.
Yep, I added a 'first priority' send queue, which internal connection management packets were placed on. Anything on this queue was dispatched before any other enqueued packets and if I recall correctly, I ensured all priority packets were dispatched before allowing the send loop to check if it should sleep.
I was also toying around with a few allocation improvements in the transport service. I'd managed to cut back on send allocations significantly, not that those are particularly troublesome, and using the RecyclableMemoryStream for receiving data was looking promising.
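For readers unfamiliar with it, a minimal sketch of the receive-side idea using Microsoft.IO.RecyclableMemoryStream (the tag string and surrounding usage are illustrative):

```csharp
// The manager is shared and long-lived; streams it hands out return their
// pooled buffers on dispose instead of growing fresh byte[]s per payload.
private static readonly RecyclableMemoryStreamManager _memoryManager = new();

// Per received payload:
await using var receiveBuffer = _memoryManager.GetStream("gateway-receive");
// ... copy websocket frames into receiveBuffer, then deserialize from it ...
```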
A good writeup.
For the PeriodicTimer it would be important to provide a fallback for TFM netstandard2.1.
Thanks! I was under the impression that Remora would explicitly drop netstandard support in favor of .NET 6/7, since .NET Core 3.1 becomes EOL this year. If not, this would definitely need to be taken into consideration.
.NET Standard does not reach EOL with .NET Core 3.1, because it is still a supported target with .NET 7.0:
https://learn.microsoft.com/en-us/dotnet/standard/net-standard?tabs=net-standard-2-1#select-net-standard-version
But in the end it still depends on whether Remora wants to support all netstandard runtimes, even those that are EOL, or just those that are not.
NetStandard does not EOL with Core 3.1 cause it is still a thing with NET 7.0:
Yes, netstandard will still live on, but I meant to imply that, last I remember (I could be misremembering; I'll look for it on the Discord), Jax wanted to switch away from netstandard altogether in favor of .NET 6/7, partially for the benefit of access to more modern APIs. Though, it's probably best to ask @Nihlus what his goal/decision is in that regard anyway.
I found it: initial question and his initial answer, though everything's kind of up in the air since there hasn't been much further discussion since.
I have no intention of dropping support for .NET Standard - it will remain a runtime-agnostic library target.
#if blocks it is, then.