Lombiq/Hosting-Tenants

IdleShutdownTask Feedback (OSOE-625)

Closed this issue · 6 comments

jtkech commented

Because I had a chat with @Psichorex about the IdleShutdownTask I took a look at the code

private static Task InvokeShutdownAsync(ShellSettings shellSettings, IShellHost shellHost) =>
shellHost.ReleaseShellContextAsync(shellSettings);

When we release a tenant on inactivity we don't want to signal it to other instances in a distributed env (e.g. if OC.Tenants.Distributed and OC.RedisCache are enabled), this because we have released the tenant on inactivity, not because it has changed. To do so you need to set to false the eventSource parameter.

 _shellHost.ReleaseShellContextAsync(_shellSettings, eventSource: false);

private static async Task InvokeRestartAsync(
IShellSettingsManager shellSettingsManager,
IShellHost shellHost,
ShellSettings shellSettings)
{
await shellSettingsManager.SaveSettingsAsync(shellSettings);
await shellHost.UpdateShellSettingsAsync(shellSettings);
}

No need to call SaveSettingsAsync() as it is done by UpdateShellSettingsAsync(), then as I understood we could have just call ReloadShellContextAsync(), but normally I think we don't need this (see below).


try
{
await InvokeShutdownAsync(shellSettings, shellHost);
}
catch (Exception e)
{
logger?.LogError(
e,
"Shutting down \"{ShellName}\" because of idle timeout failed with the following exception. Another shutdown will be attempted.",
shellSettings?.Name);
// If the ReleaseShellContextAsync() fails (which can happen with a DB error: then the transaction
// commits triggered by the dispose will fail) then while the tenant is unavailable the shell is still
// active in a failed state. So first we need to correctly start the tenant, then shut it down for good.
var shellSettingsManager = serviceProvider.GetService<IShellSettingsManager>();
await InvokeRestartAsync(shellSettingsManager, shellHost, shellSettings);
await InvokeShutdownAsync(shellSettings, shellHost);
}

ReleaseShellContextAsync() just releases the shell and remove the context and settings from the related shell host dictionaries, this without any database accesses, so normally it should not fail.

So normally all the above code could be replaced by only one line.

 return _shellHost.ReleaseShellContextAsync(_shellSettings, eventSource: false);

Jira issue

Thank you for these tips, JT! I think this might actually be a memory leak we're seeing.

Noted your comments and I soon will be addressing this aswell.

jtkech commented

I think this might actually be a memory leak we're seeing.

Yes this is what we discussed on Skipe, but I don't see how releasing shells could be a source of leaks (knowing that the GC doesn't behave the same in dev/debug mode), unless if something else was wrong, for example releasing all shells by caling ReleaseAllShellContextsAsync() whose result would be that any tenant would have to be rebuilt on any subsequent request.

@jtkech Hi JT! Good to have you here aswell :)
This is my fully reworked code atm:
image

I added the eventsource: false parameter and I reworked the whole of it pretty much.
We are going to do a deploy with branch NEST-361-dev-repro and see if Azure is still facing memory leaks.

I have actually opted for using the serviceProvider:
image

Addressed in #50.