Controller is not being called after options.WatcherHttpTimeout has elapsed.
Closed this issue · 6 comments
Describe the bug
After the watcher http timeout has elapsed, the watcher is not being recreated. This looks to be related to the fix in #460
a2290ff?diff=split
To Reproduce
I'm not sure if the original issue was reliably reproduced, but I was getting the original Microsoft.Rest.SerializationException
. If you were getting this I expect you should be able to reproduce this issue as well.
Expected behavior
Microsoft.Rest.SerializationException should not break the watcher.
Screenshots
I'd attach logs but I haven't been able to get any logs out of the watcher class. I tried setting Logging__LogLevel__Microsoft=Trace
on the pod.
Additional context
Prior to a2290ff, the SerializationException
would result in a call to restart the watcher WatchResource().ConfigureAwait(false);
. We now catch this exception and do not call restart on the watcher. Eventually we hit our WatcherHttpTimeout and enter the OnClose
method where the watcher is not restarted because the cancellation token was cancelled in the exception handling earlier.
Unsure how we should handle this, calling WatchResource().ConfigureAwait(false);
, we really don't want to be calling WatchResource().ConfigureAwait(false);
every time that SerializationException
is thrown.
We are hitting the exact same problem.
Hello @buehler,
in a2290ff you wanted to make the ResourceWatcher
ignore exceptions due to empty responses (In #460 (comment)). In https://github.com/buehler/dotnet-operator-sdk/blob/master/src/KubeOps/Operator/Kubernetes/ResourceWatcher%7BTEntity%7D.cs#L163 the watcher is disposed of in any case, independently of the type of exception.
In the case of a SerializationException
, the method returns immediately without recreating the watcher - neither directly like in the case of a TaskCancelledException
, nor with an exponential backoff like for all other exceptions.
In order to ensure that the watcher is restarted even after a SerializationException
, I would suggest to either remove the return
statement in https://github.com/buehler/dotnet-operator-sdk/blob/master/src/KubeOps/Operator/Kubernetes/ResourceWatcher%7BTEntity%7D.cs#L183 or to remove the special case for this type of exception completely (if a specific log entry for it is not needed) and handle it together with all other exceptions below.
Do I miss something?
Hi @buehler and everyone!
I had a talk with @anekdoti and I also think that if this exception is transitive then it should be retried with the exponential backoff by removing the aforementioned return statement for the SerializationException
case or remove this case entirely as @anekdoti suggests it, since I surrounded the content of the OnException
method with a try catch block in one of my PRs and it would be logged twice after removing the return statement.
Or do I miss something, too?
thx!
I have an idea @sfinche:
Unsure how we should handle this, calling WatchResource().ConfigureAwait(false);, we really don't want to be calling WatchResource().ConfigureAwait(false); every time that SerializationException is thrown.
We could avoid the recreation of the WatchResource
by moving the treatment of SerializationException
case to the top of the OnException
method before the disposals:
private void OnException(Exception e)
{
switch (e)
{
case SerializationException when
e.InnerException is JsonException &&
e.InnerException.Message.Contains("The input does not contain any JSON tokens"):
_logger.LogDebug(
@"The watcher received an empty response for resource ""{resource}"".",
typeof(TEntity));
return;
}
_cancellation?.Cancel();
_watcher?.Dispose();
_watcher = null;
//....
}
The return statement wouldn't be removed in this way. What do you think about such a fix @sfinche @buehler @anekdoti @alexmg, please? Do you see any problems with it. thx!
I think your latest proposal should be fine, @tomasfabian . Thank you for your thoughts! What do you think @buehler ?