buehler/dotnet-operator-sdk

Controller is not being called after options.WatcherHttpTimeout has elapsed.

Closed this issue · 6 comments

Describe the bug
After the watcher http timeout has elapsed, the watcher is not being recreated. This looks to be related to the fix in #460
a2290ff?diff=split

To Reproduce
I'm not sure if the original issue was reliably reproduced, but I was getting the original Microsoft.Rest.SerializationException. If you were getting this I expect you should be able to reproduce this issue as well.

Expected behavior
Microsoft.Rest.SerializationException should not break the watcher.

Screenshots
I'd attach logs but I haven't been able to get any logs out of the watcher class. I tried setting Logging__LogLevel__Microsoft=Trace on the pod.

Additional context
Prior to a2290ff, the SerializationException would result in a call to restart the watcher WatchResource().ConfigureAwait(false);. We now catch this exception and do not call restart on the watcher. Eventually we hit our WatcherHttpTimeout and enter the OnClose method where the watcher is not restarted because the cancellation token was cancelled in the exception handling earlier.

Unsure how we should handle this, calling WatchResource().ConfigureAwait(false);, we really don't want to be calling WatchResource().ConfigureAwait(false); every time that SerializationException is thrown.

We are hitting the exact same problem.

Hello @buehler,

in a2290ff you wanted to make the ResourceWatcher ignore exceptions due to empty responses (In #460 (comment)). In https://github.com/buehler/dotnet-operator-sdk/blob/master/src/KubeOps/Operator/Kubernetes/ResourceWatcher%7BTEntity%7D.cs#L163 the watcher is disposed of in any case, independently of the type of exception.

In the case of a SerializationException, the method returns immediately without recreating the watcher - neither directly like in the case of a TaskCancelledException, nor with an exponential backoff like for all other exceptions.

In order to ensure that the watcher is restarted even after a SerializationException, I would suggest to either remove the return statement in https://github.com/buehler/dotnet-operator-sdk/blob/master/src/KubeOps/Operator/Kubernetes/ResourceWatcher%7BTEntity%7D.cs#L183 or to remove the special case for this type of exception completely (if a specific log entry for it is not needed) and handle it together with all other exceptions below.

Do I miss something?

Hi @buehler and everyone!
I had a talk with @anekdoti and I also think that if this exception is transitive then it should be retried with the exponential backoff by removing the aforementioned return statement for the SerializationException case or remove this case entirely as @anekdoti suggests it, since I surrounded the content of the OnException method with a try catch block in one of my PRs and it would be logged twice after removing the return statement.

Or do I miss something, too?

thx!

I have an idea @sfinche:

Unsure how we should handle this, calling WatchResource().ConfigureAwait(false);, we really don't want to be calling WatchResource().ConfigureAwait(false); every time that SerializationException is thrown.

We could avoid the recreation of the WatchResource by moving the treatment of SerializationException case to the top of the OnException method before the disposals:

    private void OnException(Exception e)
    {
            switch (e)
            {
                case SerializationException when
                    e.InnerException is JsonException &&
                    e.InnerException.Message.Contains("The input does not contain any JSON tokens"):
                    _logger.LogDebug(
                        @"The watcher received an empty response for resource ""{resource}"".",
                        typeof(TEntity));
                    return;
            }

            _cancellation?.Cancel();
            _watcher?.Dispose();
            _watcher = null;
         //....
}

The return statement wouldn't be removed in this way. What do you think about such a fix @sfinche @buehler @anekdoti @alexmg, please? Do you see any problems with it. thx!

I think your latest proposal should be fine, @tomasfabian . Thank you for your thoughts! What do you think @buehler ?

Agree with @anekdoti , the fix looks sensible to me. I might also look into where the original Microsoft.Rest.SerializationException is actually coming from at some point too.