buehler/dotnet-operator-sdk

[bug]: Resource watcher does not reconnect

prochnowc opened this issue · 25 comments

Describe the bug

My operator implements 4 CRs.
Everything works normally after startup, and I can see log messages like this for each CRD:

Resource Watcher startup for type "xyz".
Received watch event "xyz" for "kind/name".
The server closed the connection. Trying to reconnect.
Restarting resource watcher for type "xyz".
Received watch event "xyz" for "kind/name".
The server closed the connection. Trying to reconnect.
Restarting resource watcher for type "xyz".
Received watch event "xyz" for "kind/name".

After a few minutes of uptime, some instances of the ResourceWatcher no longer receive the connection-closed event.
Therefore the connection is not re-established and my operator does not receive any further events for the CRs.

Restarting the operator fixes the issue ... but after a few minutes the behavior is the same.

To reproduce

I'm currently unable to reproduce this issue ... I will try to provide a small repro.

Expected behavior

No response

Screenshots

No response

Additional Context

No response

I tried to debug this in a locally running operator ... sadly the issue only happens when the operator runs inside the cluster :(

From what I can see in the apiserver log, all watches terminate, but not all of them reconnect.

I0811 14:11:59.364811       1 httplog.go:131] "HTTP" verb="WATCH" URI="/apis/oidc.adesso-financial-platform.com/v2/namespaces/dev/clients?timeoutSeconds=60&watch=true" latency="1m0.001546225s" userAgent="" audit-ID="1fc9444a-d6ae-4ebf-81a2-483fca038b30" srcIP="10.1.4.103:55366" apf_pl="workload-low" apf_fs="service-accounts" apf_iseats=1 apf_fseats=0 apf_additionalLatency="0s" apf_init_latency="518.809µs" apf_execution_time="520.109µs" resp=200
I0811 14:11:59.364851       1 httplog.go:131] "HTTP" verb="WATCH" URI="/apis/oidc.adesso-financial-platform.com/v2/namespaces/dev/identityresources?timeoutSeconds=60&watch=true" latency="1m0.00131382s" userAgent="" audit-ID="b7a51574-ba29-4294-8273-7cd7559a309c" srcIP="10.1.4.103:55366" apf_pl="workload-low" apf_fs="service-accounts" apf_iseats=1 apf_fseats=0 apf_additionalLatency="0s" apf_init_latency="410.607µs" apf_execution_time="411.808µs" resp=200
I0811 14:11:59.366273       1 httplog.go:131] "HTTP" verb="WATCH" URI="/apis/oidc.adesso-financial-platform.com/v2/namespaces/dev/apiresources?timeoutSeconds=60&watch=true" latency="1m0.000679387s" userAgent="" audit-ID="ed5fdfb4-55f8-4c24-a9dd-2ec5e14c2e34" srcIP="10.1.4.103:55366" apf_pl="workload-low" apf_fs="service-accounts" apf_iseats=1 apf_fseats=0 apf_additionalLatency="0s" apf_init_latency="414.807µs" apf_execution_time="416.807µs" resp=200
I0811 14:11:59.366484       1 get.go:260] "Starting watch" path="/apis/oidc.adesso-financial-platform.com/v2/namespaces/dev/clients" resourceVersion="" labels="" fields="" timeout="1m0s"
I0811 14:11:59.367379       1 get.go:260] "Starting watch" path="/apis/oidc.adesso-financial-platform.com/v2/namespaces/dev/apiresources" resourceVersion="" labels="" fields="" timeout="1m0s"
I0811 14:11:59.370016       1 httplog.go:131] "HTTP" verb="WATCH" URI="/apis/oidc.adesso-financial-platform.com/v2/namespaces/dev/apiscopes?timeoutSeconds=60&watch=true" latency="1m0.000844031s" userAgent="" audit-ID="4be19b27-c8d6-42ed-a1ee-e83bde949660" srcIP="10.1.4.103:55366" apf_pl="workload-low" apf_fs="service-accounts" apf_iseats=1 apf_fseats=0 apf_additionalLatency="0s" apf_init_latency="419.907µs" apf_execution_time="421.607µs" resp=200
I0811 14:11:59.371144       1 get.go:260] "Starting watch" path="/apis/oidc.adesso-financial-platform.com/v2/namespaces/dev/apiscopes" resourceVersion="" labels="" fields="" timeout="1m0s"

It seems to me that this is a bug in the KubernetesClient watch logic. Any thoughts?

I've debugged the issue further by placing additional log messages into KubeOps ...
I added a log message at the bottom of OnWatchEvent in ResourceWatcher<TEntity>:

_logger.LogTrace(
    @"Processed watch event ""{eventType}"" for ""{kind}/{name}"".",
    type,
    resource.Kind,
    resource.Metadata.Name);

The log message shows that, for all resource types that stop working, this message is never printed; _watchEvents.OnNext() seems to block.

@buehler Implementing the change mentioned in #554 fixes the issue.

I'm a bit curious about the WatcherHttpTimeout default setting of one minute. Is there a reason for this being so low?

Hey @jhested
Yea, the thing is that Kubernetes (the API) will terminate the connection anyway. This timeout is set to 1 minute so that the operator can terminate the connection on its own terms. After the timeout is reached, it automatically reconnects (or should, for that matter...). Somehow, longer timeouts crashed anyway at some point.

If I recall correctly, that was the main reason back in the day when I wrote the thing.

@buehler Thank you for your fast reply. The reason I'm asking is that we would like our resource to be reconciled every 20 minutes. But every time the watcher is recreated, it triggers the 'Added' event for each resource that it is watching, causing ReconcileAsync() to be called every minute instead of after the desired RequeueAfter delay.

Is there a better approach to handling this in our own code?
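
For context, our controller currently looks roughly like this (a simplified sketch against KubeOps v7; V1DemoEntity is a placeholder for our actual CRD type, and the namespaces and exact signatures may differ slightly per version):

using System;
using System.Threading.Tasks;
using k8s.Models;
using KubeOps.Operator.Controller;
using KubeOps.Operator.Controller.Results;
using KubeOps.Operator.Entities;

// Placeholder CRD type, just for the sketch.
[KubernetesEntity(Group = "demo.dev", ApiVersion = "v1", Kind = "DemoEntity")]
public class V1DemoEntity : CustomKubernetesEntity
{
}

public class DemoController : IResourceController<V1DemoEntity>
{
    public Task<ResourceControllerResult?> ReconcileAsync(V1DemoEntity entity)
    {
        // ... ensure the desired state here ...

        // Ask the SDK to call us again after 20 minutes. In practice the
        // watcher restart fires an "Added" event every minute, which is
        // what triggers the early reconciles described above.
        return Task.FromResult<ResourceControllerResult?>(
            ResourceControllerResult.RequeueEvent(TimeSpan.FromMinutes(20)));
    }
}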

@buehler With the mentioned fix it works well even if I use a watcher HTTP timeout of 1 hour. In my opinion you don't need to explicitly specify a timeout; the API server will close the connection anyway.
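
For reference, raising the timeout is just a settings change when registering the operator. A minimal sketch, assuming the WatcherHttpTimeout property mentioned above takes a value in seconds:

using KubeOps.Operator;
using Microsoft.Extensions.DependencyInjection;

public static class OperatorSetup
{
    // Sketch only: raising the watcher HTTP timeout from the 60-second
    // default to one hour when registering the operator. The property name
    // comes from the discussion above; the value is assumed to be seconds.
    public static void ConfigureOperator(IServiceCollection services)
    {
        services.AddKubernetesOperator(settings =>
        {
            settings.WatcherHttpTimeout = 3600;
        });
    }
}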

Hey @jhested
Actually, this is an anti-pattern in Kubernetes. The description of Kubernetes Operators states that they should not have "state" in their reconciliation logic. That was the main reason why the SDK moved from "created", "updated", and "deleted" methods to "reconcile".

In theory, your code should be totally fine with being called each minute. The manifest states the desired state and the operator should take actions to reach that state. So, if another reconcile is called, there should be no actions to take.
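
As a rough sketch of what I mean (illustrative abstractions only, not SDK code):

using System.Threading.Tasks;

// Illustrative only: a stateless reconcile compares the desired state from
// the manifest with the observed state and acts only on the difference, so
// being called every minute is a harmless no-op.
public interface IClusterGateway
{
    Task<string> GetObservedConfigAsync(string name);
    Task ApplyConfigAsync(string name, string desiredConfig);
}

public class IdempotentReconciler
{
    private readonly IClusterGateway _cluster;

    public IdempotentReconciler(IClusterGateway cluster) => _cluster = cluster;

    public async Task ReconcileAsync(string name, string desiredConfig)
    {
        var observed = await _cluster.GetObservedConfigAsync(name);
        if (observed == desiredConfig)
        {
            return; // already converged, nothing to do
        }

        await _cluster.ApplyConfigAsync(name, desiredConfig);
    }
}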

I'm just wondering why the ongoing discussion is not about the bug 🤐

@prochnowc you are right :-)

Would you mind opening a PR?

@buehler @prochnowc I'm testing the change suggested in #554 (replacing Switch with Merge) on my prod-like cluster and can already confirm that it fixes the timed reconcile issue when the status is updated.
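
For anyone else following along, the difference between the two operators is roughly this (a standalone Rx sketch, not the actual KubeOps code):

using System;
using System.Reactive.Linq;
using System.Reactive.Subjects;

// Standalone Rx demo: Switch only listens to the most recent inner
// observable and unsubscribes from the previous one, so its pending events
// are lost. Merge keeps every inner observable subscribed, so nothing is
// dropped - but subscriptions accumulate over time.
class SwitchVsMerge
{
    static void Main()
    {
        var streams = new Subject<IObservable<string>>();

        using var switched = streams.Switch()
            .Subscribe(e => Console.WriteLine($"Switch saw: {e}"));
        using var merged = streams.Merge()
            .Subscribe(e => Console.WriteLine($"Merge saw:  {e}"));

        var first = new Subject<string>();
        streams.OnNext(first);
        first.OnNext("timed requeue");           // seen by both

        var second = new Subject<string>();
        streams.OnNext(second);                  // Switch drops 'first' here
        first.OnNext("late event from first");   // only Merge still sees this
        second.OnNext("status-modified event");  // seen by both
    }
}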

This invalidates, or at least provides a working solution for, the thoughts I explained in my related issue #579.

I'll let you know if I observe any side issues.

Thank you :-)

I'm currently restructuring this repository, and I have a big refactoring for 8.0 going on. I really cannot decide whether it should be possible to "re-queue" an event, since it is actually an anti-pattern too.

I don't get your point about the "requeue" event being an anti-pattern. It's true, as you highlighted above, that the old Created/Updated/etc. events could somehow be considered an anti-pattern. However, in this specific case we are just talking about the ability to schedule the reconcile event to execute at a specified interval, and that does not break the pattern.

You have stated this before in the project's docs: for example, ensuring the state of a database could require executing the reconcile event at an interval smaller than the default of 5 minutes.

PS: Even though this is not the place, I want to express my excitement about what you are bringing us in v8.

Going back to the issue: I conclude the root error is not here.

As mentioned, default reconcile execution occurs every 5 minutes. After applying the fix we are talking about, the reconcile does get enqueued at the intended interval, but on top of that, an additional reconcile keeps being fired every 5 minutes.

Therefore, over time, you end up with multiple reconciles accumulating; you can see a log trace below (at 09_22_10_21_35 and 09_22_10_26_42 a new reconcile was triggered).

Now listening on: https://[::]:5004
Application started. Press Ctrl+C to shut down.
Hosting environment: Production
Content root path: /app
09_22_10_16_42 - controller-autoscale: Reconcile Started
09_22_10_16_53 - controller-autoscale: Reconcile Started
09_22_10_17_03 - controller-autoscale: Reconcile Started
09_22_10_17_13 - controller-autoscale: Reconcile Started
09_22_10_17_23 - controller-autoscale: Reconcile Started
09_22_10_17_33 - controller-autoscale: Reconcile Started
09_22_10_17_43 - controller-autoscale: Reconcile Started
09_22_10_17_53 - controller-autoscale: Reconcile Started
09_22_10_18_03 - controller-autoscale: Reconcile Started
09_22_10_18_13 - controller-autoscale: Reconcile Started
09_22_10_18_23 - controller-autoscale: Reconcile Started
09_22_10_18_33 - controller-autoscale: Reconcile Started
09_22_10_18_43 - controller-autoscale: Reconcile Started
09_22_10_18_53 - controller-autoscale: Reconcile Started
09_22_10_19_03 - controller-autoscale: Reconcile Started
09_22_10_19_14 - controller-autoscale: Reconcile Started
09_22_10_19_24 - controller-autoscale: Reconcile Started
09_22_10_19_34 - controller-autoscale: Reconcile Started
09_22_10_19_44 - controller-autoscale: Reconcile Started
09_22_10_19_54 - controller-autoscale: Reconcile Started
09_22_10_20_04 - controller-autoscale: Reconcile Started
09_22_10_20_14 - controller-autoscale: Reconcile Started
09_22_10_20_24 - controller-autoscale: Reconcile Started
09_22_10_20_34 - controller-autoscale: Reconcile Started
09_22_10_20_44 - controller-autoscale: Reconcile Started
09_22_10_20_54 - controller-autoscale: Reconcile Started
09_22_10_21_04 - controller-autoscale: Reconcile Started
09_22_10_21_14 - controller-autoscale: Reconcile Started
09_22_10_21_24 - controller-autoscale: Reconcile Started
## Second reconcile joined the loop
09_22_10_21_35 - controller-autoscale: Reconcile Started
09_22_10_21_42 - controller-autoscale: Reconcile Started
09_22_10_21_45 - controller-autoscale: Reconcile Started
09_22_10_21_52 - controller-autoscale: Reconcile Started
09_22_10_21_55 - controller-autoscale: Reconcile Started
09_22_10_22_02 - controller-autoscale: Reconcile Started
09_22_10_22_05 - controller-autoscale: Reconcile Started
09_22_10_22_12 - controller-autoscale: Reconcile Started
09_22_10_22_15 - controller-autoscale: Reconcile Started
09_22_10_22_22 - controller-autoscale: Reconcile Started
09_22_10_22_25 - controller-autoscale: Reconcile Started
09_22_10_22_32 - controller-autoscale: Reconcile Started
09_22_10_22_35 - controller-autoscale: Reconcile Started
09_22_10_22_42 - controller-autoscale: Reconcile Started
09_22_10_22_45 - controller-autoscale: Reconcile Started
09_22_10_22_53 - controller-autoscale: Reconcile Started
09_22_10_22_55 - controller-autoscale: Reconcile Started
09_22_10_23_03 - controller-autoscale: Reconcile Started
09_22_10_23_05 - controller-autoscale: Reconcile Started
09_22_10_23_13 - controller-autoscale: Reconcile Started
09_22_10_23_15 - controller-autoscale: Reconcile Started
09_22_10_23_23 - controller-autoscale: Reconcile Started
09_22_10_23_25 - controller-autoscale: Reconcile Started
09_22_10_23_33 - controller-autoscale: Reconcile Started
09_22_10_23_35 - controller-autoscale: Reconcile Started
09_22_10_23_43 - controller-autoscale: Reconcile Started
09_22_10_23_45 - controller-autoscale: Reconcile Started
09_22_10_23_53 - controller-autoscale: Reconcile Started
09_22_10_23_55 - controller-autoscale: Reconcile Started
09_22_10_24_03 - controller-autoscale: Reconcile Started
09_22_10_24_06 - controller-autoscale: Reconcile Started
09_22_10_24_13 - controller-autoscale: Reconcile Started
09_22_10_24_16 - controller-autoscale: Reconcile Started
09_22_10_24_23 - controller-autoscale: Reconcile Started
09_22_10_24_26 - controller-autoscale: Reconcile Started
09_22_10_24_33 - controller-autoscale: Reconcile Started
09_22_10_24_36 - controller-autoscale: Reconcile Started
09_22_10_24_43 - controller-autoscale: Reconcile Started
09_22_10_24_46 - controller-autoscale: Reconcile Started
09_22_10_24_53 - controller-autoscale: Reconcile Started
09_22_10_24_56 - controller-autoscale: Reconcile Started
09_22_10_25_03 - controller-autoscale: Reconcile Started
09_22_10_25_06 - controller-autoscale: Reconcile Started
09_22_10_25_13 - controller-autoscale: Reconcile Started
09_22_10_25_16 - controller-autoscale: Reconcile Started
09_22_10_25_24 - controller-autoscale: Reconcile Started
09_22_10_25_26 - controller-autoscale: Reconcile Started
09_22_10_25_34 - controller-autoscale: Reconcile Started
09_22_10_25_36 - controller-autoscale: Reconcile Started
09_22_10_25_44 - controller-autoscale: Reconcile Started
09_22_10_25_46 - controller-autoscale: Reconcile Started
09_22_10_25_54 - controller-autoscale: Reconcile Started
09_22_10_25_56 - controller-autoscale: Reconcile Started
09_22_10_26_04 - controller-autoscale: Reconcile Started
09_22_10_26_06 - controller-autoscale: Reconcile Started
09_22_10_26_14 - controller-autoscale: Reconcile Started
09_22_10_26_16 - controller-autoscale: Reconcile Started
09_22_10_26_24 - controller-autoscale: Reconcile Started
09_22_10_26_27 - controller-autoscale: Reconcile Started
09_22_10_26_34 - controller-autoscale: Reconcile Started
09_22_10_26_37 - controller-autoscale: Reconcile Started
## Third reconcile joined the loop
09_22_10_26_42 - controller-autoscale: Reconcile Started
09_22_10_26_44 - controller-autoscale: Reconcile Started
09_22_10_26_47 - controller-autoscale: Reconcile Started
09_22_10_26_52 - controller-autoscale: Reconcile Started
09_22_10_26_54 - controller-autoscale: Reconcile Started
09_22_10_26_57 - controller-autoscale: Reconcile Started
09_22_10_27_02 - controller-autoscale: Reconcile Started
09_22_10_27_04 - controller-autoscale: Reconcile Started
09_22_10_27_07 - controller-autoscale: Reconcile Started
09_22_10_27_12 - controller-autoscale: Reconcile Started
09_22_10_27_14 - controller-autoscale: Reconcile Started
09_22_10_27_17 - controller-autoscale: Reconcile Started
09_22_10_27_22 - controller-autoscale: Reconcile Started
09_22_10_27_24 - controller-autoscale: Reconcile Started

Hi @buehler, I kept investigating and got to the bottom of what's going on here.

As has been stated, the timed requeue reconcile is broken by StatusUpdated. This happens whenever the Status is updated right before the end of the Reconcile event, as most of us probably do today. In that case only the last event, WatcherEvent or LocalEvent, to reach the EventQueue is kept. In most cases it is the ResourceWatcher StatusModified event that arrives last and overrides the timed reconcile.

Replacing Switch with Merge solves this override, with the side effect of the unlimited accumulation of events I mentioned in my previous entry. I want to apologise for the misleading statement: there is no default of 5 minutes. What I was hitting was the ResourceWatcher timeout, which I have configured to 5 minutes, whereas the default is 1 minute.

Once the ResourceWatcher is restarted after the timeout, it automatically triggers an Added event. In my opinion this is pointless and undesired in any case other than startup: even if after a reconcile you returned null and no requeue, why would you want your resource to be reconciled again after the watcher timeout?

The watcher events are not finally added to the observable stream in EventQueue until they are mapped in MapToResourceEvent, where we have access to the cache. My suggestion is therefore to always drop any watcher event of type Added whenever we already have a TEntity cached with the same Uid.
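
Roughly what I have in mind (a sketch only; the types and method below are illustrative stand-ins, not the actual KubeOps signatures):

// Illustrative stand-ins, not the actual KubeOps types: when the watcher is
// restarted it replays an "Added" event for every existing resource. If the
// entity's Uid is already in the cache, the event is treated as a replay
// and dropped before it reaches the event queue.
enum WatchEventType { Added, Modified, Deleted }

interface IEntityCache<TEntity>
{
    bool TryGet(string uid, out TEntity cached);
}

static class WatcherEventFilter
{
    public static bool ShouldDrop<TEntity>(
        WatchEventType type,
        string uid,
        IEntityCache<TEntity> cache)
        => type == WatchEventType.Added && cache.TryGet(uid, out _);
}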

These two changes should let us keep both events, StatusModified and the timed reconcile, as well as remove the side effect. Please let me know if you agree with this; if you want, I can submit a PR.

For the time being, introducing a small sleep after updating the status gives your timed reconcile a better chance to win over StatusModified (of course this is not guaranteed).
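
Something like this, purely as a stop-gap (the status writer is a stand-in delegate for whatever call you use; the delay only shifts the race):

using System;
using System.Threading.Tasks;

// Stop-gap only, not a fix: after writing the status, wait briefly so the
// locally scheduled timed requeue has a better chance of being the last
// event in the queue. The status writer is a stand-in delegate.
static class StatusUpdateWorkaround
{
    public static async Task UpdateStatusThenSettleAsync(
        Func<Task> writeStatus,
        TimeSpan settleDelay)
    {
        await writeStatus();           // your normal status-update call
        await Task.Delay(settleDelay); // give the timed requeue a head start
    }
}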

@Crespalvaro Nice investigation.

Just wanted to note that the correct way is to compare the resource versions of the resources being notified.
They could have been changed in between, so simply ignoring the "added" events is not the correct solution.

@prochnowc Thanks for your feedback. As I understand the code flow, these Added events simply should not occur: they are automatically triggered on watcher recreation, but in fact no new resource has been added.

There is actually no chance for a resource to be newly added with a Uid that is already in the cache, since of course k8s won't create two different resource entities with the same Uid.

The only scenario we should be worried about is where the resource gets updated while the watcher is being recreated, e.g. by manually applied changes. In that case we would miss events the watcher would normally catch. However, this sounds very unlikely to happen.

Furthermore, if this is still a concern, we have access to ResourceCache{TEntity}.CompareCache, which compares resource versions and returns CacheComparisonResult.Other whenever there are no changes to either the Spec or the Status of the resource. That said, we could go the safe way and drop Added events only if the Uid exists in the cache and the comparison result is CacheComparisonResult.Other. I would still vote for the simple Uid check.
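
The safe variant would then look roughly like this (illustrative sketch; only the names CompareCache and CacheComparisonResult.Other come from the actual code, the rest is made up for the example):

// Illustrative sketch of the "safe" variant: drop a replayed Added event
// only when the Uid is already cached AND the cache comparison reports no
// relevant change (CacheComparisonResult.Other). Enum members other than
// Other are invented for this sketch.
enum CacheComparisonResult { New, Modified, StatusModified, Other }

static class SafeAddedFilter
{
    public static bool ShouldDrop(bool uidAlreadyCached, CacheComparisonResult comparison)
        => uidAlreadyCached && comparison == CacheComparisonResult.Other;
}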

This definitely makes sense. If there is a quick fix, I'd like you to make a PR into the release branch. The new main branch is used by v8 until all the refactoring is done.

In v8 I try to make everything simpler (and therefore faster as well) while still keeping the features.

@Crespalvaro The "added" events are sent when the watch is started with the parameter "sendInitalEvents:true". The Kubernetes API docs states that the correct/efficient way to watch for changes is to do a list and then start the watch with the resourceVersion of the list which was returned. This has to be done everytime the watcher has been disconnected. This would make sure that no change is lost. See https://kubernetes.io/docs/reference/using-api/api-concepts/#efficient-detection-of-changes

@prochnowc Wow, nice contribution; we could actually make use of this as well. Currently we are just re-opening the watcher without a resource version. I can give it a try and send the cached versions so we won't miss any events, as described in the doc you posted.

@buehler I'm about to push changes to a fix branch created from the "release" branch, but I'm missing the rights. Could you sort that out?

Hey @Crespalvaro
You'll need to create a fork, make your changes in a feature branch in your repository and then create a pull-request :-)

There you go @buehler. Thanks for the guidance; first time pushing to open source, to be honest.

This should work in v8 now.