kubewharf/kubeadmiral

`readyz` should reflect `InformerManager` and `FederatedInformerManager`'s cache sync status

gary-lgy opened this issue · 3 comments

InformerManager and FederatedInformerManager manage informers for the host and member clusters, respectively, based on FederatedTypeConfigs. For each FederatedTypeConfig, the managers start an informer for the corresponding type.

If the informer for a type cannot be started or its cache cannot be synced (be it for host cluster or member clusters), we should expose this information for visibility. The current mechanism for doing so is by including the information in the response returned by the controller manager's readyz endpoint. Refer to existing controller's readyz implementation for how one can implement cache sync check for InformerManager and FederaterdInformerManager.

Hi @gary-lgy Can I have a chance to try this issue?

I found InformerManager and FederatedInformerManager are created in createControllerContext, which means the initialization logic is different from the existing controllers (created and started in core.go), so wondering if we can just directly add readyz checkers like:

healthCheckHandler.AddReadyzChecker("informer-manager", func(_ *http.Request) error {
	if controllerCtx.InformerManager.HasSynced() {
		return nil
	}
	return fmt.Errorf("informer-manager not ready")
})
healthCheckHandler.AddReadyzChecker("federated-informer-manager", func(_ *http.Request) error {
	if controllerCtx.FederatedInformerManager.HasSynced() {
		return nil
	}
	return fmt.Errorf("federated-informer-manager not ready")
})

Hi @z1cheng, please feel free to contribute :)

We could add the checkers in createControllerContext as you suggested, however the informer managers' HasSynced method only exposes whether the FederatedTypeConfig and/or FederatedCluster caches are synced. This is important, certainly, but we're also interested in the status of the informers managed by the informer managers. Sometimes the reason a resource cannot be successfully dispatched is because the informer for that type is not synced. We want to expose this information for better visibility. The desired readyz output may look something like this:

% curl "localhost:11257/readyz?verbose"
[+]controller-automigration ok
[+]controller-federate ok
...
[+]informer-deployment.apps ok
[+]informer-secrets ok
...
[+]federatedinformer-member1-secrets ok
[+]federatedinformer-member1-deployments.apps ok
[+]federatedinformer-member2-secrets ok
[+]federatedinformer-member2-deployments.apps ok

Would you like to implement this?

Sure, it's more interesting than I thought!

/assign