Specify the assumption on other controllers
marshtompsxd opened this issue · 1 comment
We've been discussing how to encode our assumptions on other controllers that run in the same cluster for a while. Informally, we want to assume that other controllers will create/update/delete any object except those our controller would create/update/delete. To formalize this, we will need to define a set of the objects our controller would create/update/delete. In Kubernetes, each object is uniquely identified by a tuple <kind, name, namespace>, which is also referred to as the key of each object (since all objects are stored in a key-value store). So we need to define a set of object keys for our controller.
To define this set, we need to formalize a relation between a custom resource object (the object that triggers the controller's reconciliation function) and each object that the controller would create/update/delete (typically an object owned by the custom resource object, like a config map or a stateful set). The reason is that controllers usually derive the name/namespace of the objects they manage from the name/namespace of the custom resource object. For example, suppose there is a simple controller that creates a config map for each custom resource object. If we have a custom resource object named `foo` in the `default` namespace, the controller will create a config map also called `foo` (the same name) in the same `default` namespace. Such a relation can be formalized as:
```
open spec fn managed_by(cr_key, key) -> bool { // here we assume cr_key.kind.is_CustomResource()
    &&& key.kind.is_Configmap()           // it has to be a config map object...
    &&& cr_key.name == key.name           // the names are the same...
    &&& cr_key.namespace == key.namespace // and so are the namespaces
}
```
Note that if the controller would create more kinds of objects, then we would have a disjunction of such constraints, one for each kind.
The tricky part is that the set derived from this relation is too large --- our set for this simple controller will naively contain all the config map objects (with whatever name/namespace) since there is no constraint on the name/namespace of the custom resource object. If we use such a set in our state machine specification, then we end up assuming other controllers never create any config map object, which is too strong an assumption.
Note that in reality, naming conflicts could happen between controllers, and Kubernetes does not have any panacea for this --- it has the uid mechanism, but the uid does not appear in the key and is not used to index objects. It is usually not the job of the controller to avoid naming conflicts with other controllers --- a controller does not even necessarily know about the existence of other controllers. Each controller should name its objects in a descriptive way for clarity, and different users that share the same Kubernetes cluster should coordinate with each other to avoid potential naming conflicts, e.g., by working in different namespaces or using different prefixes/suffixes to name their custom resource objects.
Based on all the reasoning above, we plan to add a constraint (temporarily named `allowed()`) on the name/namespace of the custom resource object, and the set of objects created/updated/deleted by the controller will be specified as a predicate on the object key:
```
open spec fn domestic(key) -> bool { // for any object key, it belongs to our set if...
    exists |cr_key| { // there exists a custom resource object whose key...
        &&& cr_key.kind.is_CustomResource()        // says it is indeed a custom resource object...
        &&& allowed(cr_key.name, cr_key.namespace) // and whose name/namespace satisfies our constraint...
        &&& managed_by(cr_key, key)                // and the relation says yes
    }
}
```
In our top-level state machine specification, the other controllers can create/update/delete any object whose key is not in this set (i.e., `!domestic(key)`). Still, using the simple controller above as an example, this constrained set allows other controllers to create config map objects, as long as there is no naming conflict (which is ensured by `allowed`).
Note that although `managed_by` is specific to the controller and should be provided by the controller developer, the developer doesn't need to provide `allowed` --- we don't care what this constraint actually is; we only want to say that once the constraint is fixed, the set of objects is also fixed, and no other controllers will interfere with the objects managed by our controller. Thus, `allowed` can stay a closed spec, which means the proof works for whatever constraint people might use in reality to avoid naming conflicts.
Accordingly, we will rephrase our liveness property as:
```
forall |cr: CustomResource|
    allowed(cr.name, cr.namespace) -> ([]desired_state_is(cr) ~> []current_state_matches(cr))
```
which means we only care about the custom resource objects that satisfy the naming constraint --- our controller cannot provide any guarantee for the others, which would conflict with other controllers.
Conclusion: we will not specify the assumptions of other controllers, because a pen-and-paper proof can show that a state machine with the assumptions on other controllers described above is the same as the state machine we have now in terms of reasoning about the controller's correctness.