litmuschaos/litmus-go

OpenShift OVN annotations propagation - container-kill experiment

ludovic-pourrat opened this issue · 3 comments

What happened:

When running the container-kill experiment in OpenShift with OVN as our CNI, the helper pod is created with OVN-specific annotations that are propagated from the originating Litmus experiment pod. As a result, the helper pod cannot reach the Kubernetes API server.

We will soon test an approach that filters out the OVN-specific annotations. See the observations below.

What you expected to happen:

We expected this experiment to work.

How to reproduce it (as minimally and precisely as possible):

We captured the YAML of the helper pod; after removing the OVN-specific annotations from it, the helper pod worked.

Anything else we need to know?:

OVN annotations:

  • k8s.ovn.org/pod-networks
  • k8s.v1.cni.cncf.io/network-status[]

We have identified that SetHelperData in pod.go propagates the annotations; see the line chaosDetails.Annotations = pod.Annotations:

func SetHelperData(chaosDetails *types.ChaosDetails, setHelperData string, clients clients.ClientSets) error {
	var pod *core_v1.Pod
	// err is not declared elsewhere in this excerpt, so declare it here with :=
	pod, err := clients.KubeClient.CoreV1().Pods(chaosDetails.ChaosNamespace).Get(context.Background(), chaosDetails.ChaosPodName, v1.GetOptions{})
	if err != nil {
		return err
	}

	// Get Labels
	labels := pod.ObjectMeta.Labels
	delete(labels, "controller-uid")
	delete(labels, "job-name")
	chaosDetails.Labels = labels

	switch setHelperData {
	case "false":
		return nil

	default:

		// Get Chaos Pod Annotation
		chaosDetails.Annotations = pod.Annotations
		// OVN annotations filtering should be processed here.

		// Get ImagePullSecrets
		chaosDetails.ImagePullSecrets = pod.Spec.ImagePullSecrets

		// Get Resource Requirements
		chaosDetails.Resources, err = getChaosPodResourceRequirements(chaosDetails.ChaosPodName, chaosDetails.ExperimentName, chaosDetails.ChaosNamespace, clients)
		if err != nil {
			return errors.Errorf("unable to get resource requirements, err: %v", err)
		}
		return nil
	}
}
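A minimal sketch of the filtering step mentioned in the comment above, not a patch against litmus-go. The names filterAnnotations and ovnAnnotationKeys are hypothetical, and the second key is written without the trailing [] on the assumption that the brackets only indicate an array-valued annotation. The function returns a copy, so the map held by the fetched pod object is left untouched.

```go
package main

import (
	"fmt"
	"sort"
)

// ovnAnnotationKeys lists the annotation keys named in this issue.
// Assumption: the "[]" suffix in the report is not part of the key.
var ovnAnnotationKeys = []string{
	"k8s.ovn.org/pod-networks",
	"k8s.v1.cni.cncf.io/network-status",
}

// filterAnnotations (hypothetical helper) returns a copy of src
// without any of the keys listed in drop. src is not modified.
func filterAnnotations(src map[string]string, drop []string) map[string]string {
	skip := make(map[string]struct{}, len(drop))
	for _, k := range drop {
		skip[k] = struct{}{}
	}

	out := make(map[string]string, len(src))
	for k, v := range src {
		if _, found := skip[k]; found {
			continue
		}
		out[k] = v
	}
	return out
}

func main() {
	// Example annotations as they might appear on the experiment pod;
	// the values are placeholders, not real OVN data.
	podAnnotations := map[string]string{
		"k8s.ovn.org/pod-networks":          "{...}",
		"k8s.v1.cni.cncf.io/network-status": "[...]",
		"example.com/owner":                 "chaos-team",
	}

	filtered := filterAnnotations(podAnnotations, ovnAnnotationKeys)

	// Print the remaining keys in a stable order.
	keys := make([]string, 0, len(filtered))
	for k := range filtered {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	fmt.Println(keys)
}
```

In SetHelperData, the equivalent change would be to assign the filtered copy instead of pod.Annotations directly; whether the maintainers would prefer a fixed key list or a configurable one is an open question.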

@ludovic-pourrat thanks for reporting this. If these annotations are being propagated to the helper from the experiment pod and you'd like to prevent that, you can set the SET_HELPER_DATA env to false in the ChaosEngine's experiment spec (it defaults to true).

Ref:

experimentDetails.SetHelperData = types.Getenv("SET_HELPER_DATA", "true")

Having said that, how are these annotations getting attached to the experiment pods? Are you adding them yourself, or are they added automatically by OpenShift? (The experiment pod also needs to talk to the kube API server.) Could you please share the OpenShift version and the OVN or other config info being used here, so that we can use it for validation as well?

Hi,

Thanks for your answer! We will deactivate the helper data as recommended.

I can confirm that those annotations are auto-added to the experiment POD at runtime by OVN.

We are currently using OCP 4.10.16, with the OVN version that ships with that release.

The experiment is reported to work when SET_HELPER_DATA is set to false.

Thanks!