openshift/ingress-node-firewall

Fix race condition in IngressNodeFirewallController

andreaskaris opened this issue · 2 comments

This happens every few times when running the unit tests. The unit tests create and delete resources in short intervals one after another, so this is definitely a race condition which will need to be ironed out. Below, we have a leftover nodestate resource from a previous test which somehow isn't cleaned up correctly:

2022/08/10 15:02:26 reconciling (apps/v1, Kind=DaemonSet) ingress-node-fw-config-test-namespace/ingress-node-firewall-daemon
2022/08/10 15:02:26 does not exist, creating (apps/v1, Kind=DaemonSet) ingress-node-fw-config-test-namespace/ingress-node-firewall-daemon
2022/08/10 15:02:26 successfully created (apps/v1, Kind=DaemonSet) ingress-node-fw-config-test-namespace/ingress-node-firewall-daemon
2022/08/10 15:02:26 reconciling (apps/v1, Kind=DaemonSet) ingress-node-fw-config-test-namespace/ingress-node-firewall-daemon
2022/08/10 15:02:26 update was successful
2022/08/10 15:02:26 reconciling (apps/v1, Kind=DaemonSet) ingress-node-fw-config-test-namespace/ingress-node-firewall-daemon
2022/08/10 15:02:26 update was successful
2022/08/10 15:02:26 reconciling (apps/v1, Kind=DaemonSet) ingress-node-fw-config-test-namespace/ingress-node-firewall-daemon
2022/08/10 15:02:26 update was successful
2022/08/10 15:02:26 reconciling (apps/v1, Kind=DaemonSet) ingress-node-fw-config-test-namespace/ingress-node-firewall-daemon
2022/08/10 15:02:26 update was successful
2022/08/10 15:02:26 reconciling (apps/v1, Kind=DaemonSet) ingress-node-fw-config-test-namespace/ingress-node-firewall-daemon
2022/08/10 15:02:26 update was successful
2022/08/10 15:02:26 reconciling (apps/v1, Kind=DaemonSet) ingress-node-fw-config-test-namespace/ingress-node-firewall-daemon
2022/08/10 15:02:26 update was successful
2022/08/10 15:02:26 reconciling (apps/v1, Kind=DaemonSet) ingress-node-fw-config-test-namespace/ingress-node-firewall-daemon
•••••••••
------------------------------
• Failure [1.084 seconds]
IngressNodeFirewall controller rules
/home/akaris/development/ingress-node-firewall/controllers/ingressnodefirewall_controller_rules_test.go:20
  when IngressNodeFirewall objects are created for test case: "baseline test without merging"
  /home/akaris/development/ingress-node-firewall/controllers/ingressnodefirewall_controller_rules_test.go:780
    The resulting IngressNodeFirewallNodeState object should look as expected [It]
    /home/akaris/development/ingress-node-firewall/controllers/ingressnodefirewall_controller_rules_test.go:781

    Timed out after 1.003s.
    Expected
        <bool>: false
    to be true

    /home/akaris/development/ingress-node-firewall/controllers/ingressnodefirewall_controller_test.go:343
------------------------------
STEP: Creating node worker-0
STEP: Creating new IngressNodeFirewall objects
By creating new IngressNodeFirewall object firewall-01.6601365479200597e+09	INFO	controllers.IngressNodeFirewall	Getting all IngressNodeFirewallNodeState objects in namespace	{"req.Name": "firewall-0", "r.Namespace": "ingress-node-fw-config-test-namespace"}
STEP: Waiting for the expected list of IngressNodeFirewallNodeStates
1.6601365479258146e+09	INFO	controllers.IngressNodeFirewall	Getting all IngressNodeFirewall objects	{"req.Name": "firewall-0"}
Could not find the desired number of IngressNodeFirewallNodeStates. Found 0 objects but expected to find 1 objects. Object list: []
1.660136547931877e+09	INFO	controllers.IngressNodeFirewall	Building the desired node state specs	{"req.Name": "firewall-0"}
1.6601365479352949e+09	INFO	controllers.IngressNodeFirewall	Object node found, triggering creation	{"req.Name": "firewall-0", "nodeToCreate": "worker-daemon"}
Could not find the desired number of IngressNodeFirewallNodeStates. Found 0 objects but expected to find 1 objects. Object list: []
1.660136547943595e+09	INFO	controllers.IngressNodeFirewall	Reconciling resource and programming bpf	{"name": "worker-daemon", "namespace": "ingress-node-fw-config-test-namespace"}
1.6601365479459715e+09	INFO	controllers.IngressNodeFirewall	Created object	{"req.Name": "firewall-0", "ingressNodeFirewallNodeState.Namespace": "ingress-node-fw-config-test-namespace", "ingressNodeFirewallNodeState.Name": "worker-daemon"}
1.6601365479459975e+09	INFO	controllers.IngressNodeFirewall	Object node found, triggering creation	{"req.Name": "firewall-0", "nodeToCreate": "worker-0"}
1.6601365479492908e+09	INFO	controllers.IngressNodeFirewall	Reconciling resource and programming bpf	{"name": "worker-daemon", "namespace": "ingress-node-fw-config-test-namespace"}
Could not find the desired number of IngressNodeFirewallNodeStates. Found 2 objects but expected to find 1 objects. Object list: [{{IngressNodeFirewallNodeState ingressnodefirewall.openshift.io/v1alpha1} {worker-0  ingress-node-fw-config-test-namespace  c2ec61c8-52d0-4a4a-8a1c-f18fd00a90f8 342 1 2022-08-10 15:02:27 +0200 CEST <nil> <nil> map[] map[] [{ingressnodefirewall.openshift.io/v1alpha1 IngressNodeFirewall firewall-0 c2ca64b8-87ae-423e-8e86-b36ab427a70f <nil> <nil>}] []  [{controllers.test Update ingressnodefirewall.openshift.io/v1alpha1 2022-08-10 15:02:27 +0200 CEST FieldsV1 {"f:metadata":{"f:ownerReferences":{".":{},"k:{\"uid\":\"c2ca64b8-87ae-423e-8e86-b36ab427a70f\"}":{}}},"f:spec":{".":{},"f:interfaceIngressRules":{".":{},"f:eth0":{}}}} }]} {map[eth0:[{[10.0.0.0] [{10 {TCP 0xc00056c660 <nil> <nil> <nil> <nil>} Allow}]}]]} { }} {{IngressNodeFirewallNodeState ingressnodefirewall.openshift.io/v1alpha1} {worker-daemon  ingress-node-fw-config-test-namespace  aa1a09cc-f4a8-4a1e-a14e-a777a6a7c22c 341 1 2022-08-10 15:02:27 +0200 CEST <nil> <nil> map[] map[] [{ingressnodefirewall.openshift.io/v1alpha1 IngressNodeFirewall firewall-0 c2ca64b8-87ae-423e-8e86-b36ab427a70f <nil> <nil>}] []  [{controllers.test Update ingressnodefirewall.openshift.io/v1alpha1 2022-08-10 15:02:27 +0200 CEST FieldsV1 {"f:metadata":{"f:ownerReferences":{".":{},"k:{\"uid\":\"c2ca64b8-87ae-423e-8e86-b36ab427a70f\"}":{}}},"f:spec":{".":{},"f:interfaceIngressRules":{".":{},"f:eth0":{}}}} } {controllers.test Update ingressnodefirewall.openshift.io/v1alpha1 2022-08-10 15:02:27 +0200 CEST FieldsV1 {"f:status":{".":{},"f:syncStatus":{}}} status}]} {map[eth0:[{[10.0.0.0] [{10 {TCP 0xc00056c680 <nil> <nil> <nil> <nil>} Allow}]}]]} {Synchronized }}]
1.6601365479552739e+09	INFO	controllers.IngressNodeFirewall	Created object	{"req.Name": "firewall-0", "ingressNodeFirewallNodeState.Namespace": "ingress-node-fw-config-test-namespace", "ingressNodeFirewallNodeState.Name": "worker-0"}
1.6601365479553275e+09	INFO	controllers.IngressNodeFirewall	Getting all IngressNodeFirewallNodeState objects in namespace	{"req.Name": "firewall-0", "r.Namespace": "ingress-node-fw-config-test-namespace"}
1.6601365479579825e+09	INFO	controllers.IngressNodeFirewall	Getting all IngressNodeFirewall objects	{"req.Name": "firewall-0"}
1.6601365479596775e+09	INFO	controllers.IngressNodeFirewall	Building the desired node state specs	{"req.Name": "firewall-0"}
1.6601365479612281e+09	INFO	controllers.IngressNodeFirewall	Getting all IngressNodeFirewallNodeState objects in namespace	{"req.Name": "firewall-0", "r.Namespace": "ingress-node-fw-config-test-namespace"}
1.6601365479645565e+09	INFO	controllers.IngressNodeFirewall	Getting all IngressNodeFirewall objects	{"req.Name": "firewall-0"}
Could not find the desired number of IngressNodeFirewallNodeStates. Found 2 objects but expected to find 1 objects. Object list: [{{IngressNodeFirewallNodeState ingressnodefirewall.openshift.io/v1alpha1} {worker-0  ingress-node-fw-config-test-namespace  c2ec61c8-52d0-4a4a-8a1c-f18fd00a90f8 343 1 2022-08-10 15:02:27 +0200 CEST <nil> <nil> map[] map[] [{ingressnodefirewall.openshift.io/v1alpha1 IngressNodeFirewall firewall-0 c2ca64b8-87ae-423e-8e86-b36ab427a70f <nil> <nil>}] []  [{controllers.test Update ingressnodefirewall.openshift.io/v1alpha1 2022-08-10 15:02:27 +0200 CEST FieldsV1 {"f:metadata":{"f:ownerReferences":{".":{},"k:{\"uid\":\"c2ca64b8-87ae-423e-8e86-b36ab427a70f\"}":{}}},"f:spec":{".":{},"f:interfaceIngressRules":{".":{},"f:eth0":{}}}} } {controllers.test Update ingressnodefirewall.openshift.io/v1alpha1 2022-08-10 15:02:27 +0200 CEST FieldsV1 {"f:status":{".":{},"f:syncStatus":{}}} status}]} {map[eth0:[{[10.0.0.0] [{10 {TCP 0xc00056c960 <nil> <nil> <nil> <nil>} Allow}]}]]} {Synchronized }} {{IngressNodeFirewallNodeState ingressnodefirewall.openshift.io/v1alpha1} {worker-daemon  ingress-node-fw-config-test-namespace  aa1a09cc-f4a8-4a1e-a14e-a777a6a7c22c 341 1 2022-08-10 15:02:27 +0200 CEST <nil> <nil> map[] map[] [{ingressnodefirewall.openshift.io/v1alpha1 IngressNodeFirewall firewall-0 c2ca64b8-87ae-423e-8e86-b36ab427a70f <nil> <nil>}] []  [{controllers.test Update ingressnodefirewall.openshift.io/v1alpha1 2022-08-10 15:02:27 +0200 CEST FieldsV1 {"f:metadata":{"f:ownerReferences":{".":{},"k:{\"uid\":\"c2ca64b8-87ae-423e-8e86-b36ab427a70f\"}":{}}},"f:spec":{".":{},"f:interfaceIngressRules":{".":{},"f:eth0":{}}}} } {controllers.test Update ingressnodefirewall.openshift.io/v1alpha1 2022-08-10 15:02:27 +0200 CEST FieldsV1 {"f:status":{".":{},"f:syncStatus":{}}} status}]} {map[eth0:[{[10.0.0.0] [{10 {TCP 0xc00056c980 <nil> <nil> <nil> <nil>} Allow}]}]]} {Synchronized }}]
(...)
1.6601365489525127e+09	INFO	controllers.IngressNodeFirewall	Getting all IngressNodeFirewall objects	{"req.Name": "firewall-0"}
1.6601365489577942e+09	INFO	controllers.IngressNodeFirewall	Building the desired node state specs	{"req.Name": "firewall-0"}
1.6601365489578543e+09	INFO	controllers.IngressNodeFirewall	Existing object not found in desired list, triggering deletion	{"req.Name": "firewall-0"}
1.6601365489717815e+09	INFO	controllers.IngressNodeFirewall	Existing object not found in desired list, triggering deletion	{"req.Name": "firewall-0"}
1.6601365489837143e+09	INFO	controllers.IngressNodeFirewall	Getting all IngressNodeFirewallNodeState objects in namespace	{"req.Name": "firewall-0", "r.Namespace": "ingress-node-fw-config-test-namespace"}
••••••••••2022/08/10 15:02:49 failed to shut down testEnv err timeout waiting for process kube-apiserver to stop



Summarizing 1 Failure:

[Fail] IngressNodeFirewall controller rules when IngressNodeFirewall objects are created for test case: "baseline test without merging" [It] The resulting IngressNodeFirewallNodeState object should look as expected 
/home/akaris/development/ingress-node-firewall/controllers/ingressnodefirewall_controller_test.go:343

Ran 20 of 20 Specs in 29.974 seconds
FAIL! -- 19 Passed | 1 Failed | 0 Pending | 0 Skipped

we aren't down yet with race issues

ut looks better now at least been consistent with CI run so I will close this for now and if we hit another race we can reopen or create new issue