tkestack/galaxy

galaxy does not work as expected

Closed this issue · 1 comments

When I tested the NetworkPolicy function with galaxy, I ran into some problems. Have I used the wrong image, or tested it the wrong way? Please help.

Software version:
galaxy 1.0.2
k8s v1.14.6
docker 18.09.9

Actions:
[root@9 /install/policy]# kubectl -n mesh-demo get pods
NAME READY STATUS RESTARTS AGE
consumer-1-1-0-787888cb94-gzxvn 1/1 Running 0 14h
provider-1-1-0-bbbb9cd65-95wxc 1/1 Running 0 14s

[root@9 /install/policy]# cat provider_ingress.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: provider-network-policy
  namespace: mesh-demo
spec:
  podSelector:
    matchLabels:
      tcnp-service-runtime: provider
      tcnp-service-version: 1.1.0
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          a: b

[root@9 /install/policy]# kubectl -n mesh-demo apply -f provider_ingress.yaml
networkpolicy.networking.k8s.io/provider-network-policy created

Problem description:
When I used the components described above to launch a TKE cluster for business purposes, following the instructions (https://github.com/tkestack/galaxy/blob/master/doc/network-policy.md), I ran into the following problems.

1. The galaxy image does not have ipset and its dependencies installed. When the network policy switch is enabled in the launch command, galaxy prints the following error:

running command:
[root@9 ~]# kubectl -n kube-system exec -it galaxy-daemonset-
galaxy-daemonset-5qxps galaxy-daemonset-rbxll
[root@9 ~]# kubectl -n kube-system exec galaxy-daemonset-5qxps -- ps -ef|grep galaxy
root 6840 3279 0 03:43 ? 00:00:00 kubectl -n kube-system exec galaxy-daemonset-5qxps -- ps -ef
root 6841 3279 0 03:43 ? 00:00:00 grep --color=auto galaxy
root 25740 25709 0 Mar01 ? 00:00:00 /bin/sh -c cp -p /etc/cni/net.d/00-galaxy.conf /host/etc/cni/net.d/; cp -p /opt/cni/bin/* /host/opt/cni/bin/; /usr/bin/galaxy --network-policy --logtostderr=true --v=5
root 25836 25740 0 Mar01 ? 00:02:39 /usr/bin/galaxy --network-policy --logtostderr=true --v=5

Err info:
02 03:51:30.927516 2493 policy.go:1035] failed to add entry 172.20.1.218 to ipset GLX-ip-24U5H3NXPZITELP7: error adding entry 172.20.1.218, error: executable file not found in $PATH
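
The text "executable file not found in $PATH" is the error Go's os/exec package reports when it cannot locate a binary, which is consistent with ipset being absent from the image. As a minimal sketch (not galaxy's code; the function name missingTools and the tool list are illustrative assumptions), a preflight check along these lines could surface the problem at startup instead of on the first ipset call:

```go
package main

import (
	"fmt"
	"os/exec"
)

// missingTools returns the names from required that cannot be found in $PATH.
// The function name and the tool list are illustrative, not part of galaxy.
func missingTools(required []string) []string {
	var missing []string
	for _, name := range required {
		// exec.LookPath returns an error when the binary cannot be found.
		if _, err := exec.LookPath(name); err != nil {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	// Assumption: network policy support shells out to these two tools.
	if missing := missingTools([]string{"ipset", "iptables"}); len(missing) > 0 {
		fmt.Printf("missing required tools: %v\n", missing)
		return
	}
	fmt.Println("all required tools found")
}
```

Running this inside the galaxy container would show whether ipset is the only missing binary.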

2. With the network policy switch enabled, the galaxy pod watches Kubernetes NetworkPolicy objects and fetches the business pods that match the label selectors in the NetworkPolicy manifests. It then keeps only the pods whose node name equals the hostname of the Kubernetes node the galaxy pod runs on. How is this hostname filter supposed to work? Below is the code (https://github.com/tkestack/galaxy/blob/v1.0.2/pkg/policy/policy.go):
`func (p *PolicyManager) syncPods() {
	...
	if p.podCachedInformer.HasSynced() {
		...
		nodeHostName := k8s.GetHostname()
		glog.V(4).Infof("find %d pods, nodeHostName %s", len(pods), nodeHostName)
		for i := range pods {
			// Pods scheduled on other nodes are skipped.
			if pods[i].Spec.NodeName != nodeHostName {
				continue
			}
			wg.Add(1)
			glog.V(4).Infof("starting goroutine to sync pod chain for %s_%s", pods[i].Name, pods[i].Namespace)
			go syncPodChains(pods[i])
		}
	} else {
		...
	}
}

func GetHostname() string {
	// Lookup order: override flag, then MY_NODE_NAME, then the OS hostname.
	hostname := *flagHostnameOverride
	if hostname == "" {
		hostname = os.Getenv("MY_NODE_NAME")
		if hostname == "" {
			nodename, err := os.Hostname()
			if err != nil {
				glog.Fatalf("Couldn't determine hostname: %v", err)
			}
			hostname = nodename
		}
	}
	return strings.ToLower(strings.TrimSpace(hostname))
}`

In my environment, pods[i].Spec.NodeName is an IP address, for example:
[root@9 /install/policy]# kubectl -n mesh-demo get pods provider-1-1-0-bbbb9cd65-95wxc -o yaml |grep -i nodename
nodeName: 1.2.3.4

but k8s.GetHostname() returns: vm_host_4
Because pods[i].Spec.NodeName != nodeHostName, every pod is skipped and the iptables chains such as GLX-INGRESS/GLX-EGRESS/GLX-POD-XXXX are never created.
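
The mismatch can be reproduced outside the cluster. The sketch below is an illustration, not galaxy's code: resolveNodeName and podsOnNode are hypothetical helpers that mirror the lookup order of the quoted GetHostname and the filter in syncPods, taking their inputs as arguments so they can be run anywhere. It suggests one possible workaround, assuming the DaemonSet manifest can be edited: since the quoted code reads MY_NODE_NAME before falling back to the OS hostname, setting that variable to the node's spec.nodeName value (for instance through the Kubernetes Downward API) should make the comparison succeed.

```go
package main

import (
	"fmt"
	"strings"
)

// pod holds only the two fields the filter needs.
type pod struct {
	Name     string
	NodeName string // value of spec.nodeName
}

// resolveNodeName mirrors the lookup order of the quoted GetHostname:
// explicit override, then MY_NODE_NAME, then the OS hostname.
// Hypothetical signature: inputs are passed in rather than read from the process.
func resolveNodeName(override, envNodeName, osHostname string) string {
	name := override
	if name == "" {
		name = envNodeName
	}
	if name == "" {
		name = osHostname
	}
	return strings.ToLower(strings.TrimSpace(name))
}

// podsOnNode keeps the pods whose spec.nodeName equals nodeName,
// the same comparison syncPods performs.
func podsOnNode(pods []pod, nodeName string) []pod {
	var local []pod
	for _, p := range pods {
		if p.NodeName == nodeName {
			local = append(local, p)
		}
	}
	return local
}

func main() {
	pods := []pod{{Name: "provider-1-1-0-bbbb9cd65-95wxc", NodeName: "1.2.3.4"}}

	// Only the OS hostname is available, as in the report.
	byHostname := resolveNodeName("", "", "vm_host_4")
	fmt.Println(byHostname, len(podsOnNode(pods, byHostname)))

	// MY_NODE_NAME carries the value of spec.nodeName.
	byEnv := resolveNodeName("", "1.2.3.4", "vm_host_4")
	fmt.Println(byEnv, len(podsOnNode(pods, byEnv)))
}
```

With only the OS hostname, no pod matches; with MY_NODE_NAME set to the value of spec.nodeName, the provider pod matches and its chains would be synced.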

3. After applying a workaround so that pods[i].Spec.NodeName == nodeHostName, the iptables chains and rules were set up properly and the network policy took effect. But when I deleted the business pod or the NetworkPolicy resource, the iptables chains and rules were still there; they were not cleaned up.
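
To illustrate what cleanup on deletion would involve, here is a minimal sketch under stated assumptions; it is not galaxy's implementation. The helper staleChains is hypothetical, and the GLX-POD- prefix is taken from the chain names mentioned above. It computes which per-pod chains exist on the host but are no longer wanted, which is the set a sync loop would need to flush and delete:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// staleChains returns the chains in existing that start with prefix
// but are absent from desired. Chains without the prefix are never
// reported, so chains owned by other components are left alone.
func staleChains(existing, desired []string, prefix string) []string {
	want := make(map[string]bool, len(desired))
	for _, c := range desired {
		want[c] = true
	}
	var stale []string
	for _, c := range existing {
		if strings.HasPrefix(c, prefix) && !want[c] {
			stale = append(stale, c)
		}
	}
	sort.Strings(stale) // deterministic order for logging and tests
	return stale
}

func main() {
	// Illustrative chain names; real names would come from listing iptables chains.
	existing := []string{"KUBE-SERVICES", "GLX-POD-AAAA", "GLX-POD-BBBB"}
	desired := []string{"GLX-POD-AAAA"} // chains for pods that still exist
	fmt.Println(staleChains(existing, desired, "GLX-POD-"))
}
```

Comparing the live chain list against this result before and after deleting a pod would show whether galaxy removes the corresponding chain.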

Thanks for reporting. Issue 2 is fixed by #30
I'll fix issue 1 and update the image.