tkestack/galaxy

quick start failed

currycan opened this issue · 17 comments

k8s version:

# kubectl version
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.1", GitCommit:"c4d752765b3bbac2237bf87cf0b1c2e307844666", GitTreeState:"clean", BuildDate:"2020-12-18T12:09:25Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.1", GitCommit:"c4d752765b3bbac2237bf87cf0b1c2e307844666", GitTreeState:"clean", BuildDate:"2020-12-18T12:00:47Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}

galaxy version: v1.0.7
flannel version: v0.13.0

I followed the quick start guide, changing only the command to private-cloud.
The logs are as follows:

I0113 17:00:58.195570   11205 flags.go:52] FLAG: --add-dir-header="false"
I0113 17:00:58.195915   11205 flags.go:52] FLAG: --alsologtostderr="false"
I0113 17:00:58.195923   11205 flags.go:52] FLAG: --bridge-nf-call-iptables="true"
I0113 17:00:58.195932   11205 flags.go:52] FLAG: --cni-paths="[/opt/cni/galaxy/bin]"
I0113 17:00:58.195943   11205 flags.go:52] FLAG: --flannel-allocated-ip-dir="/var/lib/cni/networks,/var/lib/cni/networks/galaxy-flannel"
I0113 17:00:58.195950   11205 flags.go:52] FLAG: --flannel-gc-interval="10s"
I0113 17:00:58.195955   11205 flags.go:52] FLAG: --gc-dirs="/var/lib/cni/flannel,/var/lib/cni/galaxy,/var/lib/cni/galaxy/port"
I0113 17:00:58.195962   11205 flags.go:52] FLAG: --hostname-override=""
I0113 17:00:58.195966   11205 flags.go:52] FLAG: --ip-forward="true"
I0113 17:00:58.195971   11205 flags.go:52] FLAG: --json-config-path="/etc/galaxy/galaxy.json"
I0113 17:00:58.195978   11205 flags.go:52] FLAG: --kubeconfig=""
I0113 17:00:58.195983   11205 flags.go:52] FLAG: --log-backtrace-at=":0"
I0113 17:00:58.195991   11205 flags.go:52] FLAG: --log-dir=""
I0113 17:00:58.195996   11205 flags.go:52] FLAG: --log-file=""
I0113 17:00:58.196001   11205 flags.go:52] FLAG: --log-file-max-size="1800"
I0113 17:00:58.196006   11205 flags.go:52] FLAG: --log-flush-frequency="5s"
I0113 17:00:58.196030   11205 flags.go:52] FLAG: --logtostderr="true"
I0113 17:00:58.196044   11205 flags.go:52] FLAG: --master=""
I0113 17:00:58.196051   11205 flags.go:52] FLAG: --network-conf-dir="/etc/cni/net.d/"
I0113 17:00:58.196056   11205 flags.go:52] FLAG: --network-policy="false"
I0113 17:00:58.196061   11205 flags.go:52] FLAG: --pprof="false"
I0113 17:00:58.196066   11205 flags.go:52] FLAG: --route-eni="false"
I0113 17:00:58.196071   11205 flags.go:52] FLAG: --skip-headers="false"
I0113 17:00:58.196076   11205 flags.go:52] FLAG: --skip-log-headers="false"
I0113 17:00:58.196081   11205 flags.go:52] FLAG: --stderrthreshold="2"
I0113 17:00:58.196085   11205 flags.go:52] FLAG: --v="3"
I0113 17:00:58.196090   11205 flags.go:52] FLAG: --version="false"
I0113 17:00:58.196095   11205 flags.go:52] FLAG: --vmodule=""
I0113 17:00:58.196259   11205 galaxy.go:77] Json Config: {
  "NetworkConf":[
    {"name":"tke-route-eni","type":"tke-route-eni","eni":"eth1","routeTable":1},
    {"name":"galaxy-flannel","type":"galaxy-flannel", "delegate":{"type":"galaxy-veth"},"subnetFile":"/run/flannel/subnet.env"},
    {"name":"galaxy-k8s-vlan","type":"galaxy-k8s-vlan", "device":"eth1", "default_bridge_name": "br0"},
    {"name":"galaxy-k8s-sriov","type": "galaxy-k8s-sriov", "device": "eth1", "vf_num": 10}
  ],
  "DefaultNetworks": ["galaxy-flannel"],
  "ENIIPNetwork": "galaxy-k8s-vlan"
}
I0113 17:00:58.198627   11205 iptables.go:218] Could not connect to D-Bus system bus: dial unix /var/run/dbus/system_bus_socket: connect: no such file or directory
W0113 17:00:58.198661   11205 client_config.go:541] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0113 17:00:58.198876   11205 galaxy.go:159] QPS: 1.000000e+03, Burst: 2000
I0113 17:00:58.200330   11205 galaxy.go:165] apiserver address https://172.31.0.1:443
I0113 17:00:58.258646   11205 portmapping.go:122] listening to tcp 10027
I0113 17:00:58.258666   11205 portmapping.go:138] Opened local port tcp:10027

Creating a test pod then failed. The describe command shows:

  Warning  FailedCreatePodSandBox  20m                  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "823ddaf206a4e7dc81a8c978d19ca02dc509cffadc05db016f73886eab88c55b" network for pod "test-metallb-dpl-7fb5cc5679-hnqkq": networkPlugin cni failed to set up pod "test-metallb-dpl-7fb5cc5679-hnqkq_default" network: missing network name:
  Normal   SandboxChanged          10m (x252 over 20m)  kubelet            Pod sandbox changed, it will be killed and re-created.
  Warning  FailedCreatePodSandBox  49s (x506 over 20m)  kubelet            (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "eef210c085fb8cde7397612098c735a57ea20122f5039a488744b32ead1a66f7" network for pod "test-metallb-dpl-7fb5cc5679-hnqkq": networkPlugin cni failed to set up pod "test-metallb-dpl-7fb5cc5679-hnqkq_default" network: missing network name:

From the galaxy logs, it seems kubelet didn't call galaxy to set up the network for the pod; otherwise galaxy would print a log at https://github.com/tkestack/galaxy/blob/v1.0.7/pkg/galaxy/server.go#L114 . Do you have any other CNI plugins installed? @currycan can you show us the output of

for i in `ls /etc/cni/net.d/`; do echo $i; cat /etc/cni/net.d/$i; done

@chenchun Hello, the command returns the following:

# for i in `ls /etc/cni/net.d/`; do echo $i; cat /etc/cni/net.d/$i; done
00-galaxy.conf
{
  "type": "galaxy-sdn",
  "capabilities": {"portMappings": true},
  "cniVersion": "0.2.0"
}
10-flannel.conflist
{
  "name": "cbr0",
  "cniVersion":"0.3.1",
  "plugins": [
    {
      "type": "flannel",
      "delegate": {
        "forceAddress": true,
        "hairpinMode": true,
        "isDefaultGateway": true
      }
    },
    {
      "type": "portmap",
      "capabilities": {
        "portMappings": true
      }
    }
  ]
}

You can move 10-flannel.conflist out of /etc/cni/net.d/ and try again. I believe that should resolve the problem.

After removing the 10-flannel.conflist file and recreating galaxy, nothing changed in the logs.

# for i in `ls /etc/cni/net.d/`; do echo $i; cat /etc/cni/net.d/$i; done
00-galaxy.conf
{
  "type": "galaxy-sdn",
  "capabilities": {"portMappings": true},
  "cniVersion": "0.2.0"
}

@currycan Can you add "name": "galaxy-sdn", to 00-galaxy.conf and try again?
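
For anyone hitting the same "missing network name" error, here is a minimal sketch of a check for CNI config files that lack a "name" field. The function name find_unnamed_cni_confs is hypothetical (not part of galaxy or any CNI tooling), and the set of file extensions it scans is an assumption.

```python
import json
from pathlib import Path


def find_unnamed_cni_confs(conf_dir):
    """Return file names in conf_dir whose top-level JSON has no non-empty "name"."""
    unnamed = []
    for path in sorted(Path(conf_dir).iterdir()):
        # Assumption: only these extensions are treated as CNI configs.
        if path.suffix not in (".conf", ".conflist", ".json"):
            continue
        try:
            conf = json.loads(path.read_text())
        except (OSError, json.JSONDecodeError):
            # Unreadable or malformed files are skipped, not reported.
            continue
        if not isinstance(conf, dict) or not conf.get("name"):
            unnamed.append(path.name)
    return unnamed


# Example (run on the node): find_unnamed_cni_confs("/etc/cni/net.d")
```

Run against the directory listing above, this would flag 00-galaxy.conf until the "name" field is added.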

@chenchun It works now! Thank you for your help.
BTW, how do I configure galaxy for an underlay network, so that services outside the cluster can reach services inside the cluster by pod IP?

Galaxy doesn't support automatically registering subnets with the switch via BGP or any other protocol.
So first, you need to manually configure a subnet on the switch for pods to use. If that is not possible, pods may also use any unallocated IPs of the machine subnet.

Then all you have to do is figure out the relationship between node subnets and pod subnets, e.g. which pod subnet can be used in which node subnet, then create a floatingip ConfigMap and start galaxy-ipam.

floatingip-config is

kind: ConfigMap
apiVersion: v1
metadata:
  name: floatingip-config
  namespace: kube-system
data:
  floatingips: '[{"nodeSubnets":["10.177.140.0/22"],"ips":["10.177.140.40~10.177.140.80"],"subnet":"10.177.140.0/22","gateway":"10.177.143.254/22"}]'

galaxy-ipam-etc is

apiVersion: v1
kind: ConfigMap
metadata:
  name: galaxy-ipam-etc
  namespace: kube-system
data:
  # delete cloudProviderGrpcAddr if not ENI
  galaxy-ipam.json: |
    {
      "schedule_plugin": {
      }
    }

The galaxy-ipam deployment manifest follows the guide. Error logs:

I0115 14:31:20.821374       1 flags.go:52] FLAG: --add-dir-header="false"
I0115 14:31:20.821438       1 flags.go:52] FLAG: --alsologtostderr="false"
I0115 14:31:20.821443       1 flags.go:52] FLAG: --api-port="9041"
I0115 14:31:20.821450       1 flags.go:52] FLAG: --bind="0.0.0.0"
I0115 14:31:20.821456       1 flags.go:52] FLAG: --config="/etc/galaxy/galaxy-ipam.json"
I0115 14:31:20.821462       1 flags.go:52] FLAG: --kubeconfig=""
I0115 14:31:20.821466       1 flags.go:52] FLAG: --leader-elect="true"
I0115 14:31:20.821472       1 flags.go:52] FLAG: --leader-elect-lease-duration="15s"
I0115 14:31:20.821478       1 flags.go:52] FLAG: --leader-elect-renew-deadline="10s"
I0115 14:31:20.821483       1 flags.go:52] FLAG: --leader-elect-resource-lock="endpoints"
I0115 14:31:20.821488       1 flags.go:52] FLAG: --leader-elect-retry-period="2s"
I0115 14:31:20.821492       1 flags.go:52] FLAG: --log-backtrace-at=":0"
I0115 14:31:20.821500       1 flags.go:52] FLAG: --log-dir=""
I0115 14:31:20.821505       1 flags.go:52] FLAG: --log-file=""
I0115 14:31:20.821509       1 flags.go:52] FLAG: --log-file-max-size="1800"
I0115 14:31:20.821514       1 flags.go:52] FLAG: --log-flush-frequency="5s"
I0115 14:31:20.821518       1 flags.go:52] FLAG: --logtostderr="true"
I0115 14:31:20.821523       1 flags.go:52] FLAG: --master=""
I0115 14:31:20.821528       1 flags.go:52] FLAG: --port="9040"
I0115 14:31:20.821533       1 flags.go:52] FLAG: --profiling="true"
I0115 14:31:20.821537       1 flags.go:52] FLAG: --skip-headers="false"
I0115 14:31:20.821542       1 flags.go:52] FLAG: --skip-log-headers="false"
I0115 14:31:20.821546       1 flags.go:52] FLAG: --stderrthreshold="2"
I0115 14:31:20.821552       1 flags.go:52] FLAG: --swagger="false"
I0115 14:31:20.821558       1 flags.go:52] FLAG: --v="3"
I0115 14:31:20.821562       1 flags.go:52] FLAG: --version="false"
I0115 14:31:20.821567       1 flags.go:52] FLAG: --vmodule=""
W0115 14:31:20.821716       1 client_config.go:541] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0115 14:31:20.821936       1 server.go:171] QPS: 1.000000e+03, Burst: 2000
I0115 14:31:20.866203       1 server.go:192] connected to apiserver &rest.Config{Host:"https://172.31.0.1:443", APIPath:"", ContentConfig:rest.ContentConfig{AcceptContentTypes:"", ContentType:"", GroupVersion:(*schema.GroupVersion)(nil), NegotiatedSerializer:runtime.NegotiatedSerializer(nil)}, Username:"", Password:"", BearerToken:"--- REDACTED ---", BearerTokenFile:"/var/run/secrets/kubernetes.io/serviceaccount/token", Impersonate:rest.ImpersonationConfig{UserName:"", Groups:[]string(nil), Extra:map[string][]string(nil)}, AuthProvider:<nil>, AuthConfigPersister:rest.AuthProviderConfigPersister(nil), ExecProvider:<nil>, TLSClientConfig:rest.sanitizedTLSClientConfig{Insecure:false, ServerName:"", CertFile:"", KeyFile:"", CAFile:"/var/run/secrets/kubernetes.io/serviceaccount/ca.crt", CertData:[]uint8(nil), KeyData:[]uint8(nil), CAData:[]uint8(nil)}, UserAgent:"", Transport:http.RoundTripper(nil), WrapTransport:(transport.WrapperFunc)(nil), QPS:1000, Burst:2000, RateLimiter:flowcontrol.RateLimiter(nil), Timeout:0, Dial:(func(context.Context, string, string) (net.Conn, error))(nil)}
I0115 14:31:20.907560       1 crd.go:79] Create CRD FloatingIP successfully.
I0115 14:31:20.951984       1 crd.go:79] Create CRD Pool successfully.
I0115 14:31:20.954454       1 floatingip_plugin.go:59] floating ip config: {[] 1 floatingip-config kube-system floatingips }
I0115 14:31:20.956263       1 leaderelection.go:235] attempting to acquire leader lease  kube-system/galaxy-ipam...
I0115 14:31:38.133187       1 leaderelection.go:245] successfully acquired lease kube-system/galaxy-ipam
I0115 14:31:38.133320       1 event.go:258] Event(v1.ObjectReference{Kind:"Endpoints", Namespace:"kube-system", Name:"galaxy-ipam", UID:"bbcbd8c7-556f-4c3a-b905-e17a3fd59bae", APIVersion:"v1", ResourceVersion:"208175", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' k8s-node-01_ccf7d4e6-2aa5-4b14-9188-1d147f74fb1d became leader
I0115 14:31:38.133400       1 reflector.go:122] Starting reflector *v1.Pod (1m0s) from pkg/mod/k8s.io/client-go@v0.0.0-20190918200256-06eb1244587a/tools/cache/reflector.go:98
I0115 14:31:38.133447       1 reflector.go:160] Listing and watching *v1.Pod from pkg/mod/k8s.io/client-go@v0.0.0-20190918200256-06eb1244587a/tools/cache/reflector.go:98
I0115 14:31:38.133469       1 reflector.go:122] Starting reflector *v1.StatefulSet (1m0s) from pkg/mod/k8s.io/client-go@v0.0.0-20190918200256-06eb1244587a/tools/cache/reflector.go:98
I0115 14:31:38.133517       1 reflector.go:160] Listing and watching *v1.StatefulSet from pkg/mod/k8s.io/client-go@v0.0.0-20190918200256-06eb1244587a/tools/cache/reflector.go:98
I0115 14:31:38.133445       1 reflector.go:122] Starting reflector *v1alpha1.FloatingIP (0s) from pkg/mod/k8s.io/client-go@v0.0.0-20190918200256-06eb1244587a/tools/cache/reflector.go:98
I0115 14:31:38.133604       1 reflector.go:160] Listing and watching *v1alpha1.FloatingIP from pkg/mod/k8s.io/client-go@v0.0.0-20190918200256-06eb1244587a/tools/cache/reflector.go:98
I0115 14:31:38.133739       1 reflector.go:122] Starting reflector *v1alpha1.Pool (0s) from pkg/mod/k8s.io/client-go@v0.0.0-20190918200256-06eb1244587a/tools/cache/reflector.go:98
I0115 14:31:38.133793       1 reflector.go:160] Listing and watching *v1alpha1.Pool from pkg/mod/k8s.io/client-go@v0.0.0-20190918200256-06eb1244587a/tools/cache/reflector.go:98
I0115 14:31:38.133865       1 reflector.go:122] Starting reflector *v1.Deployment (1m0s) from pkg/mod/k8s.io/client-go@v0.0.0-20190918200256-06eb1244587a/tools/cache/reflector.go:98
I0115 14:31:38.133898       1 reflector.go:160] Listing and watching *v1.Deployment from pkg/mod/k8s.io/client-go@v0.0.0-20190918200256-06eb1244587a/tools/cache/reflector.go:98
I0115 14:31:38.634440       1 floatingip_plugin.go:82] empty floatingips from config, fetching from configmap
W0115 14:31:39.640118       1 floatingip_plugin.go:86] failed to unmarshal configmap val [{"nodeSubnets":["10.177.140.0/22"],"ips":["10.177.140.40~10.177.140.80"],"subnet":"10.177.140.0/22","gateway":"10.177.143.254/22"}] to floatingip config: invalid IP address: 10.177.143.254/22
W0115 14:31:40.643865       1 floatingip_plugin.go:86] failed to unmarshal configmap val [{"nodeSubnets":["10.177.140.0/22"],"ips":["10.177.140.40~10.177.140.80"],"subnet":"10.177.140.0/22","gateway":"10.177.143.254/22"}] to floatingip config: invalid IP address: 10.177.143.254/22
W0115 14:31:41.637906       1 floatingip_plugin.go:86] failed to unmarshal configmap val [{"nodeSubnets":["10.177.140.0/22"],"ips":["10.177.140.40~10.177.140.80"],"subnet":"10.177.140.0/22","gateway":"10.177.143.254/22"}] to floatingip config: invalid IP address: 10.177.143.254/22
W0115 14:31:42.639995       1 floatingip_plugin.go:86] failed to unmarshal configmap val [{"nodeSubnets":["10.177.140.0/22"],"ips":["10.177.140.40~10.177.140.80"],"subnet":"10.177.140.0/22","gateway":"10.177.143.254/22"}] to floatingip config: invalid IP address: 10.177.143.254/22

The gateway address must be a plain IP address, not a CIDR.
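
A quick way to catch this before applying the ConfigMap is to pre-check the floatingips value. The sketch below uses Python's standard ipaddress module. It is not galaxy-ipam's own parser; check_floatingips is a hypothetical helper, and the "~" range syntax is taken from the example above.

```python
import ipaddress
import json


def check_floatingips(raw):
    """Return a list of human-readable problems found in a floatingips JSON string."""
    problems = []
    for i, entry in enumerate(json.loads(raw)):
        subnet = ipaddress.ip_network(entry["subnet"], strict=False)
        # The gateway must parse as a bare address, e.g. 10.177.143.254.
        try:
            gateway = ipaddress.ip_address(entry["gateway"])
        except ValueError:
            problems.append(
                f"entry {i}: gateway {entry['gateway']!r} is not a plain IP address"
            )
        else:
            if gateway not in subnet:
                problems.append(f"entry {i}: gateway {gateway} is outside {subnet}")
        # Each item is a single IP or a "first~last" range.
        for ip_range in entry["ips"]:
            for ip in ip_range.split("~"):
                try:
                    if ipaddress.ip_address(ip) not in subnet:
                        problems.append(f"entry {i}: {ip} is outside {subnet}")
                except ValueError:
                    problems.append(f"entry {i}: {ip!r} is not a valid IP address")
    return problems
```

With the value from the ConfigMap above, this reports the "10.177.143.254/22" gateway; after dropping the "/22" suffix it reports nothing.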

I fixed the gateway address and then created a demo pod. This is the manifest file:

apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: common-nginx
  labels:
    app: common-nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: common-nginx
  template:
    metadata:
      name: common-nginx
      labels:
        app: common-nginx
      annotations:
        k8s.v1.cni.cncf.io/networks: "galaxy-k8s-vlan"
    spec:
      containers:
      - name: nginx
        image: registry.tcnp.com/library/nginx
        resources:
          requests:
            tke.cloud.tencent.com/eni-ip: "1"
          limits:
            tke.cloud.tencent.com/eni-ip: "1"

But the pod is pending. Describing the pod shows:

  Warning  FailedScheduling  29s   default-scheduler  0/6 nodes are available: 3 node(s) were unschedulable, 3 Insufficient tke.cloud.tencent.com/eni-ip.
  Warning  FailedScheduling  29s   default-scheduler  0/6 nodes are available: 3 node(s) were unschedulable, 3 Insufficient tke.cloud.tencent.com/eni-ip.

@currycan did you create the scheduler config following https://github.com/tkestack/galaxy/blob/master/doc/galaxy-ipam-config.md#kubernetes-scheduler-configuration and make sure to update urlPrefix to the galaxy-ipam service address?
And don't forget to restart kube-scheduler so the policy config takes effect.

@chenchun This is my scheduler ConfigMap, whose urlPrefix uses the NodePort address:

apiVersion: v1
kind: ConfigMap
metadata:
  name: scheduler-policy
  namespace: kube-system
data:
  # set "ignoredByScheduler" to true if not ENI
  policy.cfg: |
    {
      "kind": "Policy",
      "apiVersion": "v1",
      "extenders": [
        {
          "urlPrefix": "http://10.177.140.16:32760/v1",
          "httpTimeout": 70000000000,
          "filterVerb": "filter",
          "BindVerb": "bind",
          "weight": 1,
          "enableHttps": false,
          "managedResources": [
            {
              "name": "tke.cloud.tencent.com/eni-ip",
              "ignoredByScheduler": false
            }
          ]
        }
      ]
    }

--policy-configmap="scheduler-policy" has already been added to kube-scheduler, and I restarted it.
The kube-scheduler logs are:

[root@k8s-master-01 ~]# kubectl logs -f -n kube-system kube-scheduler-10.177.140.16
I0118 08:00:42.084178       1 flags.go:59] FLAG: --add-dir-header="false"
I0118 08:00:42.084519       1 flags.go:59] FLAG: --address="0.0.0.0"
I0118 08:00:42.084535       1 flags.go:59] FLAG: --algorithm-provider=""
I0118 08:00:42.084541       1 flags.go:59] FLAG: --alsologtostderr="true"
I0118 08:00:42.084547       1 flags.go:59] FLAG: --authentication-kubeconfig="/etc/kubernetes/scheduler.conf"
I0118 08:00:42.084553       1 flags.go:59] FLAG: --authentication-skip-lookup="false"
I0118 08:00:42.084561       1 flags.go:59] FLAG: --authentication-token-webhook-cache-ttl="10s"
I0118 08:00:42.084569       1 flags.go:59] FLAG: --authentication-tolerate-lookup-failure="true"
I0118 08:00:42.084574       1 flags.go:59] FLAG: --authorization-always-allow-paths="[/healthz]"
I0118 08:00:42.084584       1 flags.go:59] FLAG: --authorization-kubeconfig="/etc/kubernetes/scheduler.conf"
I0118 08:00:42.084591       1 flags.go:59] FLAG: --authorization-webhook-cache-authorized-ttl="10s"
I0118 08:00:42.084597       1 flags.go:59] FLAG: --authorization-webhook-cache-unauthorized-ttl="10s"
I0118 08:00:42.084603       1 flags.go:59] FLAG: --bind-address="127.0.0.1"
I0118 08:00:42.084612       1 flags.go:59] FLAG: --cert-dir=""
I0118 08:00:42.084618       1 flags.go:59] FLAG: --client-ca-file=""
I0118 08:00:42.084625       1 flags.go:59] FLAG: --config=""
I0118 08:00:42.084630       1 flags.go:59] FLAG: --contention-profiling="true"
I0118 08:00:42.084637       1 flags.go:59] FLAG: --experimental-logging-sanitization="false"
I0118 08:00:42.084643       1 flags.go:59] FLAG: --feature-gates=""
I0118 08:00:42.084651       1 flags.go:59] FLAG: --hard-pod-affinity-symmetric-weight="1"
I0118 08:00:42.084659       1 flags.go:59] FLAG: --help="false"
I0118 08:00:42.084665       1 flags.go:59] FLAG: --http2-max-streams-per-connection="0"
I0118 08:00:42.084673       1 flags.go:59] FLAG: --kube-api-burst="200"
I0118 08:00:42.084679       1 flags.go:59] FLAG: --kube-api-content-type="application/vnd.kubernetes.protobuf"
I0118 08:00:42.084689       1 flags.go:59] FLAG: --kube-api-qps="500"
I0118 08:00:42.084697       1 flags.go:59] FLAG: --kubeconfig="/etc/kubernetes/scheduler.conf"
I0118 08:00:42.084705       1 flags.go:59] FLAG: --leader-elect="true"
I0118 08:00:42.084711       1 flags.go:59] FLAG: --leader-elect-lease-duration="15s"
I0118 08:00:42.084717       1 flags.go:59] FLAG: --leader-elect-renew-deadline="10s"
I0118 08:00:42.084727       1 flags.go:59] FLAG: --leader-elect-resource-lock="leases"
I0118 08:00:42.084733       1 flags.go:59] FLAG: --leader-elect-resource-name="kube-scheduler"
I0118 08:00:42.084739       1 flags.go:59] FLAG: --leader-elect-resource-namespace="kube-system"
I0118 08:00:42.084745       1 flags.go:59] FLAG: --leader-elect-retry-period="2s"
I0118 08:00:42.084751       1 flags.go:59] FLAG: --lock-object-name="kube-scheduler"
I0118 08:00:42.084757       1 flags.go:59] FLAG: --lock-object-namespace="kube-system"
I0118 08:00:42.084764       1 flags.go:59] FLAG: --log-backtrace-at=":0"
I0118 08:00:42.084773       1 flags.go:59] FLAG: --log-dir="/var/log/kubernetes/kube-scheduler"
I0118 08:00:42.084781       1 flags.go:59] FLAG: --log-file=""
I0118 08:00:42.084786       1 flags.go:59] FLAG: --log-file-max-size="1800"
I0118 08:00:42.084793       1 flags.go:59] FLAG: --log-flush-frequency="5s"
I0118 08:00:42.084799       1 flags.go:59] FLAG: --logging-format="text"
I0118 08:00:42.084805       1 flags.go:59] FLAG: --logtostderr="false"
I0118 08:00:42.084811       1 flags.go:59] FLAG: --master=""
I0118 08:00:42.084817       1 flags.go:59] FLAG: --one-output="false"
I0118 08:00:42.084823       1 flags.go:59] FLAG: --permit-port-sharing="false"
I0118 08:00:42.084829       1 flags.go:59] FLAG: --policy-config-file=""
I0118 08:00:42.084835       1 flags.go:59] FLAG: --policy-configmap="scheduler-policy"
I0118 08:00:42.084841       1 flags.go:59] FLAG: --policy-configmap-namespace="kube-system"
I0118 08:00:42.084847       1 flags.go:59] FLAG: --port="10251"
I0118 08:00:42.084854       1 flags.go:59] FLAG: --profiling="false"
I0118 08:00:42.084860       1 flags.go:59] FLAG: --requestheader-allowed-names="[]"
I0118 08:00:42.084873       1 flags.go:59] FLAG: --requestheader-client-ca-file=""
I0118 08:00:42.084879       1 flags.go:59] FLAG: --requestheader-extra-headers-prefix="[x-remote-extra-]"
I0118 08:00:42.084887       1 flags.go:59] FLAG: --requestheader-group-headers="[x-remote-group]"
I0118 08:00:42.084897       1 flags.go:59] FLAG: --requestheader-username-headers="[x-remote-user]"
I0118 08:00:42.084904       1 flags.go:59] FLAG: --scheduler-name="default-scheduler"
I0118 08:00:42.084910       1 flags.go:59] FLAG: --secure-port="10259"
I0118 08:00:42.084916       1 flags.go:59] FLAG: --show-hidden-metrics-for-version=""
I0118 08:00:42.084922       1 flags.go:59] FLAG: --skip-headers="false"
I0118 08:00:42.084928       1 flags.go:59] FLAG: --skip-log-headers="false"
I0118 08:00:42.084934       1 flags.go:59] FLAG: --stderrthreshold="2"
I0118 08:00:42.084941       1 flags.go:59] FLAG: --tls-cert-file=""
I0118 08:00:42.084947       1 flags.go:59] FLAG: --tls-cipher-suites="[]"
I0118 08:00:42.084958       1 flags.go:59] FLAG: --tls-min-version=""
I0118 08:00:42.084964       1 flags.go:59] FLAG: --tls-private-key-file=""
I0118 08:00:42.084970       1 flags.go:59] FLAG: --tls-sni-cert-key="[]"
I0118 08:00:42.084978       1 flags.go:59] FLAG: --use-legacy-policy-config="false"
I0118 08:00:42.084984       1 flags.go:59] FLAG: --v="2"
I0118 08:00:42.084991       1 flags.go:59] FLAG: --version="false"
I0118 08:00:42.085001       1 flags.go:59] FLAG: --vmodule=""
I0118 08:00:42.085008       1 flags.go:59] FLAG: --write-config-to=""
I0118 08:00:43.444542       1 serving.go:331] Generated self-signed cert in-memory
I0118 08:00:45.914304       1 requestheader_controller.go:244] Loaded a new request header values for RequestHeaderAuthRequestController
I0118 08:00:46.043083       1 factory.go:210] Creating scheduler from configuration: {{ } [] [] [{http://10.177.140.16:32760/v1 filter   1 bind false <nil> {1m10s} false [{tke.cloud.tencent.com/eni-ip false}] false}] 0 false}
I0118 08:00:46.043232       1 factory.go:219] Using predicates from algorithm provider 'DefaultProvider'
I0118 08:00:46.043256       1 factory.go:230] Using default priorities
I0118 08:00:46.043269       1 factory.go:257] Creating scheduler with fit predicates 'map[CheckNodeUnschedulable:{} CheckVolumeBinding:{} EvenPodsSpread:{} GeneralPredicates:{} MatchInterPodAffinity:{} MaxAzureDiskVolumeCount:{} MaxCSIVolumeCountPred:{} MaxEBSVolumeCount:{} MaxGCEPDVolumeCount:{} NoDiskConflict:{} NoVolumeZoneConflict:{} PodToleratesNodeTaints:{}]' and priority functions 'map[BalancedResourceAllocation:1 EvenPodsSpreadPriority:2 ImageLocalityPriority:1 InterPodAffinityPriority:1 LeastRequestedPriority:1 NodeAffinityPriority:1 NodePreferAvoidPodsPriority:10000 SelectorSpreadPriority:1 TaintTolerationPriority:1]'

@currycan can you change ignoredByScheduler to true so that the tke.cloud.tencent.com/eni-ip resource won't be evaluated by kube-scheduler?
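
As an illustration of that change, the sketch below flips the flag in a policy.cfg string instead of editing the JSON by hand. set_ignored_by_scheduler is a hypothetical helper, not part of Kubernetes or galaxy; it only rewrites the JSON text and does not touch the cluster.

```python
import json


def set_ignored_by_scheduler(policy_cfg, resource, ignored=True):
    """Return policy_cfg with ignoredByScheduler set for the given managed resource."""
    policy = json.loads(policy_cfg)
    for extender in policy.get("extenders", []):
        for managed in extender.get("managedResources", []):
            if managed.get("name") == resource:
                managed["ignoredByScheduler"] = ignored
    # All other keys are passed through unchanged.
    return json.dumps(policy, indent=2)


# Example:
# new_cfg = set_ignored_by_scheduler(old_cfg, "tke.cloud.tencent.com/eni-ip")
```

The result still has to be written back to the scheduler-policy ConfigMap, and kube-scheduler restarted, for it to take effect.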

I'm sorry to tell you it doesn't work. This is the configmap:

[root@k8s-master-01 galaxy]# kubectl get cm -n kube-system scheduler-policy -o yaml
apiVersion: v1
data:
  policy.cfg: |
    {
      "kind": "Policy",
      "apiVersion": "v1",
      "extenders": [
        {
          "urlPrefix": "http://10.177.140.16:32760/v1",
          "httpTimeout": 70000000000,
          "filterVerb": "filter",
          "BindVerb": "bind",
          "weight": 1,
          "enableHttps": false,
          "managedResources": [
            {
              "name": "tke.cloud.tencent.com/eni-ip",
              "ignoredByScheduler": true
            }
          ]
        }
      ]
    }
kind: ConfigMap

After changing ignoredByScheduler to true, I restarted the kubelet service and recreated the kube-scheduler pod.

I found some other info about galaxy at http://www.iceyao.com.cn/2020/07/03/galaxy_source_code_readnote/, and then changed galaxy-etc as follows:

# kubectl -n kube-system get cm galaxy-etc -o yaml
apiVersion: v1
data:
  galaxy.json: |
    {
      "NetworkConf":[
        {"name":"tke-route-eni","type":"tke-route-eni","eni":"eth1","routeTable":1},
        {"name":"galaxy-flannel","type":"galaxy-flannel", "delegate":{"type":"galaxy-veth"},"subnetFile":"/run/flannel/subnet.env"},
        {"name":"galaxy-k8s-vlan","type":"galaxy-k8s-vlan", "device":"eth0", "switch":"ipvlan", "ipvlan_mode":"l2"},
        {"name":"galaxy-k8s-sriov","type": "galaxy-k8s-sriov", "device": "eth0", "vf_num": 10}
      ],
      "DefaultNetworks": ["galaxy-k8s-vlan"],
      "ENIIPNetwork": "galaxy-k8s-vlan"
    }
kind: ConfigMap

Describing the pod shows:

  Warning  FailedCreatePodSandBox  6m23s                  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "38416c5c992a971c90f5765fe74c6d204d37b7a7752c3a243d539a1231faccbc" network for pod "common-nginx-mkk62": networkPlugin cni failed to set up pod "common-nginx-mkk62_default" network: galaxy returns: fail to establish network map[ipinfos:[{"ip":"10.177.140.46/22","vlan":0,"gateway":"10.177.143.254"}]]:failed to setup bridge Error getting device eth0: Link not found
  Warning  FailedCreatePodSandBox  6m14s (x4 over 6m21s)  kubelet            (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "f2a4de16b9341ff4c6e08b263246a9a9353c89b302653097dd010ff2c8d124a3" network for pod "common-nginx-mkk62": networkPlugin cni failed to set up pod "common-nginx-mkk62_default" network: galaxy returns: fail to establish network map[ipinfos:[{"ip":"10.177.140.46/22","vlan":0,"gateway":"10.177.143.254"}]]:failed to setup bridge Error getting device eth0: Link not found

@chenchun I got it. The network device is ens192, not eth0 or eth1. Thank you very much!
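
A "Link not found" error like this can be caught early by comparing the "device" values in galaxy.json with the interfaces that exist on the node. A minimal sketch, assuming a Unix host; missing_devices and host_interface_names are hypothetical helpers, and only the "device" key is checked (not "eni").

```python
import json
import socket


def missing_devices(galaxy_json, host_interfaces):
    """Return (network name, device) pairs whose device is not a host interface."""
    conf = json.loads(galaxy_json)
    missing = []
    for net in conf.get("NetworkConf", []):
        device = net.get("device")
        if device and device not in host_interfaces:
            missing.append((net.get("name"), device))
    return missing


def host_interface_names():
    """Names of the network interfaces on this host (Unix only)."""
    return {name for _, name in socket.if_nameindex()}


# Example (run on the node):
# missing_devices(open("/etc/galaxy/galaxy.json").read(), host_interface_names())
```

On a node whose NIC is ens192, the config above would list both galaxy-k8s-vlan and galaxy-k8s-sriov as pointing at a missing eth0.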

@chenchun I found something wrong with the health check probes.
Manifest file:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-floatingip
spec:
  strategy:
    type: Recreate
  replicas: 3
  selector:
    matchLabels:
      app: nginx-floatingip
  template:
    metadata:
      name: nginx-floatingip
      labels:
        app: nginx-floatingip
      annotations:
        k8s.v1.cni.cncf.io/networks: "galaxy-k8s-vlan"
        k8s.v1.cni.galaxy.io/release-policy: "immutable"
    spec:
      tolerations:
        - operator: "Exists"
      containers:
      - name: nginx
        image: nginx:alpine
        ports:
          - name: http-80
            containerPort: 80
        resources:
          requests:
            cpu: "0.1"
            memory: "32Mi"
            tke.cloud.tencent.com/eni-ip: "1"
          limits:
            cpu: "0.1"
            memory: "32Mi"
            tke.cloud.tencent.com/eni-ip: "1"
        livenessProbe:
          # httpGet:
          #   path: /
          #   port: 80
          #   scheme: HTTP
          tcpSocket:
            port: 80
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          failureThreshold: 3
          timeoutSeconds: 1
        readinessProbe:
          # httpGet:
          #   path: /
          #   port: 80
          #   scheme: HTTP
          tcpSocket:
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 5
          successThreshold: 2
          failureThreshold: 3
          timeoutSeconds: 1

The health check failed with both tcpSocket and httpGet:

  Warning  FailedScheduling  82s                default-scheduler  deployment nginx-floatingip has allocated 3 ips with replicas of 3, wait for releasing
  Warning  FailedScheduling  82s                default-scheduler  deployment nginx-floatingip has allocated 3 ips with replicas of 3, wait for releasing
  Normal   Scheduled         78s                default-scheduler  Successfully assigned default/nginx-floatingip-5cdcd7bcbd-6ql2x to 10.177.140.18
  Warning  Unhealthy         16s (x3 over 36s)  kubelet            Liveness probe failed: dial tcp 10.177.140.44:80: i/o timeout
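
To reproduce the probe failure outside kubelet, a plain TCP connect with the same one-second timeout can be run from the node that hosts the pod. This is only a rough stand-in for a tcpSocket probe, not kubelet's implementation; tcp_probe is a hypothetical helper.

```python
import socket


def tcp_probe(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unreachable hosts.
        return False


# Example (run on the node): tcp_probe("10.177.140.44", 80)
```

If this returns False on the node but True from another machine on the same subnet, the problem is likely in the node-to-pod path rather than in nginx itself.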