lework/kainstall

ubuntu 20.04: metrics-server not running properly after install, cluster installation stuck at kubesphere

TaibiaoGuo opened this issue · 2 comments

metrics-server is not running properly, and the cluster installation is stuck at the kubesphere step.

[2021-08-08T21:38:49.956194608+0800]: INFO:    [apply] add /tmp/kainstall-offline-file//manifests/kubesphere-installer.yaml succeeded.
[2021-08-08T21:38:49.963119829+0800]: INFO:    [apply] /tmp/kainstall-offline-file//manifests/cluster-configuration.yaml
[2021-08-08T21:38:52.437071649+0800]: INFO:    [apply] add /tmp/kainstall-offline-file//manifests/cluster-configuration.yaml succeeded.
[2021-08-08T21:39:55.449887893+0800]: INFO:    [waiting] waiting ks-installer
[2021-08-08T21:40:01.944030259+0800]: INFO:    [waiting] ks-installer pods ready succeeded.

Node information

Information as of: 2021-08-08 13:15:24
 
 Product............: VMware Virtual Platform None
 OS.................: Ubuntu 20.04.1 LTS (bullseye/sid)
 Kernel.............: Linux 5.4.0-80-generic x86_64 GNU/Linux
 CPU................: Intel(R) Xeon(R) Silver 4214R CPU @ 2.40GHz 6P 1C 6L

 Hostname...........: k8s-master-node1
 IP Addresses.......: xxx.xxx.xxx.1

 Uptime.............: 0 days, 00h 00m 12s
 Memory.............: 0.61GiB of 7.75GiB RAM used (7.91%)
 Load Averages......: 0.07 / 0.02 / 0.00 with 6 core(s) at 2394.374Hz
 Disk Usage.........: 13G of 1.2T disk space used (2%) 

 Users online.......: 1
 Running Processes..: 309
 Container Info.....: Images:0

Cluster initialization command

bash -c "$(curl -sSL https://cdn.jsdelivr.net/gh/lework/kainstall@master/kainstall-ubuntu.sh)" - init \
  --master xxxx.xxxx.xxx.1 \
  --worker xxxx.xxxx.xxx.2,xxxx.xxxx.xxx.3 \
  --user root --password zzzzzzz \
  --10years --version 1.21.3 \
  --network flannel --ingress nginx --ui kubesphere --addon metrics-server --monitor prometheus

metrics-server-79bf7dcc6f-wmbj9 not ready

[root@k8s-master-node1 /]# kubectl get pod -A
NAMESPACE           NAME                                        READY   STATUS             RESTARTS   AGE
default             ingress-demo-app-694bf5d965-6rqds           1/1     Running            0          30m
default             ingress-demo-app-694bf5d965-nkdvb           1/1     Running            0          30m
ingress-nginx       ingress-nginx-admission-create-r857p        0/1     Completed          0          31m
ingress-nginx       ingress-nginx-admission-patch-tkxp5         0/1     Completed          0          31m
ingress-nginx       ingress-nginx-controller-76d9d9fbf5-n5jxf   1/1     Running            0          31m
kube-system         coredns-56c5f6b585-2422r                    1/1     Running            0          32m
kube-system         coredns-56c5f6b585-srp4j                    1/1     Running            0          32m
kube-system         default-http-backend-6c67944995-fpmcq       1/1     Running            0          30m
kube-system         etcd-k8s-master-node1                       1/1     Running            0          32m
kube-system         kube-apiserver-k8s-master-node1             1/1     Running            0          32m
kube-system         kube-controller-manager-k8s-master-node1    1/1     Running            0          32m
kube-system         kube-flannel-ds-fh8zh                       1/1     Running            0          32m
kube-system         kube-flannel-ds-nb6kl                       1/1     Running            0          32m
kube-system         kube-flannel-ds-x78rn                       1/1     Running            0          32m
kube-system         kube-proxy-7pgps                            1/1     Running            0          32m
kube-system         kube-proxy-hnv6x                            1/1     Running            0          32m
kube-system         kube-proxy-nzpnq                            1/1     Running            0          32m
kube-system         kube-scheduler-k8s-master-node1             1/1     Running            0          32m
kube-system         metrics-server-79bf7dcc6f-wmbj9             0/1     Running            0          31m
kubesphere-system   ks-installer-ff7d7698d-bppv6                0/1     CrashLoopBackOff   9          30m
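To see why the ks-installer pod keeps restarting, its logs and events can be checked with the pod name from the listing above (standard kubectl diagnostics, not specific to kainstall):

kubectl logs -n kubesphere-system ks-installer-ff7d7698d-bppv6 --previous
kubectl describe -n kubesphere-system pod ks-installer-ff7d7698d-bppv6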

metrics-server pod details and logs

[root@k8s-master-node1 /]# kubectl logs -n kube-system   metrics-server-79bf7dcc6f-wmbj9
I0808 13:37:21.566776       1 serving.go:341] Generated self-signed cert (/tmp/apiserver.crt, /tmp/apiserver.key)
I0808 13:37:22.307172       1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I0808 13:37:22.307189       1 shared_informer.go:240] Waiting for caches to sync for RequestHeaderAuthRequestController
I0808 13:37:22.307209       1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0808 13:37:22.307214       1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0808 13:37:22.307226       1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0808 13:37:22.307229       1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0808 13:37:22.307685       1 secure_serving.go:197] Serving securely on [::]:443
I0808 13:37:22.307755       1 dynamic_serving_content.go:130] Starting serving-cert::/tmp/apiserver.crt::/tmp/apiserver.key
I0808 13:37:22.307775       1 tlsconfig.go:240] Starting DynamicServingCertificateController
I0808 13:37:22.407917       1 shared_informer.go:247] Caches are synced for RequestHeaderAuthRequestController 
I0808 13:37:22.407924       1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file 
I0808 13:37:22.407940       1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file 
E0808 13:37:27.504502       1 scraper.go:139] "Failed to scrape node" err="Get \"https://k8s-worker-node2:10250/stats/summary?only_cpu_and_memory=true\": EOF" node="k8s-worker-node2"
E0808 13:37:27.522212       1 scraper.go:139] "Failed to scrape node" err="Get \"https://k8s-master-node1:10250/stats/summary?only_cpu_and_memory=true\": EOF" node="k8s-master-node1"
[root@k8s-master-node1 /]# kubectl describe -n kube-system pod  metrics-server-79bf7dcc6f-wmbj9
Name:                 metrics-server-79bf7dcc6f-wmbj9
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 k8s-worker-node2/xxx.xxx.xxx.xxx
Start Time:           Sun, 08 Aug 2021 13:37:16 +0000
Labels:               k8s-app=metrics-server
                      pod-template-hash=79bf7dcc6f
Annotations:          <none>
Status:               Running
IP:                   10.244.2.2
IPs:
  IP:           10.244.2.2
Controlled By:  ReplicaSet/metrics-server-79bf7dcc6f
Containers:
  metrics-server:
    Container ID:  docker://10e9a176e588a454066608bdbec5adddd39de942ee771c62a6f99e7c079e68a0
    Image:         registry.cn-hangzhou.aliyuncs.com/kainstall/metrics-server:v0.5.0
    Image ID:      docker-pullable://registry.cn-hangzhou.aliyuncs.com/kainstall/metrics-server@sha256:05bf9f4bf8d9de19da59d3e1543fd5c140a8d42a5e1b92421e36e5c2d74395eb
    Port:          443/TCP
    Host Port:     0/TCP
    Args:
      --cert-dir=/tmp
      --secure-port=443
      --kubelet-use-node-status-port
      --metric-resolution=15s
    State:          Running
      Started:      Sun, 08 Aug 2021 13:37:21 +0000
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:        100m
      memory:     200Mi
    Liveness:     http-get https://:https/livez delay=0s timeout=1s period=10s #success=1 #failure=3
    Readiness:    http-get https://:https/readyz delay=20s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /tmp from tmp-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-nnnnm (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  tmp-dir:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  kube-api-access-nnnnm:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  4m59s                 default-scheduler  Successfully assigned kube-system/metrics-server-79bf7dcc6f-wmbj9 to k8s-worker-node2
  Normal   Pulling    4m57s                 kubelet            Pulling image "registry.cn-hangzhou.aliyuncs.com/kainstall/metrics-server:v0.5.0"
  Normal   Pulled     4m54s                 kubelet            Successfully pulled image "registry.cn-hangzhou.aliyuncs.com/kainstall/metrics-server:v0.5.0" in 3.841978824s
  Normal   Created    4m53s                 kubelet            Created container metrics-server
  Normal   Started    4m53s                 kubelet            Started container metrics-server
  Warning  Unhealthy  68s (x21 over 4m28s)  kubelet            Readiness probe failed: HTTP probe failed with statuscode: 500
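The readiness failures follow from the scrape errors in the logs above: /readyz keeps returning 500 while metrics-server cannot scrape the kubelets. The EOF is consistent with the TLS connection being torn down during the handshake, and since adding --kubelet-insecure-tls fixes it (see the workaround below), certificate verification is the likely culprit. As a quick check (hostname and port taken from the log above), the kubelet's serving certificate can be inspected directly:

openssl s_client -connect k8s-worker-node2:10250 </dev/null 2>/dev/null | openssl x509 -noout -subject -issuer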

Errors during installation

[2021-08-08T21:36:14.184345230+0800]: INFO:    [kubeadm init] xxx.xxx.xxx.xxx: set kube config succeeded.
[2021-08-08T21:36:14.196881865+0800]: INFO:    [kubeadm init] xxx.xxx.xxx.xxx: delete master taint
[2021-08-08T21:36:14.223645005+0800]: EXEC:    [command] bash -c 'kubectl taint nodes --all node-role.kubernetes.io/master-'
bash: kubectl: command not found
[2021-08-08T21:36:14.237175207+0800]: ERROR:   [kubeadm init] xxx.xxx.xxx.xxx: delete master taint failed.
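The "kubectl: command not found" error appears to come from the non-interactive shell the script spawns, where kubectl is presumably not yet on the PATH. If the taint removal fails this way, the same command from the log can be run manually on the master afterwards:

kubectl taint nodes --all node-role.kubernetes.io/master-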

Current workaround: download the yaml file, add the - --kubelet-insecure-tls option at line 137 to skip certificate verification, then redeploy with kubectl apply; metrics-server then runs normally.

wget https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.5.0/components.yaml
129     spec:
130       containers:
131       - args:
132         - --cert-dir=/tmp
133         - --secure-port=443
134         - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
135         - --kubelet-use-node-status-port
136         - --metric-resolution=15s
137         - --kubelet-insecure-tls
kubectl apply -f  components.yaml
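After redeploying, the pod should report Ready and the metrics API should start answering. A quick way to verify, using the k8s-app=metrics-server label shown in the describe output above:

kubectl -n kube-system get pod -l k8s-app=metrics-server
kubectl top nodes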

Thanks for the heads-up. metrics-server v0.5.0 changed the secure-port, and the script did not handle this variable, so the --kubelet-insecure-tls argument was never added. This has now been fixed.

kainstall/kainstall-ubuntu.sh, lines 2536 to 2539 at 61254ac:

sed -i -e 's#k8s.gcr.io/metrics-server#$KUBE_IMAGE_REPO#g' \
-e '/--kubelet-preferred-address-types=.*/d' \
-e 's/\\(.*\\)- --secure-port=\\(.*\\)/\\1- --secure-port=\\2\\n\\1- --kubelet-insecure-tls\\n\\1- --kubelet-preferred-address-types=InternalIP,InternalDNS,ExternalIP,ExternalDNS,Hostname/g' \
\"${metrics_server_file}\"

Installing KubeSphere requires storage support, so longhorn or rook needs to be installed first.
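For example, a minimal Longhorn setup could look like the sketch below (hedged: it assumes Longhorn v1.1.x, that open-iscsi is installed on every node, and that the StorageClass it creates is named longhorn; check the Longhorn docs for the current manifest URL):

apt-get install -y open-iscsi    # Longhorn prerequisite on each Ubuntu node
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.1.2/deploy/longhorn.yaml
kubectl get storageclass         # KubeSphere needs a default StorageClass
# if the longhorn StorageClass is not already the default:
kubectl patch storageclass longhorn -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'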