Ubuntu 20.04: metrics-server not running properly, cluster installation stuck at the kubesphere step
TaibiaoGuo opened this issue · 2 comments
TaibiaoGuo commented
metrics-server is not running properly, and the cluster installation is stuck at the kubesphere step.
[2021-08-08T21:38:49.956194608+0800]: INFO: [apply] add /tmp/kainstall-offline-file//manifests/kubesphere-installer.yaml succeeded.
[2021-08-08T21:38:49.963119829+0800]: INFO: [apply] /tmp/kainstall-offline-file//manifests/cluster-configuration.yaml
[2021-08-08T21:38:52.437071649+0800]: INFO: [apply] add /tmp/kainstall-offline-file//manifests/cluster-configuration.yaml succeeded.
[2021-08-08T21:39:55.449887893+0800]: INFO: [waiting] waiting ks-installer
[2021-08-08T21:40:01.944030259+0800]: INFO: [waiting] ks-installer pods ready succeeded.
Node information
Information as of: 2021-08-08 13:15:24
Product............: VMware Virtual Platform None
OS.................: Ubuntu 20.04.1 LTS (bullseye/sid)
Kernel.............: Linux 5.4.0-80-generic x86_64 GNU/Linux
CPU................: Intel(R) Xeon(R) Silver 4214R CPU @ 2.40GHz 6P 1C 6L
Hostname...........: k8s-master-node1
IP Addresses.......: xxx.xxx.xxx.1
Uptime.............: 0 days, 00h 00m 12s
Memory.............: 0.61GiB of 7.75GiB RAM used (7.91%)
Load Averages......: 0.07 / 0.02 / 0.00 with 6 core(s) at 2394.374Hz
Disk Usage.........: 13G of 1.2T disk space used (2%)
Users online.......: 1
Running Processes..: 309
Container Info.....: Images:0
Cluster initialization command
bash -c "$(curl -sSL https://cdn.jsdelivr.net/gh/lework/kainstall@master/kainstall-ubuntu.sh)" - init \
--master xxxx.xxxx.xxx.1 \
--worker xxxx.xxxx.xxx.2,xxxx.xxxx.xxx.3 \
--user root --password zzzzzzz \
--10years --version 1.21.3 \
--network flannel --ingress nginx --ui kubesphere --addon metrics-server --monitor prometheus
Pod metrics-server-79bf7dcc6f-wmbj9 is not ready
[root@k8s-master-node1 /]# kubectl get pod -A
NAMESPACE NAME READY STATUS RESTARTS AGE
default ingress-demo-app-694bf5d965-6rqds 1/1 Running 0 30m
default ingress-demo-app-694bf5d965-nkdvb 1/1 Running 0 30m
ingress-nginx ingress-nginx-admission-create-r857p 0/1 Completed 0 31m
ingress-nginx ingress-nginx-admission-patch-tkxp5 0/1 Completed 0 31m
ingress-nginx ingress-nginx-controller-76d9d9fbf5-n5jxf 1/1 Running 0 31m
kube-system coredns-56c5f6b585-2422r 1/1 Running 0 32m
kube-system coredns-56c5f6b585-srp4j 1/1 Running 0 32m
kube-system default-http-backend-6c67944995-fpmcq 1/1 Running 0 30m
kube-system etcd-k8s-master-node1 1/1 Running 0 32m
kube-system kube-apiserver-k8s-master-node1 1/1 Running 0 32m
kube-system kube-controller-manager-k8s-master-node1 1/1 Running 0 32m
kube-system kube-flannel-ds-fh8zh 1/1 Running 0 32m
kube-system kube-flannel-ds-nb6kl 1/1 Running 0 32m
kube-system kube-flannel-ds-x78rn 1/1 Running 0 32m
kube-system kube-proxy-7pgps 1/1 Running 0 32m
kube-system kube-proxy-hnv6x 1/1 Running 0 32m
kube-system kube-proxy-nzpnq 1/1 Running 0 32m
kube-system kube-scheduler-k8s-master-node1 1/1 Running 0 32m
kube-system metrics-server-79bf7dcc6f-wmbj9 0/1 Running 0 31m
kubesphere-system ks-installer-ff7d7698d-bppv6 0/1 CrashLoopBackOff 9 30m
metrics-server pod details and logs
[root@k8s-master-node1 /]# kubectl logs -n kube-system metrics-server-79bf7dcc6f-wmbj9
I0808 13:37:21.566776 1 serving.go:341] Generated self-signed cert (/tmp/apiserver.crt, /tmp/apiserver.key)
I0808 13:37:22.307172 1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I0808 13:37:22.307189 1 shared_informer.go:240] Waiting for caches to sync for RequestHeaderAuthRequestController
I0808 13:37:22.307209 1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0808 13:37:22.307214 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0808 13:37:22.307226 1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0808 13:37:22.307229 1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0808 13:37:22.307685 1 secure_serving.go:197] Serving securely on [::]:443
I0808 13:37:22.307755 1 dynamic_serving_content.go:130] Starting serving-cert::/tmp/apiserver.crt::/tmp/apiserver.key
I0808 13:37:22.307775 1 tlsconfig.go:240] Starting DynamicServingCertificateController
I0808 13:37:22.407917 1 shared_informer.go:247] Caches are synced for RequestHeaderAuthRequestController
I0808 13:37:22.407924 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0808 13:37:22.407940 1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
E0808 13:37:27.504502 1 scraper.go:139] "Failed to scrape node" err="Get \"https://k8s-worker-node2:10250/stats/summary?only_cpu_and_memory=true\": EOF" node="k8s-worker-node2"
E0808 13:37:27.522212 1 scraper.go:139] "Failed to scrape node" err="Get \"https://k8s-master-node1:10250/stats/summary?only_cpu_and_memory=true\": EOF" node="k8s-master-node1"
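The `EOF` on port 10250 suggests a failed TLS handshake between metrics-server and the kubelet, typically because the kubelet serves a self-signed certificate not signed by the cluster CA. One way to inspect the certificate the kubelet actually presents (a diagnostic sketch; the hostname is taken from the log above and must be run from a machine that can reach the node):

```shell
# Show issuer, subject, and expiry of the certificate served on the kubelet
# port. If the issuer is the node itself rather than the cluster CA,
# metrics-server cannot verify it and the scrape fails.
openssl s_client -connect k8s-worker-node2:10250 </dev/null 2>/dev/null \
  | openssl x509 -noout -issuer -subject -enddate
```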
[root@k8s-master-node1 /]# kubectl describe -n kube-system pod metrics-server-79bf7dcc6f-wmbj9
Name: metrics-server-79bf7dcc6f-wmbj9
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: k8s-worker-node2/xxx.xxx.xxx.xxx
Start Time: Sun, 08 Aug 2021 13:37:16 +0000
Labels: k8s-app=metrics-server
pod-template-hash=79bf7dcc6f
Annotations: <none>
Status: Running
IP: 10.244.2.2
IPs:
IP: 10.244.2.2
Controlled By: ReplicaSet/metrics-server-79bf7dcc6f
Containers:
metrics-server:
Container ID: docker://10e9a176e588a454066608bdbec5adddd39de942ee771c62a6f99e7c079e68a0
Image: registry.cn-hangzhou.aliyuncs.com/kainstall/metrics-server:v0.5.0
Image ID: docker-pullable://registry.cn-hangzhou.aliyuncs.com/kainstall/metrics-server@sha256:05bf9f4bf8d9de19da59d3e1543fd5c140a8d42a5e1b92421e36e5c2d74395eb
Port: 443/TCP
Host Port: 0/TCP
Args:
--cert-dir=/tmp
--secure-port=443
--kubelet-use-node-status-port
--metric-resolution=15s
State: Running
Started: Sun, 08 Aug 2021 13:37:21 +0000
Ready: False
Restart Count: 0
Requests:
cpu: 100m
memory: 200Mi
Liveness: http-get https://:https/livez delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get https://:https/readyz delay=20s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/tmp from tmp-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-nnnnm (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
tmp-dir:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kube-api-access-nnnnm:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 4m59s default-scheduler Successfully assigned kube-system/metrics-server-79bf7dcc6f-wmbj9 to k8s-worker-node2
Normal Pulling 4m57s kubelet Pulling image "registry.cn-hangzhou.aliyuncs.com/kainstall/metrics-server:v0.5.0"
Normal Pulled 4m54s kubelet Successfully pulled image "registry.cn-hangzhou.aliyuncs.com/kainstall/metrics-server:v0.5.0" in 3.841978824s
Normal Created 4m53s kubelet Created container metrics-server
Normal Started 4m53s kubelet Started container metrics-server
Warning Unhealthy 68s (x21 over 4m28s) kubelet Readiness probe failed: HTTP probe failed with statuscode: 500
Errors during installation
[2021-08-08T21:36:14.184345230+0800]: INFO: [kubeadm init] xxx.xxx.xxx.xxx: set kube config succeeded.
[2021-08-08T21:36:14.196881865+0800]: INFO: [kubeadm init] xxx.xxx.xxx.xxx: delete master taint
[2021-08-08T21:36:14.223645005+0800]: EXEC: [command] bash -c 'kubectl taint nodes --all node-role.kubernetes.io/master-'
bash: kubectl: command not found
[2021-08-08T21:36:14.237175207+0800]: ERROR: [kubeadm init] xxx.xxx.xxx.xxx: delete master taint failed.
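The taint-removal failure above only means kubectl was not yet on the installer's PATH at that moment; once the cluster is up, the same command from the EXEC log line can be re-run by hand on the master:

```shell
# Same command the installer attempted: removes the NoSchedule taint from
# all master nodes so workloads can be scheduled onto them.
kubectl taint nodes --all node-role.kubernetes.io/master-
```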
TaibiaoGuo commented
Current workaround: download the yaml file and add the `--kubelet-insecure-tls` flag at line 137, which
skips kubelet certificate verification, then redeploy with kubectl apply; metrics-server then runs normally.
wget https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.5.0/components.yaml
129 spec:
130 containers:
131 - args:
132 - --cert-dir=/tmp
133 - --secure-port=443
134 - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
135 - --kubelet-use-node-status-port
136 - --metric-resolution=15s
137 - --kubelet-insecure-tls
kubectl apply -f components.yaml
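An alternative to editing components.yaml by hand is to patch the running Deployment directly; a sketch using a JSON patch that appends the same flag to the container args:

```shell
# Append --kubelet-insecure-tls to the first container's args of the
# metrics-server Deployment; the Deployment controller then rolls out a
# new pod with the updated flag automatically.
kubectl -n kube-system patch deployment metrics-server --type=json \
  -p='[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-insecure-tls"}]'
```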