kubernetes-sigs/metrics-server

metrics-server unable to authenticate to apiserver

beanssoft opened this issue · 17 comments

Hello,

I've just been trying to install metrics-server (the 1.8+ manifests) on k8s 1.14. I've followed the standard instructions:

$ git clone https://github.com/kubernetes-incubator/metrics-server.git
$ kubectl create -f metrics-server/deploy/1.8+/

But it always ends up in an error state:

metrics-server-58dfcc7fcc-lsrgw 0/1 CrashLoopBackOff 5 8m21s

Looking at the logs I see the following:

0614 22:38:04.236395 1 serving.go:312] Generated self-signed cert (apiserver.local.config/certificates/apiserver.crt, apiserver.local.config/certificates/apiserver.key)
Error: Get https://10.96.0.1:443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication: dial tcp 10.96.0.1:443: i/o timeout
panic: Get https://10.96.0.1:443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication: dial tcp 10.96.0.1:443: i/o timeout

I guess this issue may be related to firewall rules, but I'm not sure.
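
A quick sanity check (curlimages/curl below is just an example image; any image with curl would do) is to repeat the same request from a throwaway pod on the pod network:

$ kubectl run api-check --rm -it --restart=Never --image=curlimages/curl -- \
    curl -k -m 5 https://10.96.0.1:443/version

If that times out as well, the problem is between the pod network and the API server (CNI or firewall), not in metrics-server itself.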

I'm using Calico

Maybe the following info could be useful:

NAME                      TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                  AGE
calico-typha              ClusterIP   10.108.145.124   <none>        5473/TCP                 66d
kube-dns                  ClusterIP   10.96.0.10       <none>        53/UDP,53/TCP,9153/TCP   66d
metrics-server            ClusterIP   10.97.65.85      <none>        443/TCP                  81m
traefik-ingress-service   ClusterIP   10.99.53.245     <none>        80/TCP,8080/TCP          4d1h

Thanks in advance for your help

Could anyone please help?

Is anyone there? After searching a lot, I've found that this metrics-server version is not working with Kubernetes 1.14. I've tested the standard solution:

command:
- /metrics-server
- --kubelet-insecure-tls
- --kubelet-preferred-address-types=InternalIP

But it doesn't work for me; I still can't get the data from my worker nodes:

$ kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
masterkubernetes.enova.mx 592m 14% 811Mi 21%
kubernetesworker1.enova.mx
kubernetesworker2

This is the error I can see at the logs:

E0619 17:46:11.232435 1 manager.go:111] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:kubernetesworker2: unable to fetch metrics from Kubelet kubernetesworker2 (192.168.137.112): Get https://192.168.137.112:10250/stats/summary/: dial tcp 192.168.137.112:10250: i/o timeout, unable to fully scrape metrics from source kubelet_summary:kubernetesworker1.enova.mx: unable to fetch metrics from Kubelet kubernetesworker1.enova.mx (192.168.137.111): Get https://192.168.137.111:10250/stats/summary/: dial tcp 192.168.137.111:10250: i/o timeout]
E0619 17:46:37.457482 1 reststorage.go:128] unable to fetch node metrics for node "kubernetesworker2": no metrics known for node
E0619 17:46:37.457595 1 reststorage.go:128] unable to fetch node metrics for node "kubernetesworker1.enova.mx": no metrics known for node

Port 10250 is open on both nodes and I can reach it directly, but metrics-server can't. This seems to be a permissions problem, but I don't know where to do that configuration.
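
For comparison, a rough way to tell host-level access apart from pod-network access (the pod name and image are just examples):

# from the node itself (the direct access that works):
$ curl -sk -m 5 -o /dev/null -w '%{http_code}\n' https://192.168.137.111:10250/stats/summary

# from a pod on the cluster network, which is the path metrics-server takes:
$ kubectl run kubelet-check --rm -it --restart=Never --image=curlimages/curl -- \
    curl -sk -m 5 -o /dev/null -w '%{http_code}\n' https://192.168.137.111:10250/stats/summary

Even a 401/403 here means the port is reachable and the remaining issue is auth/RBAC; an i/o timeout from the pod but not from the node points at the pod network or a firewall instead.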

$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
kubernetesworker1.enova.mx Ready 71d v1.14.0
kubernetesworker2 Ready 71d v1.14.1
masterkubernetes.enova.mx Ready master 71d v1.14.0

Thanks in advance for your help.

Have you solved the problem?
I found the same problem as yours, here's my log:

E0630 13:52:42.999850       1 manager.go:111] unable to fully collect metrics: [unable to fully scrape
metrics from source kubelet_summary:slave1: unable to fetch metrics from Kubelet slave1 (slave1): Get 
https://slave1:10250/stats/summary/: dial tcp: lookup slave1 on 169.254.25.10:53: no such host, unable 
to fully scrape metrics from source kubelet_summary:master: unable to fetch metrics from Kubelet 
master (master): Get https://master:10250/stats/summary/: dial tcp: lookup master on 
169.254.25.10:53: no such host, unable to fully scrape metrics from source kubelet_summary:slave2: 
unable to fetch metrics from Kubelet slave2 (slave2): Get https://slave2:10250/stats/summary/: dial 
tcp: lookup slave2 on 169.254.25.10:53: no such host, unable to fully scrape metrics from source 
kubelet_summary:slave3: unable to fetch metrics from Kubelet slave3 (slave3): Get 
https://slave3:10250/stats/summary/: dial tcp: lookup slave3 on 169.254.25.10:53: no such host]

My k8s version is v1.14, and my metrics server version is v0.3.3

Your problem is with hostname resolution; make sure the names of the master and the workers are in the /etc/hosts file.
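
Concretely, something along these lines on every node (the addresses below are placeholders; use your real node IPs):

# /etc/hosts
192.168.1.10  master
192.168.1.11  slave1
192.168.1.12  slave2
192.168.1.13  slave3

Alternatively, passing --kubelet-preferred-address-types=InternalIP to metrics-server makes it use node IPs instead of names, which sidesteps the lookup entirely.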

Seems that nobody is giving support here

/lifecycle stale

There are a bunch of people reporting this and it seems to be due to a whole bunch of problems.
The default setup needs the pod to be able to request metrics from the kubelets directly. That needs the kubelet to support token auth, which my kops setup didn't do out of the box; it also needs the CA cert (or --kubelet-insecure-tls), and the kubelet's cert seems to be signed by a different CA than the one given to the service account. On top of that, routing from the pod to the kubelet needs to work, and there are various options the service can use to select which address to use.
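
For reference, the kubelet-side requirement corresponds roughly to a KubeletConfiguration like this (a sketch; the CA path and file location vary by installer, this is the kubeadm layout):

# /var/lib/kubelet/config.yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
authentication:
  x509:
    clientCAFile: /etc/kubernetes/pki/ca.crt
  webhook:
    enabled: true    # lets metrics-server authenticate with its service account token
authorization:
  mode: Webhook      # kubelet asks the API server whether the caller may read stats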

/remove-lifecycle stale

I'm going to work on a PR to try using the API server's node proxy endpoint, which I think will make the default, out-of-the-box setup much easier.
(also, I think the existing deployment manifests forget to give namespace listing rights to the service-account, which also doesn't help)

One option I don't think I've seen mentioned, as a temporary workaround:

--deprecated-kubelet-completely-insecure
--kubelet-port=10255
--kubelet-preferred-address-types=InternalIP

Thanks @tcolgate, but your recommendation is not working on version 1.14.4. Reading the docs, it seems the solution is granting the right access via RBAC; I'm still reading up on how that can be done.
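
For what it's worth, the 0.3.x deploy manifests ship a ClusterRole roughly like the sketch below (plus a ClusterRoleBinding to the metrics-server service account in kube-system); the nodes/stats rule is what lets metrics-server read /stats/summary when the kubelet uses webhook authorization:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:metrics-server
rules:
- apiGroups: [""]
  resources: ["pods", "nodes", "nodes/stats", "namespaces"]
  verbs: ["get", "list", "watch"]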

eskp commented

Had to edit the Deployment object manually and add this to the container's command:

- --kubelet-preferred-address-types=InternalIP
- --kubelet-insecure-tls

Later I added these to the Helm chart's args value.
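
With the stable/metrics-server chart that looks something like this (assuming the chart's args value):

# values.yaml
args:
  - --kubelet-preferred-address-types=InternalIP
  - --kubelet-insecure-tls

$ helm upgrade --install metrics-server stable/metrics-server --namespace kube-system -f values.yaml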

One option I don't think I've seen mentioned, as a temporary workaround:

--deprecated-kubelet-completely-insecure
--kubelet-port=10255
--kubelet-preferred-address-types=InternalIP

Thanks! Works fine on k8s v1.12.8.

After trying to install metrics-server 0.3.6 on a three-node cluster with the following configuration:

CentOS 8.0.1905 (Kernel 4.18.0-80.11.2.)
Kubernetes 1.16.2
Canal CNI plugin
Docker 19.03.4

I found a default installation of metrics-server (0.3.6) inoperable. I found my way here thanks to this error in the pertinent logs:

Error: Get https://10.96.0.1:443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication: dial tcp 10.96.0.1:443: i/o timeout

Sadly, eskp's settings alone did not suffice for me; together with one key extra setting, though, they got my install working. I'm still quite new to Kubernetes, so I'm spelling this out for anyone who stumbles across this, in the hope that it saves them some time:

  1. kubectl edit deploy -n kube-system metrics-server

Add the following four lines under spec:spec:containers. I put them before the image: k8s.gcr.io/metrics-server-amd64:v0.3.6 line, like so:

  - args:
    - --kubelet-insecure-tls
    - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
    - --metric-resolution=30s
  image: k8s.gcr.io/metrics-server-amd64:v0.3.6 # This line is included for reference and was already present

The --metric-resolution=30s line is NOT required; it just changes how often metrics are collected, from the default of 60 seconds to 30.

  2. Add the following line at the spec:spec level, outside of the previous containers level:
      hostNetwork: true
      restartPolicy: Always # This line is included for reference and was already present
  3. Save your changes and wait a bit for metrics to be gathered. kubectl top nodes and kubectl top pods should now hopefully display useful information.

Adding hostNetwork: true is what finally got metrics-server working for me. Without it, nada. Without the kubelet-preferred-address-types line, I could query my master node but not my two worker nodes, nor could I query pods; obviously undesirable results. Leaving out kubelet-insecure-tls also results in an inoperable metrics-server installation.
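
For anyone who prefers to apply the same three changes non-interactively, a rough equivalent of the edits above is a strategic merge patch (the container name metrics-server matches the 0.3.6 manifest):

$ kubectl -n kube-system patch deployment metrics-server --patch '
spec:
  template:
    spec:
      hostNetwork: true
      containers:
      - name: metrics-server
        args:
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --metric-resolution=30s'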

Thank you very much!

I created an environment with kubeadm on Vagrant.
The reason we had to use "hostNetwork: true" was that the IP of the node was included in the pod network CIDR.

kubeadm init --apiserver-advertise-address="192.168.50.10" --apiserver-cert-extra-sans="192.168.50.10"  --node-name k8s-master --pod-network-cidr=192.168.0.0/16

After changing it as follows, "hostNetwork: true" is no longer needed:

kubeadm init --apiserver-advertise-address="172.16.50.10" --apiserver-cert-extra-sans="172.16.50.10"  --node-name k8s-master --pod-network-cidr=192.168.0.0/16

Closing per Kubernetes issue triage policy

GitHub is not the right place for support requests.
If you're looking for help, check Stack Overflow and the troubleshooting guide.
You can also post your question on the Kubernetes Slack or the Discuss Kubernetes forum.
If the matter is security related, please disclose it privately via https://kubernetes.io/security/.