kubernetes-sigs/metrics-server

"403 Forbidden", response: "Forbidden (user=system:anonymous, verb=get, resource=nodes, subresource=stats)"

benmathews opened this issue · 20 comments

I'm trying to run metrics-server on a 1.10.3 cluster. With the stock YAML, I'm getting this in the logs:

E0805 16:38:05.120665 1 summary.go:97] error while getting metrics summary from Kubelet xxxxxxxxxxxxxxxx(xxxxxxxxxxx:10255): Get http://xxxxxxxxxx:10255/stats/summary/: dial tcp xxxxxxxxxx:10255: getsockopt: connection refused

I found references to this error elsewhere and modified the deployment args:

From: --source=kubernetes.summary_api:''
To:   --source=kubernetes.summary_api:https://kubernetes.default?kubeletHttps=true&kubeletPort=10250&insecure=true

That results in errors like this:

E0806 19:56:05.269962 1 summary.go:97] error while getting metrics summary from Kubelet xxxxxxxxxx(xxxxxxx:10250): request failed - "403 Forbidden", response: "Forbidden (user=system:anonymous, verb=get, resource=nodes, subresource=stats)"

How do I get metrics-server working?

Make sure you've got webhook auth enabled on your kubelet.

I do have it enabled.

/usr/bin/kubelet --allow-privileged=true --authentication-token-webhook=true --authorization-mode=Webhook --cadvisor-port=0 --client-ca-file=/etc/kubernetes/pki/ca.pem --cluster-dns=172.31.252.2 --cluster-domain=cluster.local --cni-bin-dir=/opt/cni/bin --cni-conf-dir=/etc/cni/net.d --container-runtime=docker --docker=unix:///var/run/docker.sock --event-qps=0 --hostname-override=se-k8s-node37.vivintsky.com --kubeconfig=/etc/kubernetes/kubelet.conf --make-iptables-util-chains=true --network-plugin=cni --node-ip=172.31.32.37 --pod-infra-container-image=gcr.io/google_containers/pause-amd64:3.0 --pod-manifest-path=/etc/kubernetes/manifests --read-only-port=0 --register-schedulable=true --serialize-image-pulls=false --streaming-connection-idle-timeout=0 --tls-cert-file=/etc/kubernetes/pki/kubelet.pem --tls-private-key-file=/etc/kubernetes/pki/kubelet-key.pem --v=2 --volume-plugin-dir=/usr/libexec/kubernetes/kubelet-plugins/volume/exec/ Restart=on-failure

Source is currently set to

- --source=kubernetes.summary_api:https://kubernetes.default.svc?kubeletHttps=true&kubeletPort=10250&useServiceAccount=true

hmm... that looks correct. Is metrics server running without a service account or something?

Thanks for this thread.

# command list of metrics-server deployment
--source=kubernetes.summary_api:https://kubernetes.default.svc?kubeletHttps=true&kubeletPort=10250&useServiceAccount=true&insecure=true

This connection string works for my v1.11 kubeadm cluster.
(The kubelet's insecure port 10255 is disabled by default, and webhook auth is enabled by default.)

Sadly, the insecure flag is necessary, because the kubelet certs do not have the node IPs in the SANs. kubernetes/kubernetes#59372 (comment)

I verified by checking that kubectl top nodes returns cluster stats after a few minutes.
The metrics-server Pod logs should be pretty clean.

I'm pretty sure this is exactly what Prometheus Operator and kube-prometheus configure for HTTPS kubelet auth as well.
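
For readers wondering where that flag goes, here is a minimal sketch of the container spec fragment. It assumes a v0.2.x-style metrics-server Deployment, where --source is still accepted; the container name is an assumption, and the rest of the manifest (image, service account, and so on) is left out.

```yaml
# Sketch only: fragment of a metrics-server Deployment pod template.
# Assumes a metrics-server version that still supports --source (v0.2.x).
containers:
- name: metrics-server   # assumed container name
  command:
  - /metrics-server
  # kubeletPort=10250 targets the secure kubelet port,
  # useServiceAccount=true sends the pod's service account token,
  # insecure=true skips verification of the kubelet serving cert.
  - --source=kubernetes.summary_api:https://kubernetes.default.svc?kubeletHttps=true&kubeletPort=10250&useServiceAccount=true&insecure=true
```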

There are three paths for kubelet serving certs:

  1. Created out of band and explicitly provided to the kubelet. What these certs contain, including their SANs, is up to the deployer.
  2. Self-signed by the kubelet on start. Since these are self-signed, you'd have to disable TLS verification no matter what SANs the cert contained.
  3. Requested by the kubelet from the CSR API (alpha in 1.11, beta in 1.12; must be paired with an approval process for the CSR requests). The cert is requested with all the addresses the kubelet would report in its Node status.addresses field. As long as one of the addresses in the Node status is used to contact the node, the certificate should have a matching SAN.
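
As a rough illustration of the third path, the kubelet config file can ask for a serving cert through the CSR API. This is a sketch, assuming a kubelet that reads a KubeletConfiguration file and a version where the RotateKubeletServerCertificate feature gate still has to be considered; whether you need to set the gate explicitly depends on your Kubernetes version.

```yaml
# Sketch only: kubelet config fragment for requesting a serving cert via the CSR API.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Ask the kubelet to request its serving certificate from the
# certificates.k8s.io API instead of self-signing one.
serverTLSBootstrap: true
featureGates:
  # Assumption: needed explicitly while the feature is alpha (1.11).
  RotateKubeletServerCertificate: true
```

The resulting CSRs still need to be approved, for example by hand with kubectl certificate approve, or by whatever approval process your cluster uses.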

I can't make it work on 1.10.4 either; same error :-(

Same for me - 403.

The service account credentials are mounted:

tmpfs                     1.9G     12.0K      1.9G   0% /var/run/secrets/kubernetes.io/serviceaccount

But it does not seem to use the token provided by the service account.
I tried mounting a ConfigMap containing a kubeconfig, with paths to the service account credentials and token, into the container.
Sadly, that did not change anything. I'm still facing unauthorized, anonymous access.

It works when I remove the flag --authorization-mode=Webhook, but that just means that the kubelet is running in AlwaysAllow mode, which is not really secure.

Cluster is on 1.10.3.

@tommyknows which version of metrics-server are you running?

v0.3.0, as in the deployment yaml.
I can confirm that it worked with v0.2.1, although I had used the --source=kubernetes.summary_api:'' flag (and never really understood what it did, but it fixed an issue I had).

Also, are you starting your kubelet with --authentication-token-webhook?

I haven't!
Works now, thank you.

The kubelet flag --authentication-token-webhook is deprecated, but I have the equivalent setting in the config file and the same issue still exists.

[root@node ~]# cat /var/lib/kubelet/config.yaml
address: 0.0.0.0
apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
  anonymous:
    enabled: false
  webhook:
    cacheTTL: 2m0s
    enabled: true
  x509:
    clientCAFile: /etc/kubernetes/pki/ca.crt
authorization:
  mode: Webhook
  webhook:
    cacheAuthorizedTTL: 5m0s
    cacheUnauthorizedTTL: 30s

The metrics-server log shows:

E1011 07:38:42.320912       1 manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:master: unable to fetch metrics from Kubelet master (192.168.8.184): request failed - "403 Forbidden", response: "Forbidden (user=system:serviceaccount:kube-system:metrics-server, verb=get, resource=nodes, subresource=stats)", unable to fully scrape metrics from source kubelet_summary:node: unable to fetch metrics from Kubelet node (192.168.8.185): request failed - "403 Forbidden", response: "Forbidden (user=system:serviceaccount:kube-system:metrics-server, verb=get, resource=nodes, subresource=stats)"]

The service account has been mounted:

[root@master metrics-server]# kubectl describe pods/metrics-server-v0.3.0-5d4d6b8599-zsvv8 -n kube-system
Name:               metrics-server-v0.3.0-5d4d6b8599-zsvv8
Namespace:          kube-system
Priority:           2000000000
PriorityClassName:  system-cluster-critical
Node:               node/192.168.8.185
Start Time:         Wed, 10 Oct 2018 18:39:30 +0800
Labels:             k8s-app=metrics-server
                    pod-template-hash=5d4d6b8599
                    version=v0.3.0
Annotations:        cni.projectcalico.org/podIP: 172.20.1.35/32
                    scheduler.alpha.kubernetes.io/critical-pod: 
                    seccomp.security.alpha.kubernetes.io/pod: docker/default
Status:             Running
IP:                 172.20.1.35
Controlled By:      ReplicaSet/metrics-server-v0.3.0-5d4d6b8599
Containers:
  metrics-server:
    Container ID:  docker://fa55e7f7343a700954dfe344f17fb0a587e9bdf0915af66117169ad06e31e41f
    Image:         k8s.gcr.io/metrics-server-amd64:v0.3.0
    Image ID:      docker://sha256:277be148865e31897310bd4f0fab8e4ea3e9fc1d22b6614ef99efca98e65df70
    Port:          443/TCP
    Host Port:     0/TCP
    Command:
      /metrics-server
      --metric-resolution=30s
      --kubelet-insecure-tls
      --kubelet-preferred-address-types=InternalIP
    State:          Running
      Started:      Wed, 10 Oct 2018 18:39:32 +0800
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     52m
      memory:  500Mi
    Requests:
      cpu:        52m
      memory:     500Mi
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from metrics-server-token-qdwk2 (ro)
  metrics-server-nanny:
    Container ID:  docker://10989cd9327dd6566c3752cbbbae1464c05706f006e119a0a1a4d39091846827
    Image:         k8s.gcr.io/addon-resizer:1.8.3
    Image ID:      docker://sha256:35df0c9fbd2a069409e144ed6144bfb949d894f0f2c560d78642825dac01e7bb
    Port:          <none>
    Host Port:     <none>
    Command:
      /pod_nanny
      --config-dir=/etc/config
      --cpu=50m
      --extra-cpu=0.5m
      --memory=200Mi
      --extra-memory=100Mi
      --threshold=5
      --deployment=metrics-server-v0.3.0
      --container=metrics-server
      --poll-period=300000
      --estimator=exponential
      --minClusterSize=3
    State:          Running
      Started:      Wed, 10 Oct 2018 18:39:32 +0800
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     100m
      memory:  300Mi
    Requests:
      cpu:     5m
      memory:  50Mi
    Environment:
      MY_POD_NAME:       metrics-server-v0.3.0-5d4d6b8599-zsvv8 (v1:metadata.name)
      MY_POD_NAMESPACE:  kube-system (v1:metadata.namespace)
    Mounts:
      /etc/config from metrics-server-config-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from metrics-server-token-qdwk2 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  metrics-server-config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      metrics-server-config
    Optional:  false
  metrics-server-token-qdwk2:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  metrics-server-token-qdwk2
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     CriticalAddonsOnly
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:          <none>

ClusterRole and ClusterRoleBinding:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:metrics-server
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  - namespaces
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - "extensions"
  resources:
  - deployments
  verbs:
  - get
  - list
  - update
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:metrics-server
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system

[root@master metrics-server]# kubectl describe clusterrole system:metrics-server
Name:         system:metrics-server
Labels:       addonmanager.kubernetes.io/mode=Reconcile
              kubernetes.io/cluster-service=true
Annotations:  <none>
PolicyRule:
  Resources               Non-Resource URLs  Resource Names  Verbs
  ---------               -----------------  --------------  -----
  deployments.extensions  []                 []              [get list update watch]
  namespaces              []                 []              [get list watch]
  nodes                   []                 []              [get list watch]
  pods                    []                 []              [get list watch]

The metrics-server version is 0.3.0, which doesn't support --source.

metrics-server 0.3.1 may solve the problem, with a ClusterRole that also grants nodes/stats:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:metrics-server
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  - nodes/stats
  - namespaces
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - "extensions"
  resources:
  - deployments
  verbs:
  - get
  - list
  - watch
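
One way to check whether a role change like this has taken effect is to ask the API server the same question the kubelet's webhook authorizer asks. The sketch below is a SubjectAccessReview built from the user, verb, resource, and subresource in the 403 message earlier in the thread; the file name in the comment is hypothetical.

```yaml
# Sketch only: submit with something like
#   kubectl create -f sar.yaml -o yaml   # sar.yaml is a hypothetical file name
# and read status.allowed in the response.
apiVersion: authorization.k8s.io/v1
kind: SubjectAccessReview
spec:
  # Identity taken from the 403 message in the metrics-server log.
  user: system:serviceaccount:kube-system:metrics-server
  resourceAttributes:
    verb: get
    resource: nodes
    subresource: stats
```

If status.allowed comes back false, the ClusterRole or its binding is still missing nodes/stats for that service account.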

The problem, for me at least, was a network issue where the masters couldn't properly make API calls that resolved to the worker nodes. After fixing that, metrics-server works.

@benmathews I have the same issue, and only some of my nodes return 403 Forbidden (yes, the others work well). Could you please share your solution? Thanks.

@aisensiy as I said, it was a network issue: the routes weren't properly set up. The problem and solution were custom to my environment.

OK... My situation is also quite special. I use the AWS Direct Connect service, which connects my own network to an AWS VPC, and the 403 Forbidden response also looks like a network issue.

k8s.docx
Could everyone help me figure out what my problem is?

How do I solve it? Thanks.

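First solution:
When starting the kubelet, change --authorization-mode=Webhook to --authorization-mode=AlwaysAllow. This avoids the 403, but it is not secure.
Second solution:
Use a kubeconfig. Add the following command when starting the metrics-server container (add it in the YAML):
command:
- /metrics-server
- --kubelet-insecure-tls
- --kubeconfig=/key/kubeconfig

--kubeconfig=/key/kubeconfig makes metrics-server use the specified kubeconfig. Make sure that /key/kubeconfig inside the container holds the kubeconfig contents; mounting a volume is one way to do this.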
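
A minimal sketch of the volume-mount approach, assuming the kubeconfig is stored in a Secret under the key kubeconfig. The Secret name and volume name are hypothetical; the image is the one shown in the pod description earlier in the thread.

```yaml
# Sketch only: pod template fragment that mounts a kubeconfig at /key/kubeconfig.
spec:
  containers:
  - name: metrics-server
    image: k8s.gcr.io/metrics-server-amd64:v0.3.0
    command:
    - /metrics-server
    - --kubelet-insecure-tls
    - --kubeconfig=/key/kubeconfig
    volumeMounts:
    - name: kubeconfig            # hypothetical volume name
      mountPath: /key
      readOnly: true
  volumes:
  - name: kubeconfig
    secret:
      # Hypothetical Secret; it must contain a key named "kubeconfig"
      # so that the file appears as /key/kubeconfig.
      secretName: metrics-server-kubeconfig
```

A Secret is used here rather than a ConfigMap because a kubeconfig usually carries credentials.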