No cpuUsage and memUsage metrics showing up

Question

agupta-ionos opened this issue 5 years ago · 19 comments

Hi Team,

I have installed Kubernetes Opex Analytics using Helm, but I am unable to see the metrics on the dashboard.

Can anyone help?

Error:
The following items failed:

No cpuUsage metric on node X.X.X.X

agupta-ionos commented 5 years ago

error.txt

Answer 1 · 2019-07-25T17:39:24.000Z

Hi,
Can you check the pod logs please?

Answer 2 · 2019-07-26T06:53:03.000Z

ubuntu@ecs-test:~/kube-opex-analytics$ kubectl describe po -n kube-opex-analytics
Name:           kube-opex-analytics-845675bc58-6sm5j
Namespace:      kube-opex-analytics
Priority:       0
Node:           192.168.0.214/192.168.0.214
Start Time:     Fri, 26 Jul 2019 06:51:03 +0000
Labels:         app.kubernetes.io/instance=kube-opex-analytics
                app.kubernetes.io/name=kube-opex-analytics
                pod-template-hash=4012316714
Annotations:    kubernetes.io/availablezone: eu-de-01
Status:         Running
IP:             172.16.0.14
Controlled By:  ReplicaSet/kube-opex-analytics-845675bc58
Containers:
  kube-opex-analytics:
    Container ID:   docker://f0b51b2e5f3e4f7bd189321a9688bb4fe84864a8836f72131b45c8d1e1884428
    Image:          rchakode/kube-opex-analytics:0.3.1
    Image ID:       docker-pullable://rchakode/kube-opex-analytics@sha256:3aad7d7c116da0f879a7099ef93e332d53ce16ecc230712cad04cf96403e1d18
    Port:           5483/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Fri, 26 Jul 2019 06:51:07 +0000
    Ready:          True
    Restart Count:  0
    Liveness:       http-get http://:http/ delay=0s timeout=1s period=10s #success=1 #failure=3
    Readiness:      http-get http://:http/ delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      KOA_BILLING_CURRENCY_SYMBOL:  $
      KOA_BILLING_HOURLY_RATE:      0
      KOA_COST_MODEL:               CUMULATIVE_RATIO
      KOA_DB_LOCATION:              /data/db
      KOA_K8S_API_ENDPOINT:         https://kubernetes.default
      KOA_K8S_API_VERIFY_SSL:       false
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-opex-analytics-token-s9dd5 (ro)
Conditions:
  Type             Status
  Initialized      True
  Ready            True
  ContainersReady  True
  PodScheduled     True
Volumes:
  kube-opex-analytics-token-s9dd5:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  kube-opex-analytics-token-s9dd5
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type    Reason                 Age                From                    Message
  Normal  Scheduled              69s                default-scheduler       Successfully assigned kube-opex-analytics/kube-opex-analytics-845675bc58-6sm5j to 192.168.0.214
  Normal  Pulling                67s                kubelet, 192.168.0.214  pulling image "rchakode/kube-opex-analytics:0.3.1"
  Normal  Pulled                 66s                kubelet, 192.168.0.214  Successfully pulled image "rchakode/kube-opex-analytics:0.3.1"
  Normal  SuccessfulMountVolume  65s (x2 over 69s)  kubelet, 192.168.0.214  Successfully mounted volumes for pod "kube-opex-analytics-845675bc58-6sm5j_kube-opex-analytics(bc3da2dc-af71-11e9-96d3-fa163efd36b7)"
  Normal  SuccessfulCreate       65s                kubelet, 192.168.0.214  Created container
  Normal  Started                65s                kubelet, 192.168.0.214  Started container
  Normal  Healthy                60s (x2 over 62s)  kubelet, 192.168.0.214  container docker://f0b51b2e5f3e4f7bd189321a9688bb4fe84864a8836f72131b45c8d1e1884428 in health status

Answer 3 · 2019-07-26T06:54:41.000Z

agupta-ionos commented 5 years ago

Answer 4 · 2019-07-26T07:26:45.000Z

Hi @anubhav25gupta,

To get pod logs, use kubectl logs instead of kubectl describe.
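
As a minimal sketch of the difference: kubectl describe reports the pod's spec, status and events, while kubectl logs prints what the application itself wrote. The helper below is hypothetical (the name koa_error_logs is not part of any tool). It takes a namespace and a pod name and keeps only the lines that contain ERROR.

```shell
# Hypothetical helper: print only the ERROR lines from a pod's logs.
# $1 = namespace, $2 = pod name (both supplied by the caller).
koa_error_logs() {
  kubectl logs --namespace "$1" "$2" | grep 'ERROR'
}
```

For the pod shown earlier in this thread, the call would be: koa_error_logs kube-opex-analytics kube-opex-analytics-845675bc58-6sm5j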

Answer 5 · 2019-07-26T07:34:18.000Z

I don't know which data just got populated, but the header still shows the error: No cpuUsage metric on node X.X.X.X

Answer 6 · 2019-07-26T07:39:22.000Z

Which version of Kubernetes are you running?
You seem to have an RBAC issue accessing the metrics API:

2019-07-26 07:26:10,490 - kube-opex-analytics - ERROR - call to https://kubernetes.default/apis/metrics.k8s.io/v1beta1/nodes returned error ({"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"nodes.metrics.k8s.io is forbidden: User \"system:anonymous\" cannot list resource \"nodes\" in API group \"metrics.k8s.io\" at the cluster scope","reason":"Forbidden","details":{"group":"metrics.k8s.io","kind":"nodes"},"code":403}
)

Answer 7 · 2019-07-26T07:50:17.000Z

The Kubernetes version is v1.11.3-r1.sp2.

Answer 8 · 2019-07-26T07:53:40.000Z

It's CCE 2.0.

Answer 9 · 2019-07-26T07:56:10.000Z

Do you use microk8s?
In any case, check that the metrics API is working. If it is, review your ClusterRole and ClusterRoleBinding to make sure that kube-opex-analytics can access the metrics API.
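
A rough sketch of both checks, using only standard kubectl subcommands. The function names are hypothetical, and the service account name is an assumption (the token secret in the describe output suggests it is kube-opex-analytics, but verify it with kubectl get serviceaccount in that namespace).

```shell
# Hypothetical helpers; the names are illustrative only.

# Query the node metrics endpoint, the same path that appears in the
# pod's error log. This runs with the caller's credentials, not the pod's.
metrics_api_nodes() {
  kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
}

# Ask the API server whether a service account may list node metrics.
# $1 = namespace, $2 = service account name (assumed, see above).
can_list_node_metrics() {
  kubectl auth can-i list nodes.metrics.k8s.io \
    --as "system:serviceaccount:$1:$2"
}
```

If metrics_api_nodes returns JSON for an administrator but can_list_node_metrics answers "no" for the service account, the missing piece is most likely the role or its binding.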

Answer 10 · 2019-07-26T08:11:15.000Z

We haven't tested kube-opex-analytics with CCE. I suspect something specific to how CCE handles RBAC, or perhaps the metrics API is not enabled by default.

Please investigate in this direction.
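
One way to investigate whether the metrics API is enabled at all is to look at its APIService registration. This is a sketch with hypothetical function names; v1beta1.metrics.k8s.io is the registration name that corresponds to the path in the error log, and whether CCE registers it is exactly what is unknown here.

```shell
# Hypothetical helpers; the names are illustrative only.

# Show whether the metrics API group is registered with the API server
# and whether its backend is reported as available.
metrics_apiservice_status() {
  kubectl get apiservice v1beta1.metrics.k8s.io
}

# Quick end-to-end check: this only returns data if the metrics API works.
node_metrics_summary() {
  kubectl top nodes
}
```

A "not found" answer from the first function would suggest that no metrics server is registered, which is a different problem from an RBAC denial.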

Answer 11 · 2019-07-26T08:14:41.000Z

I don't use microk8s.

I have checked the ClusterRole and ClusterRoleBinding, and everything looks normal to me.

Kindly advise me on how to enable the metrics API.

Answer 12 · 2019-07-26T09:41:33.000Z

Hi,

Please see the note below about the metrics server.

I have set the prometheusOperator option to enabled (true) in the Helm values.yaml file during deployment.

I am not able to apply the prometheus.yml file manually. I get the error below:

ubuntu@ecs-test:~/kube-opex-analytics/tests/prometheus$ kubectl apply -f prometheus.yml
error: error validating "prometheus.yml": error validating data: [apiVersion not set, kind not set]; if you choose to ignore these errors, turn validation off with --validate=false

prometheus.yml file:

global:
  scrape_interval: 15s # By default, scrape targets every 15 seconds.
  evaluation_interval: 15s

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'codelab-monitor'

rule_files:
  # - "first.rules"
  # - "second.rules"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s

    static_configs:
      - targets: ['80.158.3.8:9090']
  - job_name: 'kube-opex-analytics'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 300s

    static_configs:
      - targets: ['80.158.3.8:5483']
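
The validation error above is what kubectl apply reports for a file without top-level apiVersion and kind fields. prometheus.yml is a configuration file for the Prometheus server itself, not a Kubernetes manifest, so kubectl cannot apply it as is. A minimal sketch of that check follows; the function name is hypothetical, and a simple grep is a rough stand-in for real YAML parsing.

```shell
# Hypothetical helper: succeed only if the file has top-level
# apiVersion and kind keys, which a Kubernetes manifest needs.
# $1 = path to a YAML file.
looks_like_k8s_manifest() {
  grep -q '^apiVersion:' "$1" && grep -q '^kind:' "$1"
}
```

Running looks_like_k8s_manifest prometheus.yml on the file shown above fails, because its top-level keys are global, rule_files and scrape_configs.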

Answer 13 · 2019-07-26T10:00:49.000Z

The problem is not with Prometheus metrics but with the Kubernetes metrics API.
kube-opex-analytics is not able to access that API to retrieve metrics, likely because of RBAC permissions.
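
For illustration only, read access to the metrics API is normally granted with a ClusterRole and a ClusterRoleBinding along the lines below. This is a sketch, not the Helm chart's own manifest: the function name and the koa-metrics-reader name are hypothetical, and the namespace and service account are passed in by the caller.

```shell
# Hypothetical helper: print RBAC objects that let a service account
# read node and pod metrics. $1 = namespace, $2 = service account name.
koa_metrics_rbac_manifest() {
  cat <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: koa-metrics-reader
rules:
  - apiGroups: ["metrics.k8s.io"]
    resources: ["nodes", "pods"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: koa-metrics-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: koa-metrics-reader
subjects:
  - kind: ServiceAccount
    name: $2
    namespace: $1
EOF
}
```

The output can be reviewed first and then piped to kubectl apply -f -. Note that the user in the logged error is system:anonymous, the identity the API server assigns to requests that carry no credentials. If the request really is unauthenticated, a binding for the service account would not apply to it, so it is worth checking that as well.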

Answer 14 · 2019-07-26T10:03:00.000Z

OK, does this mean that only the RBAC settings, i.e. those in the values.yaml file, have to be changed?

Answer 15 · 2019-07-26T11:10:26.000Z

Following up on the error found in the pod's logs, the following link may help you fix the access problem with your Kubernetes metrics API: kubernetes-sigs/metrics-server#95

It's a similar error.

Answer 16 · 2019-07-26T11:53:38.000Z

Hi,

Where do I need to paste this?

Use a kubeconfig to start metrics-server. Add the following command to the container (in the YAML):
command:
- /metrics-server
- --kubelet-insecure-tls
- --kubeconfig=/key/kubeconfig

--kubeconfig=/key/kubeconfig uses the specified kubeconfig. Make sure that /key/kubeconfig inside the container holds the kubeconfig content, for example by mounting it as a volume.

I don't understand which deployment file this refers to. What is its exact name?

Answer 17 · 2019-07-26T17:27:19.000Z

Hi all,
@anubhav25gupta, these settings have to be applied to your Kubernetes installation itself.
In your case, with CCE, that may mean asking your cloud provider.
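
On a cluster where you manage metrics-server yourself, those flags live in the container spec of the metrics-server Deployment. The sketch below only reads that spec. The function name is hypothetical, and the namespace and deployment name are assumptions: the upstream manifests use kube-system and metrics-server, but a managed service such as CCE may name or place the component differently, or not expose it at all.

```shell
# Hypothetical helper: print the command and args of the first container
# of a metrics-server Deployment.
# $1 = namespace, $2 = deployment name (both assumptions, see above).
metrics_server_command() {
  kubectl get deployment "$2" --namespace "$1" \
    --output jsonpath='{.spec.template.spec.containers[0].command}{"\n"}{.spec.template.spec.containers[0].args}{"\n"}'
}
```

With the upstream defaults, the call would be: metrics_server_command kube-system metrics-server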

Answer 18 · 2019-09-28T21:34:44.000Z

We don't have access to a CCE cluster to reproduce and possibly fix this issue.