rchakode/kube-opex-analytics

Issue with CPU capacity reported by virtual nodes on Alicloud K8s

jonwtech opened this issue · 3 comments

My cluster is running a combination of regular and virtual nodes. The virtual nodes report a CPU capacity of '192k', which the K9s console displays as 192000000.

spec:
  capacity:
    cpu: 192k
    ephemeral-storage: 60000Gi
    hugepages-2Mi: 60Ti
    memory: 640Ti
    nvidia.com/gpu: 1k
    pods: 3k

This causes an error with kube-opex-analytics on startup:

ERROR:kube-opex-analytics:ValueError Exception in create_metrics_puller => Traceback (most recent call last):
  File "./backend.py", line 756, in create_metrics_puller
    k8s_usage.extract_nodes(pull_k8s('/api/v1/nodes'))
  File "./backend.py", line 371, in extract_nodes
    node.cpuCapacity = self.decode_cpu_capacity(status['capacity']['cpu'])
  File "./backend.py", line 337, in decode_cpu_capacity
    return int(cap_input)
ValueError: invalid literal for int() with base 10: '192k'

I can see a couple of similar closed issues relating to memory and CPU units of measurement; would it be possible to add support for this case as well?

Hi @jonwtech
Thanks for reporting this; it's unusual to see k as a unit for CPU (that's a very large server) :)
That said, I don't know what unit K9s uses, but k should be a factor of 1000 (i.e. 192k should be 192000, not 192000000 as suggested by K9s). See the Kubernetes documentation => https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/quantity/.
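
For context, the Kubernetes quantity grammar allows decimal SI suffixes (k, M, G, ...) in addition to the millicores notation (m). Below is a minimal sketch of a decoder handling these cases; it reuses the decode_cpu_capacity name from backend.py purely for illustration and is an assumption about the fix, not the actual patch:

def decode_cpu_capacity(cap_input):
    """Convert a Kubernetes CPU quantity string to a number of cores."""
    cap_input = str(cap_input).strip()
    if cap_input.endswith('m'):
        # Millicores, e.g. '500m' -> 0.5 cores
        return int(cap_input[:-1]) / 1000.0
    si_factors = {'k': 10**3, 'M': 10**6, 'G': 10**9}
    if cap_input and cap_input[-1] in si_factors:
        # Decimal SI suffix per the quantity spec, e.g. '192k' -> 192000 cores
        return int(cap_input[:-1]) * si_factors[cap_input[-1]]
    # Plain integer count of cores, e.g. '8' -> 8
    return int(cap_input)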

I've made a change, available in the following Docker image: docker pull rchakode/kube-opex-analytics:2022-03-28-9d3b261
=> Depending on the deployment method used (Helm vs. kustomize), you may just have to update the image tag in Chart.yaml or in kustomization.yaml (tag: 2022-03-28-9d3b261); see the sketch below.
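
For the kustomize case, a minimal sketch of that tag override, assuming the default image name (the images transformer below is illustrative; adapt it to your overlay layout):

images:
  - name: rchakode/kube-opex-analytics
    newTag: 2022-03-28-9d3b261
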
Can you give it a try and let me know if that fixes the issue? I'll merge the change once validated.

Hi @rchakode - yes, I thought K9s had got it wrong, thanks for confirming! :)

I've tested and can confirm this fixes the issue - many thanks for the swift response.

Nice 👍🏾
Thanks for reporting this issue.
The changes have been merged and a hotfix release has just been published: v22.02.1