kubernetes-retired/kube-aws

prometheusMetrics for clusterAutoscaler plugin [0.15]

sei-nicolas opened this issue · 5 comments

Using kube-aws release 0.15, it's not possible to create a cluster where:

  • clusterAutoscaler plugin is enabled
  • prometheusMetrics option of the plugin is enabled

Running kube-aws apply hangs while creating/initializing the controller instances, with the following message: "Failed to set up mount unit: Invalid argument"

Internally (syslog of the instance being created), the error is the following:

  • cmdLine: "/bin/bash /opt/bin/retry 10 /opt/bin/install-kube-system"
  • message: "error parsing /srv/kube-aws/plugins/cluster-autoscaler/servicemonitor.yaml: error converting YAML to JSON: yaml: line 9: mapping values are not allowed in this context"

And indeed, in https://github.com/kubernetes-incubator/kube-aws/blob/master/builtin/files/plugins/cluster-autoscaler/manifests/servicemonitor.yaml, line 10 should have two fewer spaces at the beginning of the line.

However, fixing the template is not enough: kube-aws apply fails with the same message, and this time the instance syslog shows:

  • cmdLine: "/bin/bash /opt/bin/retry 10 /opt/bin/install-kube-system"
  • message: "error: unable to recognize /srv/kube-aws/plugins/cluster-autoscaler/servicemonitor.yaml: no matches for kind ServiceMonitor in version monitoring.coreos.com/v1"

I think this is related to prometheus-operator/prometheus-operator#1866 (comment), but the solution is not clear.

I'm not sure you can create a cluster with these service monitors enabled. Since the cluster doesn't have the Prometheus Operator installed yet, the ServiceMonitor CRD is missing, and the installation will fail, if I'm not wrong. Try disabling the metrics, installing the operator, and then doing the cluster upgrade later.
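
For anyone hitting the same "no matches for kind ServiceMonitor" error, here is a rough sketch of how one could guard the manifest by hand: apply it only when the ServiceMonitor CRD is already registered. This is not what kube-aws or install-kube-system does; the function name and the KUBECTL override variable are made up for illustration, and the CRD name assumes a standard Prometheus Operator install.

```shell
#!/usr/bin/env bash
# Sketch only: apply a ServiceMonitor manifest if, and only if, its CRD exists.
# KUBECTL is a hypothetical override so a different binary or path can be used.
KUBECTL="${KUBECTL:-kubectl}"
# CRD name as registered by a standard Prometheus Operator install (assumption).
CRD="servicemonitors.monitoring.coreos.com"

apply_servicemonitor_if_supported() {
  local manifest="$1"
  # "kubectl get crd <name>" exits non-zero when the CRD is not registered.
  if "$KUBECTL" get crd "$CRD" >/dev/null 2>&1; then
    "$KUBECTL" apply -f "$manifest"
  else
    # Skip instead of failing, so the rest of the installation can continue.
    echo "skipping $manifest: CRD $CRD not found" >&2
    return 0
  fi
}
```

It could be called as apply_servicemonitor_if_supported /srv/kube-aws/plugins/cluster-autoscaler/servicemonitor.yaml once the operator has been installed. Skipping rather than failing means the manifest still has to be applied again later, for example during the cluster upgrade mentioned above.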

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@fejta-bot: Closing this issue.
