antonputra/tutorials

tutorials/lesson 072 - kube-prometheus-stack hostNetwork

Anticom opened this issue · 2 comments

First of all thank you for the great tutorial.

Inspecting the default values I've found this comment:

$ helm show values prometheus-community/kube-prometheus-stack | fgrep -A3 "AWS EKS"
  # Required for use in managed kubernetes clusters (such as AWS EKS) with custom CNI (such as calico),
  # because control-plane managed by AWS cannot communicate with pods' IP CIDR and admission webhooks are not working
  ##
  hostNetwork: false

In a discussion about this topic on k8s slack I got this response:

[...] the issue with EKS and a custom CNI is that the pod serving the webhook operation is effectively in a different network than the control plane. The control plane operates within the network served by the VPC. The pod, however, instead of being in the VPC network (as it would be with the default CNI), is served in your custom plugin's network (which has its own addresses and routing). The control plane does not know how to route a request to the webhook container.
As for what will not work if this is not enabled: validating and mutating webhooks. You can find the particular pods handling those by describing the webhook and the svc it points to. Without hostNetwork those will not get requests from the control plane.
~ ref
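For anyone who wants to follow the "describe the webhook and the svc it points to" advice, here is a rough sketch of how that lookup could be done with kubectl. The function names are made up for illustration, and the namespace and service you pass in depend on your own release, so treat this as a starting point rather than an exact recipe.

```shell
# Hypothetical helpers (names are illustrative, not part of any tool).

# Print each validating/mutating webhook configuration followed by the
# namespace/name of the Service(s) it calls.
list_webhook_services() {
  for kind in validatingwebhookconfigurations mutatingwebhookconfigurations; do
    kubectl get "$kind" -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{range .webhooks[*]}{.clientConfig.service.namespace}{"/"}{.clientConfig.service.name}{" "}{end}{"\n"}{end}'
  done
}

# Show the endpoints (pod IPs) behind a Service: $1 = namespace, $2 = service.
show_service_backends() {
  kubectl -n "$1" get endpoints "$2" -o wide
}

# Usage (requires access to a cluster):
#   list_webhook_services
#   show_service_backends <namespace> <service>
```

If the pod IPs listed for the webhook service come from the custom CNI's range rather than the VPC, those are the pods the control plane cannot reach.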

However I still don't quite understand the implications of the above mentioned facts.

  1. Is this irrelevant for your scenario since you've disabled monitoring of the managed components (such as etcd)?
  2. If it is relevant, could you please do a short follow-up video elaborating on this?

I'm afraid it's been a while since I made that tutorial. I'm not going to be able to help you on that one.

Just in case someone stumbles upon this issue:

I've found out that this is related to issues with the admission webhooks. Those are responsible for validating the Prometheus config before it is applied to the Prometheus instance. This is intended to make sure that no invalid configuration is ever applied, since that could bring the instance down.

On EKS clusters with a custom CNI there seem to be some restrictions, since the control plane is managed for you and cannot reach the pods' IP range. Those can be overcome by setting the hostNetwork value in the helm chart (with some additional configuration required iirc).
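A minimal sketch of what setting that value could look like, assuming the `hostNetwork` key from the `helm show values` output above sits under `prometheusOperator` (please verify against your chart version). The file name, release name and namespace are placeholders, and this does not cover whatever additional configuration may be needed.

```shell
# Write a small values override file (file name is arbitrary).
# Assumption: hostNetwork belongs to the prometheusOperator section.
cat > values-hostnetwork.yaml <<'EOF'
prometheusOperator:
  hostNetwork: true
EOF

# Apply it on top of your existing values (placeholder release/namespace):
#   helm upgrade --install monitoring prometheus-community/kube-prometheus-stack \
#     --namespace monitoring -f values-hostnetwork.yaml
```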

Alternatively, the admission webhooks can simply be disabled, at the risk of injecting invalid Prometheus config and hence potentially crashing your Prometheus instance.
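Disabling the webhooks might look roughly like this, assuming the chart exposes the switch as `prometheusOperator.admissionWebhooks.enabled` (check `helm show values` for your version). File, release and namespace names are again placeholders.

```shell
# Values override that turns the operator's admission webhooks off.
# Assumption: the key path below matches your chart version.
cat > values-no-webhooks.yaml <<'EOF'
prometheusOperator:
  admissionWebhooks:
    enabled: false
EOF

# Apply it (placeholder release/namespace):
#   helm upgrade --install monitoring prometheus-community/kube-prometheus-stack \
#     --namespace monitoring -f values-no-webhooks.yaml
```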