voyagermesh/voyager

Voyager took down the entire app deployed in the cluster


This is a very interesting story and perhaps one of the nastiest bugs I have stumbled upon.

We run k8s 1.15.6, and I have deployed Voyager 9.0.0 in kube-system with the following values.yaml:

    ##
    ## Voyager chart configuration
    ##
    replicaCount: 3

    cloudProvider: aws
    ## Log level for voyager
    logLevel: 5

    # this flag can be set to 'voyager' to handle only ingress
    # with annotation kubernetes.io/ingress.class=voyager.
    ingressClass: "voyager"

    apiserver:
      # enableValidatingWebhook is used to configure apiserver as ValidationWebhook for Voyager CRDs
      enableValidatingWebhook: false
      # If true, bypasses validating webhook xray checks
      bypassValidatingWebhookXray: true

    # Send usage events to Google Analytics
    enableAnalytics: false

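We deploy a Voyager Ingress in the default namespace along with other Ingresses and Services.

Funny bug: once we kill a Voyager pod running in kube-system, all the Services in the default namespace are deleted. Result: app down 🙀

The Kubernetes audit log records the delete request, issued by the voyager-haproxy service account in kube-system: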

    {
      "kind": "Event",
      "apiVersion": "audit.k8s.io/v1",
      "level": "RequestResponse",
      "auditID": "02dfaf3b-be30-4382-9a6e-fe4612ea052b",
      "stage": "ResponseComplete",
      "requestURI": "/api/v1/namespaces/default/services/<our-app-service>",
      "verb": "delete",
      "user": {
        "username": "system:serviceaccount:kube-system:voyager-haproxy",
        "uid": "40fa5988-3458-11e9-b48a-0a216f8733b0",
        "groups": [
          "system:serviceaccounts",
          "system:serviceaccounts:kube-system",
          "system:authenticated"
        ]
      },
      "sourceIPs": [
        "172.20.53.155"
      ],
      "userAgent": "voyager/v0.0.0 (linux/amd64) kubernetes/$Format",
      "objectRef": {
        "resource": "services",
        "namespace": "default",
        "name": "<our-app-service>",
        "apiVersion": "v1"
      },
      "responseStatus": {
        "metadata": {},
        "status": "Success",
        "code": 200
      },
      "requestObject": {
        "kind": "DeleteOptions",
        "apiVersion": "v1"
      },
      "responseObject": {
        "kind": "Status",
        "apiVersion": "v1",
        "metadata": {},
        "status": "Success",
        "details": {
          "name": "<our-app-service>",
          "kind": "services",
          "uid": "a45a498b-e118-11e9-8e55-1254a4182eba"
        }
      },
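
For anyone who wants to capture the same kind of event on their own cluster, here is a minimal sketch of a Kubernetes audit policy that records delete requests against Services and Deployments at the RequestResponse level. It is an illustration only, not the policy used on the cluster above; the namespace and resource lists are assumptions and should be adapted.

```yaml
# Hypothetical audit policy sketch: log full request/response bodies
# for deletions of Services and Deployments in the 'default' namespace.
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  - level: RequestResponse
    verbs: ["delete", "deletecollection"]
    namespaces: ["default"]        # assumption: the namespace you want to watch
    resources:
      - group: ""                  # core API group
        resources: ["services"]
      - group: "apps"
        resources: ["deployments"]
  # Requests that match no rule are not logged; add further rules as needed.
```

The resulting events can then be filtered on user.username to see which service account issued each delete.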

I have just experienced and reproduced a very similar scenario to what is described above. This was using the Voyager v12.0.0 Helm chart.

My test environment has several namespaces, among which the 'integration-product' namespace contains multiple Deployments, DaemonSets, Services, StatefulSets and other K8s objects, all existing happily.

When I now "successfully" (from Helm's perspective) install a Helm chart that includes Voyager as a subchart in another namespace (named 'integration-product-xxx'), I can see that all Services and Deployments in the unrelated namespace 'integration-product' are deleted!

Note that:

  • objects in other namespaces are unaffected and not deleted (maybe it has to do with the affected namespace's name being a substring of the new namespace's name, but I am just guessing)
  • only Deployments and Services are deleted from what I can see; everything else remains alive (DaemonSets, StatefulSets, ...)

Unlike what the OP described, the Helm chart installation (Helm v3.2.4 using the --atomic option) including the Voyager subchart is actually successful, so I think no Voyager cleanup should be triggered. Nevertheless, Voyager wreaks havoc on totally unrelated Services and Deployments in a different namespace. Luckily this happened in a test environment.

However, setting 'restrictToOperatorNamespace: true' seems to prevent this massacre. Maybe it should be the default setting?
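
As a sketch of that workaround, the setting can go in the values file passed to Helm. The top-level form assumes Voyager is installed directly from its own chart. The nested form assumes it is pulled in as a subchart under the name 'voyager', which is a guess; use whatever name or alias your parent chart declares.

```yaml
# Installing the Voyager chart directly:
restrictToOperatorNamespace: true

# Installing Voyager as a subchart (assumed subchart name: voyager):
# voyager:
#   restrictToOperatorNamespace: true
```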

Still, it doesn't feel right that Voyager can delete unrelated objects in unrelated namespaces by default. Maybe you can do something about that before it causes real pain for someone in production?

Happy to read that someone else stumbled upon this and that there really is a huge bug in the operator.

I have the same issue, but only when I add some HAProxy customization in annotations to enable proxy forwarding and other things related to TLS.
Without them it works fine, but with those annotations it stops working and nmap reports the port as filtered instead of open or closed.

No errors in the pod's logs or in describe ingress, nothing useful; it just stopped working.