opsgenie/kubernetes-event-exporter

Feature: Don't Error Log For Dropped or Unmatched Objects

aauren opened this issue · 3 comments

In our environments we don't wish to give kubernetes-event-exporter RBAC access to some objects for various reasons. Additionally, we have many development environments where developers add CRDs dynamically, and we may not have had a chance to grant kubernetes-event-exporter the RBAC permissions it needs to get / list them.

When an event happens on an object that kubernetes-event-exporter has not yet been given permissions for, it emits an error because it cannot gather metadata (e.g. labels / annotations) for the object the event was raised on: https://github.com/opsgenie/kubernetes-event-exporter/blob/master/pkg/kube/watcher.go#L70-L85

These errors look like:

{
  "level": "error",
  "error": "ingresses.networking.k8s.io \"foo\" is forbidden: User \"system:serviceaccount:kube-system:kubernetes-event-exporter\" cannot get resource \"ingresses\" in API group \"n
etworking.k8s.io\" in the namespace \"bar\"",
  "time": "2021-11-05T12:16:49Z",
  "caller": "/root/go/src/github.com/opsgenie/kubernetes-event-exporter/pkg/kube/watcher.go:81",
  "message": "Cannot list annotations of the object"
}

Over time, we get a considerable amount of log spam from this. It would be helpful if kubernetes-event-exporter did a basic sanity check on whether it would match an item, given the information it already has (name, namespace, event type, kind, etc.), before it tries to look up the item. If the item could not match, processing would stop early, saving needless event processing time and avoiding these kinds of errors.

It would allow users to limit their matches with configurations like:

logLevel: error
logFormat: json
route:
  routes:
    - match:
      - apiversion: "v1|apps/v1|batch/v1"
        receiver: "dump"
receivers:
  - name: "dump"
    file:
      path: "/dev/stdout"

Obviously, this check would not be able to filter on labels or annotations, because it would run directly before the logic that enriches the event with label / annotation metadata.

If the project is amenable to this change, I have a patch that we've been running with for a while. It abstracts and reuses the same checking logic found in the route's process event function (https://github.com/opsgenie/kubernetes-event-exporter/blob/master/pkg/exporter/route.go#L14) to check the event against the information it already has and decide whether the event should be processed at all, before trying to fetch the involved object and possibly hitting an RBAC permission error.
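
Roughly, the pre-match would look something like the sketch below. The Event and Rule types and the matchesField / preMatch helpers are hypothetical names for illustration, not the project's actual code; the point is only that everything here uses metadata that is available before the object lookup.

// A minimal sketch of the pre-match idea, assuming hypothetical Event and Rule
// types; the real kubernetes-event-exporter structs and matching rules differ,
// this only illustrates checking fields that are known before the object lookup.
package main

import (
	"fmt"
	"regexp"
)

// Event carries only the fields that are available before the involved
// object is fetched from the API server.
type Event struct {
	Namespace  string
	Kind       string
	APIVersion string
}

// Rule holds regex-style matchers, mirroring the config example above.
type Rule struct {
	Namespace  string
	Kind       string
	APIVersion string
}

// matchesField treats an empty pattern as "match anything" and anchors the
// regex so "v1" does not accidentally match "networking.k8s.io/v1".
func matchesField(pattern, value string) bool {
	if pattern == "" {
		return true
	}
	matched, err := regexp.MatchString("^("+pattern+")$", value)
	return err == nil && matched
}

// preMatch decides whether an event could possibly match a rule using only
// the metadata already on the event; label/annotation rules are ignored here
// because that data has not been fetched yet.
func preMatch(r Rule, ev Event) bool {
	return matchesField(r.Namespace, ev.Namespace) &&
		matchesField(r.Kind, ev.Kind) &&
		matchesField(r.APIVersion, ev.APIVersion)
}

func main() {
	rule := Rule{APIVersion: "v1|apps/v1|batch/v1"}

	ingress := Event{Namespace: "bar", Kind: "Ingress", APIVersion: "networking.k8s.io/v1"}
	pod := Event{Namespace: "bar", Kind: "Pod", APIVersion: "v1"}

	// The Ingress event can never match the rule, so it would be dropped
	// before the API lookup and no RBAC error would be logged for it.
	fmt.Println("process ingress event:", preMatch(rule, ingress)) // false
	fmt.Println("process pod event:", preMatch(rule, pod))         // true
}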

Let me know if you're interested and I'll open a PR for our patch so you can review.

This makes sense. The current cache only works on a per-object basis, but we could change it to key on the apiVersion/kind pair.
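
For illustration, a rough sketch of what an apiVersion/kind-keyed cache could look like (the gvkKey and forbiddenCache names are hypothetical; the project's existing cache is structured differently):

// A rough sketch of a cache keyed by resource type rather than by individual
// object, so a forbidden apiVersion/kind only has to fail once.
package main

import (
	"fmt"
	"sync"
)

// gvkKey identifies a resource type rather than an individual object.
type gvkKey struct {
	APIVersion string
	Kind       string
}

// forbiddenCache remembers which apiVersion/kind pairs returned an RBAC
// error so later events of the same type can be skipped silently.
type forbiddenCache struct {
	mu   sync.Mutex
	seen map[gvkKey]bool
}

func newForbiddenCache() *forbiddenCache {
	return &forbiddenCache{seen: make(map[gvkKey]bool)}
}

// MarkForbidden records that lookups for this resource type failed.
func (c *forbiddenCache) MarkForbidden(apiVersion, kind string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.seen[gvkKey{apiVersion, kind}] = true
}

// IsForbidden reports whether this resource type has already failed once.
func (c *forbiddenCache) IsForbidden(apiVersion, kind string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.seen[gvkKey{apiVersion, kind}]
}

func main() {
	cache := newForbiddenCache()

	// After the first forbidden lookup for Ingresses, record the resource type.
	cache.MarkForbidden("networking.k8s.io/v1", "Ingress")

	// Subsequent Ingress events consult the cache instead of calling the API
	// server again, so only the first failure produces an error log.
	fmt.Println(cache.IsForbidden("networking.k8s.io/v1", "Ingress")) // true
	fmt.Println(cache.IsForbidden("v1", "Pod"))                       // false
}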

@mustafaakin-atl - Thanks for your quick reply!

I wanted to let you know that I'm still working on a better patch for this. I realized that there was a problem with my previous plan of using the OnEvent logic to determine if an event would match the rules prior to labels and annotations being fetched.

As I was about to submit a PR for this functionality, I realized it would break for anyone with label- or annotation-based rules: reusing the match logic in OnEvent would try to match labels and annotations against a nil set, which could reject an event that would have matched if we had fetched the label and annotation data. Splitting the logic apart is, I think, too big a change for what I'm trying to accomplish.

Based on your comment, I'm currently looking into caching API responses and only emitting an error the first time an error of a given type is encountered. That way people still understand that these events aren't being collected, but don't get quite as much error spam out of the process.
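
The rough shape I'm experimenting with is below. The onceLogger type and its Errorf method are hypothetical stand-ins for illustration, not the exporter's actual logging; the only point is deduplicating by an error key so each distinct failure is logged once.

// A minimal sketch of the "log each distinct error once" idea, using
// hypothetical names; it only demonstrates deduplicating by an error key.
package main

import (
	"fmt"
	"sync"
)

// onceLogger suppresses repeated messages that share the same key.
type onceLogger struct {
	mu     sync.Mutex
	logged map[string]bool
}

func newOnceLogger() *onceLogger {
	return &onceLogger{logged: make(map[string]bool)}
}

// Errorf logs the message only the first time its key is seen.
func (l *onceLogger) Errorf(key, format string, args ...interface{}) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.logged[key] {
		return
	}
	l.logged[key] = true
	fmt.Printf("error: "+format+"\n", args...)
}

func main() {
	log := newOnceLogger()

	// Keyed by resource type, so repeated forbidden Ingress lookups only
	// produce a single error line instead of one per event.
	for i := 0; i < 3; i++ {
		log.Errorf("forbidden/networking.k8s.io/Ingress",
			"cannot list annotations of %s", "ingresses.networking.k8s.io")
	}
}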

Just a quick update here: I haven't forgotten about this one, I'm just trying to find time to do it properly.

I did get a patch working internally, but it is very much geared toward our use case and likely isn't a good fit for the project as a whole. I'm still interested in finding a better approach as soon as I have time.