ckotzbauer/vulnerability-operator

Context-based vulnerability filtering

muellerst-hg opened this issue · 18 comments

It might occur, that a CVE is a permanent false positive (e.g. rubygem rexml 3.2.3.1 in gitlab/gitlab-ce:15.5.6-ce.0) or a container is unaffected by a CVE.

In order to focus on the real issues, we'd like our view (Grafana consuming prometheus metric) to filter these CVEs. An idea was to have another label "audit_result" in the prometheus metric vuln_operator_cves.

How do you handle this?
In which ways could vulnerability_operator support audits?

cc/ @nicholasdille

There's a grype-config-file parameter for this operator where you can specify ignore rules. These vulnerabilities are not exposed as metric.
Is this helpful for you?

Thanks for your quick reply and the pointer to grypes ignore rules.

There's a grype-config-file parameter for this operator where you can specify ignore rules. These vulnerabilities are not exposed as metric. Is this helpful for you?

Partly yes, e.g. a rule applied for ruby rexml CVE GHSA-8cr8-4vfw-mr7 + version 3.2.3.1 would effectively filter a false positive.
However, the scopes of these ignore rules do not cover our auditing contexts like k8s namespace, pod or container image. Though these contexts are known to vulnerability_operator

Okay, so you want to ignore vulnerabilities by their

  • CVE
  • Package-Name

when they occur in a certain

  • K8s namespace
  • K8s pod-name (or parent-resource Deployment, DaemonSet, StatefulSet)
  • Container image

For simplicity it would be easier to specify Namespace, Pod and/or image-name exactly. Would this be sufficient for you?

For simplicity it would be easier to specify Namespace, Pod and/or image-name exactly. Would this be sufficient for you?

Yes. Do you already have something in mind?

I think this would end up in a config-file approach, similar to the existing one from grype. Maybe it is possible to integrate some matching-logic from grype as its very powerful.

Do you have a specific implementation approach in mind?

We have discussed the idea of filtering or labeling metrics and have come up with a few approaches. We'd like to share them with you in case this helps :-)

  1. Filter/labeling rules can be applied when metrics are created - around internal/vuln/kubernetes/kubernetes.go#L77. But the big unknown is where to get the rules from. This could be a config file stored in a ConfigMap or even one or more custom resources based on a new CRD.

  2. You already have implemented relabeling fields for the service monitor in the helm chart. We have tested this (example below) and fear that the number of relabeling rules will grow quickly and slow down scraping. Do you have any experience with this?

We would like to support you implementing this feature. How can we go about this?

Example for relabeling We have tried introducing a label called audit_result so we do not loose the metrics but are able to filter them in Grafana:

spec:
  endpoints:
  - path: /metrics
    port: http

    # Add default "unknown" value for label audit_result
    relabelings:
    - action: replace
      replacement: unknown
      targetLabel: audit_result
    metricRelabelings:

    # Set label audit_result to ignore for all CVEs in image
    - action: replace
      regex: gitlab-(qa|live);gitlab/gitlab-ce@sha256:f1e2f874a2c8e715a2e55f78a74f56ea947c3013df37bf826600c38feba9957f
      replacement: ignore
      separator: ;
      sourceLabels:
      - k8s_namespace
      - image_id
      targetLabel: audit_result

    # Set label audit_result to ignore for specific CVEs in an image
    - action: replace
      regex: maas-.+;registry.company.io/foo/myimage@sha256:46850a00c4e3c38a13b4f1467e3556e0e13cc4681cca3af4b66999d3dc6b8484;(GHSA-c28r-hw5m-5gv3|GHSA-jjjh-jjxp-wpff|GHSA-rgv9-q543-rqg4|GHSA-wv7w-rj2x-556x)
      replacement: ignore
      separator: ;
      sourceLabels:
      - k8s_namespace
      - image_id
      - cve
      targetLabel: audit_result

    # Set label audit_result to ignore for all packages of type java-archive in an image
    - action: replace
      regex: sonar-(qa|live);sonarqube@sha256:f41e2ea3ac473188326e2549865f8e746c208c5c25c40930d30697770728f093;java-archive
      replacement: ignore
      separator: ;
      sourceLabels:
      - k8s_namespace
      - image_id
      - type
      targetLabel: audit_result

Thanks @nicholasdille and @muellerst-hg for sharing your thoughts on this.
Just to clarify: You still want to see the "filtered" CVEs in the metrics, so filtering (removal, as it's now with the grype-config) is not sufficient?

There are several levels of filtering/labeling possible:

  • Restrict containers based on their image/pod/namespace while fetching from the cluster to prevent, that they are scanned and processed in general. In a simple case this could be achieved with label-selectors from client-go, but this would only be an "Include" operation, as it's not possible to invert/exclude labels with this selection. For this type of filter, the mentioned code-path kubernetes.go#L77 would be the right place, otherwise it would only be the place to attach more metadata which is used later.
  • Labeling of containers based on their image/pod/namespace after they have been scanned and "redirect" the results to a new metric or something similar.
  • Labeling of CVEs based on their image/pod/namespace/package-name/... where they occur and "redirect" the results to a new metric or something similar. This and the second option would go here or in a new file of the same package.

Place of the config:

  • For the first option (simple label-selector) I would prefer cli-arguments, as for any other config right now.
  • Second and third option need a more extensive config and could be achieved with a config-file. This could be stored as configmap by the helm-chart in the same way as it's now already done for the grype-config (no built-in configmap loading-logic for the operator). CRDs would also be an option, but this increases the efforts for implementation and maintenance. Also the reconciler is a always a bit tricky to implement. So I would prefer the former.

Relabeling with Prometheus is of course an option for some rare edge-cases but it's hard to maintain when there are more rules over time, also the performance will decrease as you mentioned.

So core todos for the implementation would be:

  • Definition of a format for the config-file - draft below (would prefer YAML but deserialization could maybe support multiple types)
  • Implementation of a filter-engine based on the given config.
  • Expose the newly structured results correctly to all targets. For "metrics" it's maybe an option to create a second metric to reduce the size of the data for each metric.

Config draft

ignore:
  - vulnerability: CVE-2022-1234 # mandatory if "package" is not set
    package: openssl # mandatory if "vulnerability" is not set
    # maybe more properties from the grype-config format could be supported
    context: # optional, if no context is given, the CVE is filtered globally
      - image: gitlab/gitlab-ce@sha256:f1e2f874a2c8e715a2e55f78a74f56ea947c3013df37bf826600c38feba9957f
        namespace: gitlab-prod
        name: gitlab-server-*
        kind: Pod
      - image: sonarqube@sha256:f41e2ea3ac473188326e2549865f8e746c208c5c25c40930d30697770728f093
        namespace: '*'
        name: sonarqube
        kind: Deployment

All context-properties should support wildcard-matching to simplify the rule definition. I think this should be doable with the image-parsing lib used.
Each context-property is optional, but you have to specify at least one property (name and kind have to be used in conjunction). The context-properties are AND-linked. The context-objects are OR-linked, so the CVE would be ignored as long as one context matches.

What do you think?

Thank you, Christian. I will get back to it after christmas holidays.

It took us a while for reviewing your well-elaborated proposal, which sounds very promising.

Filtering using grype-config is not sufficient for us. We want to see "filtered" CVEs in the metrics.
The approach of labeling of CVEs based on their image/pod/namespace/package-name/... using a config-file would match our needs.

Auditing itself could be a separate process/application and as a result, the config-file would be updated. To re-read the config-file, the operator could watch the file or listen to SIGHUP, because redeploying the operator completely takes quit some time.

We would be happy to review and test your approach.

Thanks @muellerst-hg,
than this

Labeling of CVEs based on their image/pod/namespace/package-name/... where they occur and "redirect" the results to a new metric or something similar. This and the second option would go here or in a new file of the same package.

would be the way to go.

I will start the implementation and post updates here.

  • Config format
  • Implementation of new filter-engine
  • Adjusted targets to handle filtered and audited vulnerabilities
  • Documentation
  • Deprecate grype-config

Thanks for impementing new filtering capabilities.

I did some tests against a gitlab test instance, with the following filter.yaml:

ignore:
  - vulnerability: GHSA-ggxm-pgc9-g7fp # rdoc
    context:
      - image: "*"
        namespace: gitlab-dev
        kind: Deployment
        name: gitlab-dev-server
audit:
  - vulnerability: GHSA-fp4w-jxhp-m23p # bundler
    context:
      - image: "*"
        namespace: gitlab-dev
        kind: Deployment
        name: gitlab-dev-server

In the cluster there are three deployments: "gitlab-dev-server", "gitlab-dev-gitaly", "gitlab-dev-redis", all of them with container specs using the same image "gitlab/gitlab-ce:15.9.0-ce.0".

I would expect the vuln_operator_cves_audit metric to contain exactly 2 rows, one for each CVE matching the deployment "gitlab-dev-server".
Instead, the metric contains 6 rows, matching the CVEs on all three deployments "gitlab-dev-server", "gitlab-dev-gitaly", "gitlab-dev-redis" in the namespace.

Is there a misconfiguration on my side?

Hm, when this image has only these two CVEs, I would expect the following:
Metric vuln_operator_cves contains 4 rows (both CVEs for the gitaly and redis deployment)
Metric vuln_operator_cves_audit contains 1 row (bundler CVE from server deployment).

Why you expect 2 CVEs? You ignore one of them.

Sorry, I put it wrong when writing the comment. I have the same expectations as you have.

However, the results are different here:

Metric vuln_operator_cves contains 0 rows
Metric vuln_operator_cves_audit contains 3 rows (bundler CVE from gitaly, redis and server deployment).

AFAIK MetricTarget.ProcessVulns creates a row for each container in Vulnerability.Containers[] but matching containers are never removed from the array.

Thanks, I'll have a look in the upcoming days.

@muellerst-hg I adjusted the filter-logic to handle vulnerabilities with different filter-contexts correctly. Can you please have a look again with your test-setup?

@muellerst-hg I adjusted the filter-logic to handle vulnerabilities with different filter-contexts correctly. Can you please have a look again with your test-setup?

looks good to me. thanks!

Released in 0.16.0
Thanks for your help, feedback and ideas! 🥳