Should vulnerabilities be de-duped from multiple repositories?
paulaldridge opened this issue · 3 comments
We've found that for Red Hat images a vuln report shows duplicates of every vulnerability, because packages are matched against multiple repositories. We're unsure whether this is intended behaviour and whether showing each repository is useful, but it needlessly bloats the vuln report.
For example, using a test image for ubi8.8 (i.e. FROM registry.access.redhat.com/ubi8:8.8-1067.1698056881), the index report shows multiple repositories:
"repository": {
  "3": {
    "id": "3",
    "name": "cpe:/o:redhat:rhel:8.3::baseos",
    "key": "rhel-cpe-repository",
    "cpe": "cpe:2.3:o:redhat:rhel:8.3:*:baseos:*:*:*:*:*"
  },
  "4": {
    "id": "4",
    "name": "cpe:/a:redhat:enterprise_linux:8::appstream",
    "key": "rhel-cpe-repository",
    "cpe": "cpe:2.3:a:redhat:enterprise_linux:8:*:appstream:*:*:*:*:*"
  },
  "5": {
    "id": "5",
    "name": "cpe:/o:redhat:enterprise_linux:8::baseos",
    "key": "rhel-cpe-repository",
    "cpe": "cpe:2.3:o:redhat:enterprise_linux:8:*:baseos:*:*:*:*:*"
  },
  "598": {
    "id": "598",
    "name": "cpe:/a:redhat:rhel:8.3::appstream",
    "key": "rhel-cpe-repository",
    "cpe": "cpe:2.3:a:redhat:rhel:8.3:*:appstream:*:*:*:*:*"
  },
  "6829": {
    "id": "6829",
    "name": "Red Hat Container Catalog",
    "uri": "https://catalog.redhat.com/software/containers/explore"
  }
},
The vuln report then contains duplicate vulnerabilities whose only difference is the repository.
Full vuln report: ubi8.8VulnReport.json
For reference we are using:
github.com/quay/clair/config v1.3.0
github.com/quay/clair/v4 v4.7.2
github.com/quay/claircore v1.5.19
This is just something that falls out of the logic of the data that Red Hat's build system and vulnerability information provide. There's no way to know which is the correct repository, so claircore's rhel indexing logic is forced to use a cross-product in some situations. See PROJQUAY-5185 and PROJQUAY-5214 for discussion on why.
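To make the cross-product concrete, here is a minimal Go sketch. It is not claircore's actual indexing code: `Record` and `CrossProduct` are hypothetical names and the package names are placeholders. It only illustrates why pairing every package with every candidate repository multiplies the entries that are later matched against vulnerabilities.

```go
package main

import "fmt"

// Record is a hypothetical (package, repository) pairing used only for
// illustration; it is not a claircore type.
type Record struct {
	Package    string
	Repository string
}

// CrossProduct pairs every package with every candidate repository.
// With P packages and R repositories it yields P*R records.
func CrossProduct(pkgs, repos []string) []Record {
	out := make([]Record, 0, len(pkgs)*len(repos))
	for _, p := range pkgs {
		for _, r := range repos {
			out = append(out, Record{Package: p, Repository: r})
		}
	}
	return out
}

func main() {
	// Placeholder package names; repository names taken from the index
	// report in this issue.
	pkgs := []string{"pkg-a", "pkg-b"}
	repos := []string{
		"cpe:/o:redhat:rhel:8.3::baseos",
		"cpe:/o:redhat:enterprise_linux:8::baseos",
	}
	for _, rec := range CrossProduct(pkgs, repos) {
		fmt.Printf("%s @ %s\n", rec.Package, rec.Repository)
	}
}
```

If a vulnerability applies to a package in more than one of those repositories, each pairing produces its own match, which is where the near-identical entries in the report come from.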
Ah I see, thanks for explaining. What do you think about de-duping the output when we know the entries are the same, and listing the repositories together under one vulnerability? I'm not sure whether it would be too heavyweight to check that the entries are identical before combining them (unless we know they always will be in these cases).
Yeah, that would be a way to "clean up" the presentation. I don't think it's feasible to do in claircore with the current architecture, which doesn't have a real reference/identity mechanism for vulnerabilities.
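For anyone who wants the grouped presentation downstream of claircore, a post-processing step over the report is one option. The Go sketch below is only that, a sketch: `Finding`, `Merged`, and `Dedupe` are hypothetical, simplified stand-ins rather than claircore types, and it assumes that two entries are "the same" when their name, package, and fixed-in version match exactly. Whether that comparison is sufficient for real reports is an open question in this thread.

```go
package main

import (
	"fmt"
	"sort"
)

// Finding is a hypothetical, simplified report entry (not a claircore type).
type Finding struct {
	Name       string // advisory or CVE identifier
	Package    string
	FixedIn    string
	Repository string
}

// Merged is one finding plus every repository it was reported under.
type Merged struct {
	Name         string
	Package      string
	FixedIn      string
	Repositories []string
}

// key holds the fields that must match for two findings to be merged.
// Assumption: everything except the repository is captured here.
type key struct{ name, pkg, fixedIn string }

// Dedupe groups findings that differ only by repository, preserving the
// order in which each distinct finding first appears.
func Dedupe(in []Finding) []Merged {
	index := make(map[key]int) // key -> position in out
	var out []Merged
	for _, f := range in {
		k := key{f.Name, f.Package, f.FixedIn}
		i, ok := index[k]
		if !ok {
			i = len(out)
			index[k] = i
			out = append(out, Merged{Name: f.Name, Package: f.Package, FixedIn: f.FixedIn})
		}
		out[i].Repositories = append(out[i].Repositories, f.Repository)
	}
	for i := range out {
		sort.Strings(out[i].Repositories) // stable, readable output
	}
	return out
}

func main() {
	// Made-up identifiers and versions, for illustration only.
	findings := []Finding{
		{"EXAMPLE-0001", "pkg-a", "1.2.3", "cpe:/o:redhat:rhel:8.3::baseos"},
		{"EXAMPLE-0001", "pkg-a", "1.2.3", "cpe:/o:redhat:enterprise_linux:8::baseos"},
	}
	for _, m := range Dedupe(findings) {
		fmt.Printf("%s (%s): %v\n", m.Name, m.Package, m.Repositories)
	}
}
```

Using a struct as the map key keeps the equality check to a single lookup per entry, so the cost is linear in the size of the report. The harder part is deciding which fields belong in the key.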