Correlate and dedupe SBOM overlap between Snyk and GitHub Providers
DerekStrickland opened this issue · 1 comment
DerekStrickland commented
Target Audience
- Harbor system
- SDL consumers
- Harbor engineers
What’s the Value
- Ensures that we don't over-report vulnerabilities.
- Ensures that we don't store duplicate SBOMs with different names/IDs that actually represent the same entity.
Details
As consumers of the data Harbor produces, we would like all distinct SBOM targets and related data to be:
- Accurately correlated across data sources.
- Uniquely identifiable and deduplicated.
Use Case
- The ab2d repository exists in the CMSGov GitHub Organization.
- It also exists in Snyk.
- Both providers will detect the repo and attempt to ingest an SBOM for it using their specific methodology.
- To date, we believe there is no obvious shared unique attribute that will allow us to explicitly correlate the two.
- We need a way to determine that these two incoming data streams relate to the same SBOM target, so that we do not create duplicate entries in the Package, Sbom, and Vulnerability collections.
- Similarly, we should create S3 output only for a single resolved SBOM target.
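One possible approach to the use case above, sketched under a loud assumption: that both the Snyk and GitHub ingestion tasks can surface some form of repository URL for a target. This issue does not confirm that, and the function name `normalize_repo_key` and the record fields `provider` and `repo_url` are hypothetical. The idea is to reduce each URL to a canonical `host/owner/repo` key and group incoming records by that key.

```python
import re
from typing import Dict, List, Optional
from urllib.parse import urlparse


def normalize_repo_key(url: str) -> Optional[str]:
    """Reduce a web URL or git remote to a lowercase 'host/owner/repo' key.

    Hypothetical helper: assumes each provider exposes a repository URL.
    """
    cleaned = url.strip().lower()
    if "://" in cleaned:
        # Web or https-style remote, e.g. https://github.com/owner/repo.git
        parsed = urlparse(cleaned)
        host, path = parsed.hostname or "", parsed.path
    else:
        # scp-style remote, e.g. git@github.com:owner/repo.git
        match = re.match(r"^(?:[^@/]+@)?([^:/]+):(.+)$", cleaned)
        if match is None:
            return None
        host, path = match.group(1), match.group(2)

    path = path.strip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    parts = path.split("/")
    if not host or len(parts) < 2 or not all(parts[:2]):
        return None
    # Keep only owner and repo; ignore trailing segments such as /tree/main.
    return f"{host}/{parts[0]}/{parts[1]}"


def correlate(records: List[dict]) -> Dict[str, List[str]]:
    """Group provider names by the normalized key of their repo URL."""
    targets: Dict[str, List[str]] = {}
    for record in records:
        key = normalize_repo_key(record["repo_url"])
        if key is None:
            continue  # Uncorrelatable records need a separate fallback.
        targets.setdefault(key, []).append(record["provider"])
    return targets
```

If a provider cannot supply a URL, this key does not help, and some other correlation attribute would still need to be found.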
Definition of Done
- If an ingestion target is identified by both the Snyk and GitHub ingestion tasks (and any future tasks), only one set of Package, Sbom, and Vulnerability collection entries should be created for it.
- The Primary Package and related Sbom entries should have an Xref to both the Snyk Project and the Harbor BuildTarget.
- From the SDL perspective, all combinations of Primary Package.purl to Dependency.purl are unique.
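The criteria above could be checked with something like the following minimal sketch. It assumes the two ingestions have already been resolved to the same Primary Package purl; the `ResolvedTarget` class, the `merge_ingestions` function, and the input field names are hypothetical and do not reflect Harbor's actual data model. Sets are used so that duplicate Xrefs and duplicate dependency purls collapse automatically.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class ResolvedTarget:
    """One deduplicated SBOM target (hypothetical shape)."""
    primary_purl: str
    # (provider entity, external id) pairs, e.g. ("snyk_project", "...").
    xrefs: Set[Tuple[str, str]] = field(default_factory=set)
    dependency_purls: Set[str] = field(default_factory=set)

    def purl_pairs(self) -> Set[Tuple[str, str]]:
        """All (Primary Package.purl, Dependency.purl) combinations."""
        return {(self.primary_purl, dep) for dep in self.dependency_purls}


def merge_ingestions(ingestions: List[dict]) -> Dict[str, ResolvedTarget]:
    """Fold per-provider ingestions into one entry per primary purl."""
    resolved: Dict[str, ResolvedTarget] = {}
    for item in ingestions:
        purl = item["primary_purl"]
        target = resolved.setdefault(purl, ResolvedTarget(purl))
        target.xrefs.add((item["provider_entity"], item["external_id"]))
        target.dependency_purls.update(item["dependency_purls"])
    return resolved
```

With this shape, a target seen by both providers yields a single entry carrying two Xrefs, and overlapping dependencies reported by both sources appear only once.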
sbolel commented