CMS-Enterprise/sbom-harbor

Correlate and dedupe SBOM overlap between Snyk and GitHub Providers

DerekStrickland opened this issue · 1 comment

Target Audience

  • Harbor system
  • SDL consumers
  • Harbor engineers

What’s the Value

  • Ensures that we don't over-report vulnerabilities.
  • Ensures that we don't store duplicate SBOMs with different names or IDs that actually represent the same entity.

Details

As consumers of the data Harbor produces, we would like all distinct SBOM targets and related data to be:

  • Accurately correlated across data sources.
  • Uniquely identifiable and deduplicated.

Use Case

  • The ab2d repository exists in the CMSGov GitHub Organization.
  • It also exists in Snyk.
  • Both providers will detect the repo and attempt to ingest an SBOM for it using their specific methodology.
  • To date, we have not found an obvious shared unique attribute that would let us explicitly correlate the two.
  • We need a way to determine that these two incoming data streams relate to the same SBOM target, so that we do not create duplicate entries in the Package, Sbom, and Vulnerability collections.
  • Similarly, we should create S3 output only for a single resolved SBOM target.
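
One possible direction for the use case above, sketched in Python purely for illustration: derive a normalized correlation key from whatever repository reference each provider reports, and group incoming records by that key. The `repo_ref` and `provider` fields and the `correlation_key` and `correlate` functions are hypothetical names, not part of Harbor, and the sketch assumes a repository reference can be recovered from both providers, which has not been confirmed.

```python
import re
from collections import defaultdict


def correlation_key(repo_ref: str) -> str:
    """Reduce a repository reference to a comparable 'host/org/repo' key.

    Assumption: both providers expose some URL-like reference to the repo.
    """
    ref = repo_ref.strip().lower()
    ref = re.sub(r"^[a-z][a-z0-9+.-]*://", "", ref)  # drop scheme, e.g. https://
    ref = re.sub(r"^git@", "", ref)                   # drop SSH user prefix
    ref = ref.replace(":", "/", 1)                    # host:org/repo -> host/org/repo
    ref = ref.rstrip("/")                             # drop trailing slash
    ref = re.sub(r"\.git$", "", ref)                  # drop .git suffix
    return ref


def correlate(records):
    """Group provider records that resolve to the same SBOM target."""
    targets = defaultdict(list)
    for record in records:
        targets[correlation_key(record["repo_ref"])].append(record)
    return dict(targets)


# Hypothetical records for the ab2d repository from each provider.
records = [
    {"provider": "github", "repo_ref": "https://github.com/CMSGov/ab2d.git"},
    {"provider": "snyk", "repo_ref": "git@github.com:CMSGov/ab2d"},
]
targets = correlate(records)
```

Here both records collapse into one target, so downstream steps such as S3 output could run once per key rather than once per provider record.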

Definition of Done

  • If an ingestion target is identified by both the Snyk and GitHub ingestion tasks (and any future tasks), exactly one set of Package, Sbom, and Vulnerability collection entries should be created for it, with no duplicates.
  • The Primary Package and related Sbom entries should have a Xref to both the Snyk Project and the Harbor BuildTarget.
  • From the SDL perspective, all combinations of Primary Package.purl to Dependency.purl are unique.
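
The acceptance criteria above could be checked along these lines. This is a minimal Python sketch under stated assumptions: the `Package` class, the `upsert` and `dependency_pairs` helpers, the tuple shape of an Xref, and all identifiers and purls in the example are hypothetical stand-ins, not Harbor's actual model.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    """Hypothetical stand-in for a Primary Package entry."""
    purl: str
    xrefs: set = field(default_factory=set)
    dependency_purls: set = field(default_factory=set)


def upsert(packages, purl, xref, dependency_purls):
    """Create the package once, then merge Xrefs and dependencies into it."""
    package = packages.setdefault(purl, Package(purl))
    package.xrefs.add(xref)
    package.dependency_purls.update(dependency_purls)
    return package


def dependency_pairs(packages):
    """List every (Primary Package.purl, Dependency.purl) combination."""
    return [
        (package.purl, dep)
        for package in packages.values()
        for dep in sorted(package.dependency_purls)
    ]


packages = {}
deps = ["pkg:maven/org.example/lib-a@1.0.0", "pkg:maven/org.example/lib-b@2.0.0"]

# The same target arrives once from each ingestion task (illustrative IDs).
upsert(packages, "pkg:github/cmsgov/ab2d", ("snyk_project", "project-123"), deps)
upsert(packages, "pkg:github/cmsgov/ab2d", ("harbor_build_target", "target-456"), deps)

pairs = dependency_pairs(packages)
```

Keying the collection by purl and storing dependencies in a set means a second ingestion of the same target only adds an Xref and cannot produce a duplicate pair.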

sbolel commented

⚠️ Repository Decommission Notice: This repository is scheduled to be archived as it has been decommissioned and will no longer be actively maintained. As part of the archival process, we are closing all open issues and pull requests.