paketo-buildpacks/packit

Support varying schema versions of SBOM outputs

ryanmoran opened this issue · 3 comments

Problem

Relying upon the SBOM formatting code available in syft is creating inadvertent breaking changes in our API contract. For example, we cannot currently upgrade the version of syft that we are using without also changing the schema version of our CycloneDX SBOM output.

In order to stabilize the contract in the long run, we should introduce functionality in packit/sbom capable of generating SBOM outputs in multiple schema versions of each SBOM type. Doing so will allow us to keep the SBOM output, and our API, stable while still consuming the latest versions of the syft library for scanning and generating the SBOM data.

Proposal

Our current implementation leverages the syft.Encode function to convert an sbom.SBOM into a []byte that can be written to a file.

output, err := syft.Encode(f.sbom.syft, syft.FormatByID(id))
if err != nil {
	return 0, fmt.Errorf("failed to format sbom: %w", err)
}

The packit/sbom package does not currently implement any of the formatting itself. Instead, it leverages the existing formats made available by syft:

var id sbom.FormatID
switch f.format {
case CycloneDXFormat:
	id = syft.CycloneDxJSONFormatID
case SPDXFormat:
	id = syft.SPDXJSONFormatID
case SyftFormat:
	id = syft.JSONFormatID
default:
	return 0, fmt.Errorf("failed to format sbom: unsupported format %q", f.format)
}

Unfortunately, these formats are updated on a near-constant basis, and in a way that breaks backwards-compatibility in the SBOM output format contract.

Luckily, the syft library has recently undergone a refactoring that now allows us to define our own format types that can be used to encode SBOM output. To implement a format, you need to create a concrete type that conforms to the sbom.Format interface:

type Format interface {
	ID() FormatID
	Encode(io.Writer, SBOM) error
	Decode(io.Reader) (*SBOM, error)
	Validate(io.Reader) error
}
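
As a rough illustration of what a version-pinned format could look like, here is a minimal sketch. It does not import syft: FormatID, SBOM and Format below are local stand-ins that mirror the interface above so the example compiles on its own, and pinnedFormat, its JSON layout, and the render helper are all hypothetical names invented for this sketch. A real implementation would encode the actual syft sbom.SBOM into the exact schema for the pinned version.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"io"
)

// Local stand-ins for the syft types (assumption: the real ones live in
// syft's sbom package and carry far more data than this).
type FormatID string

type SBOM struct {
	Packages []string
}

type Format interface {
	ID() FormatID
	Encode(io.Writer, SBOM) error
	Decode(io.Reader) (*SBOM, error)
	Validate(io.Reader) error
}

// pinnedFormat is a hypothetical format that always emits a single,
// fixed schema version, regardless of which syft version produced the data.
type pinnedFormat struct {
	id      FormatID
	version string
}

func (f pinnedFormat) ID() FormatID { return f.id }

// Encode writes a toy JSON document stamped with the pinned schema version.
func (f pinnedFormat) Encode(w io.Writer, s SBOM) error {
	doc := struct {
		SchemaVersion string   `json:"schemaVersion"`
		Packages      []string `json:"packages"`
	}{SchemaVersion: f.version, Packages: s.Packages}
	return json.NewEncoder(w).Encode(doc)
}

// Decode and Validate are left unimplemented in this sketch.
func (f pinnedFormat) Decode(io.Reader) (*SBOM, error) {
	return nil, errors.New("decode not implemented")
}

func (f pinnedFormat) Validate(io.Reader) error {
	return errors.New("validate not implemented")
}

// Compile-time check that pinnedFormat satisfies the interface.
var _ Format = pinnedFormat{}

// render encodes an SBOM with the given format and returns the result.
func render(f Format, s SBOM) (string, error) {
	var buf bytes.Buffer
	if err := f.Encode(&buf, s); err != nil {
		return "", fmt.Errorf("failed to format sbom: %w", err)
	}
	return buf.String(), nil
}

func main() {
	f := pinnedFormat{id: "syft-3.0.1-json", version: "3.0.1"}
	out, err := render(f, SBOM{Packages: []string{"example"}})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
}
```

The point of the design is that the schema version is a property of the format value rather than of the syft release, so upgrading syft would not change the bytes written for a given format.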

We should implement a package internal to packit/sbom that implements at least the following formats (all versions we have ever released support for through the existing sbom package):

  • CycloneDX JSON 1.3
  • CycloneDX JSON 1.4
  • Syft 2.0.0
  • Syft 2.0.1
  • Syft 2.0.2
  • Syft 3.0.0
  • Syft 3.0.1
  • Syft 3.1.0
  • SPDX JSON 2.2

We should feel free to reuse the format implementations in syft for the cases they cover, but beware that these formats could skew away from the designated schema version at any point.

Choosing a format at build-time

The criteria for choosing a format will be to follow the specification of the sbom-formats media type included in the buildpack.toml. For example, with the following declaration, I should see formatted SBOM outputs using the Syft 3.0.1 schema.

sbom-formats = [ "application/vnd.syft+json;version=3.0.1" ]

The IANA hosts the specifications for each of the following media types:

As can be seen in these specifications, the Syft and CycloneDX formats allow for an optional version parameter. We can parse this extra field to determine the specific schema version to use. SPDX does not currently outline a similar parameter, but there is an open issue requesting that feature: spdx/spdx-spec#642

If the version parameter is omitted, the latest schema version for that SBOM type should be chosen.
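
The selection rule above could be sketched with the standard library's mime.ParseMediaType, which splits a media type from its parameters. This is only an illustration under some assumptions: ParseFormat and the latestVersions table are hypothetical names, the "latest" values are simply the highest versions from the list earlier in this issue, and the media type strings are assumed to be the ones buildpacks declare in sbom-formats.

```go
package main

import (
	"fmt"
	"mime"
)

// latestVersions is a hypothetical table mapping each SBOM media type to the
// schema version used when no version parameter is given (assumed values).
var latestVersions = map[string]string{
	"application/vnd.cyclonedx+json": "1.4",
	"application/vnd.syft+json":      "3.1.0",
	"application/spdx+json":          "2.2",
}

// ParseFormat splits an sbom-formats entry into its base media type and the
// schema version to use, falling back to the latest known version when the
// optional version parameter is omitted.
func ParseFormat(entry string) (string, string, error) {
	mediaType, params, err := mime.ParseMediaType(entry)
	if err != nil {
		return "", "", fmt.Errorf("failed to parse sbom format %q: %w", entry, err)
	}

	latest, ok := latestVersions[mediaType]
	if !ok {
		return "", "", fmt.Errorf("unsupported sbom format %q", mediaType)
	}

	version, ok := params["version"]
	if !ok {
		version = latest
	}

	return mediaType, version, nil
}

func main() {
	entries := []string{
		"application/vnd.syft+json;version=3.0.1",
		"application/vnd.cyclonedx+json",
	}
	for _, entry := range entries {
		mediaType, version, err := ParseFormat(entry)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> schema version %s\n", mediaType, version)
	}
}
```

A real implementation would also need to check that the requested version is one of the supported schema versions for that type before picking an encoder.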

Integration

Currently, the sbom-formats field is limited in what it accepts as valid media-types. Specifically, it won't allow us to specify the extra version parameters we wish to use. We can still implement and test this functionality in packit/sbom, but won't be able to use it in real buildpacks until we see a resolution on this issue: buildpacks/lifecycle#828

@fg-j Is this done now?

fg-j commented

@ryanmoran Depends on if we think it's worth adding more of the enumerated schema versions. My understanding was that syft 2.0.2, cyclonedx 1.3 and syft 3.0.1 represented a "good enough for now" set of supported schemas. Do you think we'll need to revisit which ones we support?

@fg-j I think this is probably done for now w.r.t. formats we would support. That may change when the official implementation lands in Syft, but for now, we have achieved what this issue set out to solve.