protobom/protobom

Proposal: Change software identifier data type to map

puerco opened this issue · 2 comments

puerco commented

Right now, the software identifiers in protobom are stored in an array with the type and value each in its own field:

https://github.com/bom-squad/protobom/blob/a50c98ecf0438b3b899a3e780ec012f477ed4154/api/sbom.proto#L45
https://github.com/bom-squad/protobom/blob/a50c98ecf0438b3b899a3e780ec012f477ed4154/api/sbom.proto#L144-L147

This structure is designed that way because documents in SPDX can have more than one identifier of any type (more than one purl, more than one cpe, etc).

This, however, poses a performance and usability problem in protobom. Finding a certain identifier is slow and cumbersome because we need to iterate over all of a node's identifiers looking for a particular type. The cost multiplies when doing this for every package in the SBOM. The type is also a free-form string, so there is a risk that keying on the right type will not always be accurate.
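To illustrate, a lookup against the current structure looks roughly like the sketch below. The `Identifier` and `Node` types are simplified stand-ins rather than the generated protobom types, and `findIdentifier` and the type strings are hypothetical:

```go
package main

import "fmt"

// Identifier is a simplified stand-in for the current message:
// a free-form type string plus the value.
type Identifier struct {
	Type  string
	Value string
}

// Node is a simplified stand-in that carries only the identifiers.
type Node struct {
	Identifiers []Identifier
}

// findIdentifier (hypothetical helper) scans the identifiers until
// it finds one whose type string matches.
func findIdentifier(n *Node, idType string) (string, bool) {
	for _, id := range n.Identifiers {
		if id.Type == idType {
			return id.Value, true
		}
	}
	return "", false
}

func main() {
	n := &Node{Identifiers: []Identifier{
		{Type: "cpe23", Value: "cpe:2.3:a:example:lib:1.0.0:*:*:*:*:*:*:*"},
		{Type: "purl", Value: "pkg:golang/example.com/lib@v1.0.0"},
	}}
	purl, ok := findIdentifier(n, "purl")
	fmt.Println(purl, ok)
}
```

Every caller has to repeat this scan per node, and a typo in the type string simply returns no match instead of failing at compile time.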

Proposal:

I propose we change the identifiers field to a map that uses a predefined enum type as the key:

 map[IdentifierType]string

This way we would get one-shot access when looking up the purl, etc.:

purl := node.Identifiers[IdentifierType_PURL]

To keep the lossless ingestion promise, we can create a new property "OtherIdentifiers" in the Node to capture the extra values when there is more than one identifier of the same type:

// New field on Node
OtherIdentifiers []struct {
  Type  IdentifierType
  Value string
}
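
A minimal sketch of how ingestion could work under this proposal: the first identifier of each type goes into the map and any further ones of that type land in OtherIdentifiers. The enum values, the `AddIdentifier` method, and the named `Identifier` struct (used here instead of the anonymous struct) are assumptions for illustration, not the protobom API:

```go
package main

import "fmt"

// IdentifierType is an illustrative enum; the real values would come
// from the generated protobuf code.
type IdentifierType int

const (
	IdentifierType_PURL IdentifierType = iota
	IdentifierType_CPE23
)

// Identifier pairs a type and a value for the overflow list.
type Identifier struct {
	Type  IdentifierType
	Value string
}

// Node is a simplified stand-in with only the identifier fields.
type Node struct {
	Identifiers      map[IdentifierType]string
	OtherIdentifiers []Identifier
}

// AddIdentifier (hypothetical) keeps the first identifier of each type
// in the map and appends later ones to OtherIdentifiers.
func (n *Node) AddIdentifier(t IdentifierType, v string) {
	if n.Identifiers == nil {
		n.Identifiers = map[IdentifierType]string{}
	}
	if _, exists := n.Identifiers[t]; exists {
		n.OtherIdentifiers = append(n.OtherIdentifiers, Identifier{Type: t, Value: v})
		return
	}
	n.Identifiers[t] = v
}

func main() {
	node := &Node{}
	node.AddIdentifier(IdentifierType_PURL, "pkg:golang/example.com/lib@v1.0.0")
	node.AddIdentifier(IdentifierType_PURL, "pkg:golang/example.com/lib@v1.0.0?arch=amd64")

	// One-shot access to the primary purl.
	purl := node.Identifiers[IdentifierType_PURL]
	fmt.Println(purl)
	fmt.Println(len(node.OtherIdentifiers))
}
```

Which identifier counts as the primary one when a document carries several of the same type is a policy choice; this sketch simply keeps the first one seen.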

puerco commented

Update: We discussed this change at the Aug 9th meeting and we're moving ahead with converting the identifiers to a map.

puerco commented

The implementation PR is in: #70