spdx/tools-golang

ValidateDocument() fails for official SPDX Spec 2.2/2.3 examples

wallrat opened this issue · 2 comments

The following files from the spdx-spec repository fails to validate

https://github.com/spdx/spdx-spec/blob/development/v2.3.1/examples/SPDXJSONExample-v2.3.spdx.json
https://github.com/spdx/spdx-spec/blob/development/v2.3/examples/SPDXJSONExample-v2.2.spdx.json

(only tested the JSON versions sofar)

ValidateDocument() returns err ToolsElement used in relationship but no such package exists

https://tools.spdx.org/app/validate/ (based on the Java lib?) validates both fine.

From the 2.3 BOM:

...
  "externalDocumentRefs": [
    {
      "externalDocumentId": "DocumentRef-spdx-tool-1.2",
      "checksum": {
        "algorithm": "SHA1",
        "checksumValue": "d6a770ba38583ed4bb4525bd96e50461655d2759"
      },
      "spdxDocument": "http://spdx.org/spdxdocs/spdx-tools-v1.2-3F2504E0-4F89-41D3-9A0C-0305E82C3301"
    }
  ],

...

  "relationships": [
... 
    {
      "spdxElementId": "SPDXRef-DOCUMENT",
      "relationshipType": "COPY_OF",
      "relatedSpdxElement": "DocumentRef-spdx-tool-1.2:SPDXRef-ToolsElement"
    },
   ... 
]

Not enough of an expert on the spec to determine who is in the wrong here. Shouldn't ValidateDocument() take external refs into account? It currently only checks for refs to packages, and files.

Hi @wallrat,

You may have seen that I asked about this in the main SPDX 2.x spec repo, at spdx/spdx-spec#870.

Based on the discussion there, it sounds like "validation" for purposes of SPDX tooling generally has looked only at the current SPDX Document, not at e.g. any other Documents that are referenced via an External Document Reference.

So here, the SPDX 2.2/2.3 example documents reference an external DocumentRef-spdx-tool-1.2 Document, which is presumed to exist but is not available (and doesn't in fact actually exist). It also has a Relationship that points to an SPDX Element in that external Document, DocumentRef-spdx-tool-1.2:SPDXRef-ToolsElement.

Based on spdx/spdx-spec#870, it sounds like the presence of that external document and/or the SPDXRef-ToolsElement in it do not need to be checked for the example Document to be considered "valid." All that is checked is that the syntax of that identifier is validly formatted.

For the Golang tools purposes, the developers could consider whether to e.g. allow different levels of validation -- perhaps a "strict" validation that actually checks every referenced Document, vs. a "default" validation that doesn't look into nested external Documents.

As a tooling vendor1 building on top of SPDX, and multiple other formats, I'm mostly concerned with end-user expectations - which usually means to be very forgiving with what we accept and give sane feedback on bad input.

With that in mind, we usually consider a few levels of 'validity':

  1. Is the document 'well-formed' ? This most often means that the document is parsable (syntactically correct) and some set of minimal of information can be meaningfully extracted. For us, failing this level the document is rejected as 'bad input'.
  2. Is the document 'valid' according to the documents declared spec version and format? Here we expect the document to contain all required fields and ids, document internal references to resolve correctly etc. My experience is that this is usually what end-users means with 'valid' and the level we expect all tools to produce (or they are considered broken). For us, failing this level usually means that we try our best to process the document, but let the end-user now about any errors found.
  3. Do the document contents make sense within a larger context? This means checking if external references resolve to existing resources etc. Note this is not a pure function of the input - if the document was valid at one point in time doesn't mean it's necessarily valid now.

What I would expect from a library is to cover 1 and 2. Looking at external resources is way too use-case dependent.

From a developer perspective I would definitely expect ValidateDocument() to accept all examples from the spec repo to pass (and be part of the library's test suite).

(as a side note the Go library also fails to parse the XML/RDF example but at the XML layer. I will open a separate issue for that)

Footnotes

  1. https://sbom.observer