nexB/aboutcode-toolkit

Support Attribution generation from SPDX license expressions in an [INPUT] file

mjherzog opened this issue · 4 comments

If you want to generate an Attribution Notice (attrib function) from an SBOM or other [INPUT] file that contains SPDX data for license expressions, that file will need to include a LicenseRef and license_file or notice_file data for every license that does not have a License Identifier in the SPDX License List. (Note: ScanCode already meets the second requirement with pre-defined LicenseRef-scancode license identifiers.)

There are some open design questions:

  • Do we need to validate that every LicenseRef from the [INPUT] file has at least a license_file or notice_file (or both)?
  • Do we display the LicenseRef as an SPDX License Identifier for traceability?
  • What validation against the SPDX License List is required for asserted SPDX License Identifiers (not including LicenseRef cases)?

Here is my suggestion:
We can use https://scancode-licensedb.aboutcode.org/index.json to convert each SPDX license identifier back to its ScanCode license key. If a particular "spdx_license_key" cannot be found in index.json (because the license identifier is invalid or is a custom LicenseRef), we can handle it as follows:

  • If the license_file/notice_file field is populated, issue a warning saying that the spdx_license_key could not be found in the LicenseDB but a license_file/notice_file is provided.
  • If there is no license_file/notice_file, raise an error saying that the spdx_license_key was not found and no license_file/notice_file is provided.
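The lookup proposed above could be sketched roughly as follows. This is only an illustration, not the toolkit's code: the function names `build_spdx_index` and `check_spdx_key` are hypothetical, the entries are assumed to be dicts carrying "license_key" and "spdx_license_key" fields (as loaded from index.json), and matching is assumed to be case-insensitive.

```python
def build_spdx_index(entries):
    """Map each SPDX key (lowercased) to its ScanCode license key.

    `entries` is assumed to be the list of dicts loaded from index.json,
    each with "license_key" and, optionally, "spdx_license_key".
    """
    index = {}
    for entry in entries:
        spdx_key = entry.get("spdx_license_key")
        if spdx_key:
            index[spdx_key.lower()] = entry["license_key"]
    return index


def check_spdx_key(spdx_key, index, has_license_or_notice_file):
    """Return (scancode_key, severity, message) for one SPDX identifier.

    severity is None when the key is found, "WARNING" when it is missing
    but a license_file/notice_file is present, and "ERROR" otherwise.
    """
    scancode_key = index.get(spdx_key.lower())
    if scancode_key is not None:
        return scancode_key, None, None
    if has_license_or_notice_file:
        message = (
            f"spdx_license_key {spdx_key!r} not found in the LicenseDB, "
            "but a license_file/notice_file is provided"
        )
        return None, "WARNING", message
    message = (
        f"spdx_license_key {spdx_key!r} not found in the LicenseDB "
        "and no license_file/notice_file is provided"
    )
    return None, "ERROR", message
```

A caller would build the index once per run and call `check_spdx_key` for every identifier found in the expression, collecting the warnings and errors for the final report.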

As for the question:

Do we display the LicenseRef as an SPDX License Identifier for traceability?

I'd say yes, so that we do not alter the user's input.

@dmclark please comment

After some thought, I actually think the tool should not need to validate the license_file/notice_file, for two reasons:

  1. Having a license_file/notice_file does not necessarily mean that the license/notice file refers to the "invalid" license.
  2. The tool fetches the license from DJE/LicenseDB. If we cannot translate the spdx_license back to the scancode_license, it is fair to call it "invalid" in the sense that it cannot be found in the database, and it is the user's responsibility to make sure their custom license has a proper license_file/notice_file filled in.

Here is the behavior of the latest code in "513_attrib_from_spdx" branch:
spdx_license_expression: LicenseRef-scancode-libzip AND ((AFL-1.5 OR BSD-2-Clause-Views) AND LicenseRef-scancode-bsd-1988) OR AFL-1.3

output:
license_expression: bsd-new AND ((AFL-1.5 OR bsd-2-clause-views) AND bsd-1988) OR AFL-1.3

with error:

Command completed with 2 errors or warnings.
ERROR: <path> : Invalid 'license': AFL-1.5
ERROR: <path> : Invalid 'license': AFL-1.3
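To illustrate the kind of translation shown above, here is a minimal sketch that rewrites each identifier in an SPDX expression using a lookup table and reports the ones it cannot translate. It is not the code on the branch: `translate_expression` is a hypothetical name, the regex tokenizer is a simplification rather than a real expression parser, and the mapping in the usage example is a hand-written sample instead of the LicenseDB index.

```python
import re

# License identifiers may contain letters, digits, ".", "+" and "-".
# Parentheses and whitespace are not matched, so they are left untouched.
TOKEN = re.compile(r"[A-Za-z0-9.+-]+")
OPERATORS = {"AND", "OR", "WITH"}


def translate_expression(expression, spdx_to_scancode):
    """Replace SPDX identifiers with ScanCode keys.

    Returns (translated_expression, unknown_identifiers). Identifiers that
    are missing from `spdx_to_scancode` are kept as-is and reported.
    """
    unknown = []

    def replace(match):
        token = match.group(0)
        if token.upper() in OPERATORS:
            return token
        key = spdx_to_scancode.get(token)
        if key is None:
            unknown.append(token)
            return token
        return key

    return TOKEN.sub(replace, expression), unknown


# Hand-written sample mapping, for illustration only.
sample_mapping = {
    "BSD-2-Clause-Views": "bsd-2-clause-views",
    "LicenseRef-scancode-bsd-1988": "bsd-1988",
}
translated, unknown = translate_expression(
    "(AFL-1.5 OR BSD-2-Clause-Views) AND LicenseRef-scancode-bsd-1988",
    sample_mapping,
)
for identifier in unknown:
    print(f"ERROR: Invalid 'license': {identifier}")
```

With this sample mapping, AFL-1.5 is left unchanged and reported, which mirrors how untranslatable identifiers surface as "Invalid 'license'" errors in the output above.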

@mjherzog @DennisClark Let me know what you think.

Fixed in 27b3068