nexB/license-expression

Provide built-in support for SPDX and scancode license expression validation


I would like a function that takes an expression string as an argument and validates it. It could be built on Licensing.parse(), but I would prefer that it return an object that reports everything about the validity of the expression:

  • whether the syntax is valid, with error messages if it is not
  • which license symbols are valid and which are invalid
  • which exceptions are valid and which are invalid
  • which license symbols are obsolete

This function should take as input either the ScanCode license DB (https://scancode-licensedb.aboutcode.org) or a list of symbols. It should bundle up-to-date license lists from ScanCode and SPDX for easy bootstrapping. For this we need nexB/scancode-licensedb#7
In addition, it should accept arbitrary LicenseRef- (and possibly DocumentRef-) symbols in SPDX mode.
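To make the request concrete, here is a minimal standard-library sketch of the kind of report object described above. Everything in it is hypothetical: `ValidationResult`, `validate_expression`, and the `spdx` flag are illustrative names, not part of license-expression, and the syntax check is a simplified token scan rather than the library's parser.

```python
import re
from dataclasses import dataclass, field

# Simplified tokenizer (an assumption): parentheses, or runs of other non-space characters.
_TOKEN = re.compile(r"[()]|[^\s()]+")
_OPERATORS = {"AND", "OR", "WITH"}
_SPDX_REF_PREFIXES = ("licenseref-", "documentref-")


@dataclass
class ValidationResult:
    """Hypothetical report object; not an API of license-expression."""
    expression: str
    errors: list = field(default_factory=list)
    valid_licenses: list = field(default_factory=list)
    invalid_licenses: list = field(default_factory=list)
    valid_exceptions: list = field(default_factory=list)
    invalid_exceptions: list = field(default_factory=list)
    obsolete_licenses: list = field(default_factory=list)

    @property
    def is_valid(self):
        return not (self.errors or self.invalid_licenses or self.invalid_exceptions)


def validate_expression(expression, licenses, exceptions=(), obsolete=(), spdx=False):
    """Collect every problem in one pass instead of raising on the first one."""
    result = ValidationResult(expression)
    known_licenses = {key.lower() for key in licenses}
    known_exceptions = {key.lower() for key in exceptions}
    obsolete_keys = {key.lower() for key in obsolete}

    depth = 0
    previous = None  # kind of the previous token: None, "symbol", "operator", "(" or ")"
    after_with = False  # the symbol following WITH is treated as an exception
    for token in _TOKEN.findall(expression):
        if token == "(":
            if previous in ("symbol", ")"):
                result.errors.append("missing operator before '('")
            depth += 1
            previous = "("
        elif token == ")":
            if previous in ("operator", "(", None):
                result.errors.append("unexpected ')'")
            depth -= 1
            if depth < 0:
                result.errors.append("unbalanced ')'")
                depth = 0
            previous = ")"
        elif token.upper() in _OPERATORS:
            if previous not in ("symbol", ")"):
                result.errors.append(f"operator {token!r} has no left operand")
            after_with = token.upper() == "WITH"
            previous = "operator"
        else:
            if previous in ("symbol", ")"):
                result.errors.append(f"missing operator before {token!r}")
            key = token.lower()
            if after_with:
                known = key in known_exceptions
                (result.valid_exceptions if known else result.invalid_exceptions).append(token)
            elif key in known_licenses or (spdx and key.startswith(_SPDX_REF_PREFIXES)):
                result.valid_licenses.append(token)
                if key in obsolete_keys:
                    result.obsolete_licenses.append(token)
            else:
                result.invalid_licenses.append(token)
            after_with = False
            previous = "symbol"

    if depth > 0:
        result.errors.append("unbalanced '('")
    if previous is None:
        result.errors.append("empty expression")
    elif previous == "operator":
        result.errors.append("expression ends with an operator")
    return result
```

With `licenses=['mit']`, calling `validate_expression('foo AND mit', ['mit'])` would report `foo` under `invalid_licenses` and leave `errors` empty, so the caller can tell a syntax problem from an unknown key. Passing `spdx=True` accepts any `LicenseRef-` or `DocumentRef-` symbol without looking it up.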

An example of the current behavior:

$ wget https://scancode-licensedb.aboutcode.org/index.json
$ python
>>> import json
>>> lics = json.load(open('index.json'))
>>> lics[0]
{'license_key': '389-exception', 'json': '389-exception.json', 'yml': '389-exception.yml', 'html': '389-exception.html', 'text': '389-exception.LICENSE'}
>>> from license_expression import LicenseSymbol, Licensing
>>> syms = [LicenseSymbol(l['license_key']) for l in lics]
>>> ling = Licensing(symbols=syms)
>>> ling.parse('foo AND mit')
AND(LicenseSymbol('foo', is_exception=False), LicenseSymbol('mit', is_exception=False))
>>> ling.parse('foo AND mit', validate=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/tmp/licexp/tmp/lib/python3.6/site-packages/license_expression/__init__.py", line 453, in parse
    raise ExpressionError(msg)
license_expression.ExpressionError: Unknown license key(s): foo
>>> e = ling.parse('foo AND mit')
>>> e.symbols
{LicenseSymbol('foo', is_exception=False), LicenseSymbol('mit', is_exception=False)}

@pombredanne When parsing a license expression with Licensing().parse(), should the .parse() method detect automatically whether the expression is an SPDX license expression or a ScanCode license expression, or should a flag tell it which kind to expect?

@JonoYang I think the new validation feature should be explicit about which license list is used as a base, with no guessing about whether an expression comes from ScanCode or from SPDX.
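One way to read "explicit, no guessing" is a required argument naming the license list. The sketch below is only an illustration of that design: `LicenseList`, `KNOWN_KEYS`, and `unknown_keys` are hypothetical names, and the tiny key sets stand in for the real ScanCode and SPDX lists.

```python
from enum import Enum


class LicenseList(Enum):
    """Hypothetical selector for the license list used as the validation base."""
    SCANCODE = "scancode"
    SPDX = "spdx"


# Tiny illustrative tables; a real implementation would load the bundled lists.
KNOWN_KEYS = {
    LicenseList.SCANCODE: {"mit", "gpl-2.0", "apache-2.0"},
    LicenseList.SPDX: {"MIT", "GPL-2.0-only", "Apache-2.0"},
}


def unknown_keys(keys, license_list):
    """Return the keys not found in the explicitly chosen license list."""
    if not isinstance(license_list, LicenseList):
        # Refuse to guess: the caller must say which list applies.
        raise TypeError("license_list must be a LicenseList member")
    known = KNOWN_KEYS[license_list]
    accept_refs = license_list is LicenseList.SPDX
    return [
        key for key in keys
        if key not in known and not (accept_refs and key.startswith("LicenseRef-"))
    ]
```

Under this design the same key can be valid in one mode and unknown in the other: `unknown_keys(['mit'], LicenseList.SCANCODE)` is empty, while `unknown_keys(['mit'], LicenseList.SPDX)` returns `['mit']` because this sketch matches keys exactly.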

In addition to validation, could you also provide a normalized (whitespace, case, parens) version of the string passed in?
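As a rough idea of what such normalization could look like, here is a standard-library sketch. `normalize_expression` and `fold_case` are hypothetical names, not library functions. It collapses whitespace, uppercases the operators, optionally lowercases symbols (which suits ScanCode-style keys; SPDX identifiers would instead need mapping to their canonical case), and tightens spacing around parentheses. It does not remove redundant parentheses, which would require a real parse.

```python
import re

# Simplified tokenizer (an assumption): parentheses, or runs of other non-space characters.
_TOKEN = re.compile(r"[()]|[^\s()]+")
_OPERATORS = {"and", "or", "with"}


def normalize_expression(expression, fold_case=True):
    """Return the expression with uniform whitespace, operator case, and paren spacing."""
    tokens = []
    for token in _TOKEN.findall(expression):
        if token.lower() in _OPERATORS:
            token = token.upper()      # operators are always uppercased
        elif fold_case and token not in ("(", ")"):
            token = token.lower()      # symbols are lowercased only on request
        tokens.append(token)
    text = " ".join(tokens)
    # Remove the spaces that the join placed just inside parentheses.
    return text.replace("( ", "(").replace(" )", ")")


print(normalize_expression("  ( MIT   or gpl-2.0 )and  Foo "))
# prints: (mit OR gpl-2.0) AND foo
```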