nexB/license-expression

Add support for "with" statement

Closed this issue · 8 comments

Also support validation against a list of given exceptions

Thanks! Yes, we need this indeed. I think the best route is to treat XXX with ABC as an "atomic" symbol when parsing an expression, and to evolve the code so that both the license and the exception symbol can be obtained from it. By atomic I mean that this symbol cannot be simplified any further in a boolean simplification. This also means that an exception can only be applied to a single license and not to a whole expression. The reasons for this are simplicity and correctness. The implications are:

  • (gpl and lgpl) with gcc-exception would never be a valid expression: an exception can only apply to a single license and not a whole expression
  • gpl with gcc-exception and lgpl with gcc-exception could be a valid expression and would not be simplified further: it is equivalent to XXXX and YYYY, since each license with gcc-exception pair is treated as a single whole symbol.
  • gpl with gcc-exception and gpl with gcc-exception and gpl would be simplified to gpl with gcc-exception and gpl, i.e. gpl and gpl with gcc-exception are treated as two different symbols when it comes to boolean expression simplification.
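
The cases above can be sketched in a few lines of Python. This is only an illustration of the atomic-symbol idea, not the license-expression implementation: the helper names `split_atoms` and `simplify_and` are hypothetical, and the sketch assumes a flat expression that uses only `and`, with no parentheses.

```python
import re

# Hypothetical helpers for illustration only.
# Assumption: a flat expression joined by "and", without parentheses.
AND = re.compile(r"\s+and\s+", re.IGNORECASE)


def split_atoms(expression):
    """Split a flat 'and' expression into atomic symbols.

    'X with Y' is never split: it stays one single symbol.
    """
    return [" ".join(part.split()) for part in AND.split(expression.strip())]


def simplify_and(expression):
    """Drop duplicate atoms while keeping the first-seen order."""
    seen = []
    for atom in split_atoms(expression):
        if atom not in seen:
            seen.append(atom)
    return " and ".join(seen)
```

With this sketch, `gpl with gcc-exception and gpl with gcc-exception and gpl` collapses to `gpl with gcc-exception and gpl`, while `gpl with gcc-exception and lgpl with gcc-exception` is left unchanged.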

Does this make sense?

Also, would the list of exceptions state which license(s) each exception can apply to? If so, what would the data structure be? I suggest a simple namedtuple or small object that would contain the exception license key and a list of the license keys it applies to. Would this work?

Note that if we also want to support expressions such as GPL 2.0 or later and MIT, we would likely need to adjust the expression parsing to treat GPL 2.0 or later as a single atomic symbol rather than as two symbols (GPL 2.0 and a separate later symbol). We could also support more verbose constructs such as GPL 2.0 or any later version and similar small variations.
Let me think about this... This only matters when we have a verbose expression with a literal or later. It does not affect the handling of symbols such as GPL-2.0+ or gpl-3.0-plus, which are already treated as atomic symbols.
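
One possible way to handle the verbose form, sketched here under assumptions, is to fold the "or later" suffix into the preceding symbol before the boolean operators are parsed. The function name `fold_or_later`, the `+` suffix convention and the list of accepted spellings are all hypothetical choices for illustration, not something the library is known to do.

```python
import re

# Assumed spellings of the verbose suffix: "or later", "or any later",
# "or later version", "or any later version". Extend as needed.
OR_LATER = re.compile(
    r"\s+or\s+(?:any\s+)?later(?:\s+version)?\b",
    re.IGNORECASE,
)


def fold_or_later(expression):
    """Rewrite 'X or later' as 'X+' so this 'or' is not read as an operator."""
    return OR_LATER.sub("+", expression)
```

After this step `GPL 2.0 or later and MIT` reads `GPL 2.0+ and MIT`, and a genuine `or` operator such as the one in `GPL-2.0+ or MIT` is left alone.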

So I am wondering how best to pass the arguments for validation against a list of licenses... and a list of exceptions.
I think I will go with a namedtuple (or any object with the same attributes) that would cover both licenses and exceptions, with these attributes:

  • key
  • aliases: a list of alternative strings for this key (e.g. short or long names, etc)
  • exception_to: empty or a list of keys this key is an exception for.

@tdruez would this work? It would change the API.

FWIW, the license list would be objects like
License = collections.namedtuple('License', ['key', 'aliases', 'exception_to'])
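
For instance, such a list could be built and queried roughly like this. The keys and aliases are made-up sample data, and `applies_to` is a hypothetical helper, not part of any existing API.

```python
import collections

License = collections.namedtuple('License', ['key', 'aliases', 'exception_to'])

# Sample data for illustration only.
licenses = [
    License('gpl-2.0', ['GPL 2.0', 'GNU GPL v2'], []),
    License('classpath-exception', ['Classpath exception'], ['gpl-2.0']),
]


def applies_to(exception, license_key):
    """Return True if `exception` is declared as an exception to `license_key`."""
    return license_key in exception.exception_to
```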

I am having second thoughts about this:
Maybe an exception does not always refer to all the licenses it is an exception to?
In that case, tracking the list of main licenses an exception applies to would be a moot point entirely and a waste of time and energy...
And any validation of an expression would become moot too, as there would be cases where an expression is reported as invalid even though it is actually valid.

So I dug a bit into the range of exceptions out there... I reviewed ALL the exceptions we have at SPDX.org and in ScanCode.
The vast majority apply to the LGPL or the GPL, in various versions, with or without "or later".
Only a few apply to the CPL or the Apache license, and just one to the BSD.
So I am thinking that tracking and validating which license an exception is for may be overkill.

If this is the case, it would simplify the implementation a lot, as we would NOT (and could NOT) validate the license on the left side of an XXX with YYY expression, i.e. it could be any value (with some crazy possibilities, though).

Even just collecting the exact list of licenses an exception is for is NOT a trivial thing. To be accurate and comprehensive, the complexity is rather high. So all in all I am in favor of eschewing that complexity entirely, i.e. an exception would only be tagged with a flag for now.

We could always add validation against a list later if needed... but removing it later would be tough.
So we would allow an idiotic MIT with GCC-3.1 exception, but that is not a big deal.

I also recall that the SPDX group passed on that too because of its complexity.
And FWIW, while writing the validation code in license_expression, this was starting to smell really bad too 😛

So there will be no tracking of which license an exception may apply to (and therefore no validation of whether a license+exception combo makes sense or not).
What I mean by not validating is this:

  • GPL-2.0 with Classpath-exception is valid because GPL-2.0 is a license and Classpath-exception is tagged as an exception
  • Classpath-exception with GPL-2.0 is invalid because GPL-2.0 is not an exception
  • MIT with Classpath-exception is valid because we will not for now track which license(s) the Classpath-exception may apply to

So the last case is the one we will not validate for now, and may never validate, as it may not be practically possible to do correctly in some cases.

Therefore the license reference list for validation would be objects like
LicenseRef = collections.namedtuple('LicenseRef', ['key', 'aliases', 'exception']) where exception is a boolean.
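
A minimal sketch of the flag-based check described above, covering the three example cases. The reference entries are sample data, `is_valid_with` is a hypothetical name, and the sketch assumes a single `LICENSE with EXCEPTION` pair matched on exact keys (aliases are ignored here).

```python
import collections

LicenseRef = collections.namedtuple('LicenseRef', ['key', 'aliases', 'exception'])

# Sample reference list for illustration only, indexed by key.
REFS = {
    ref.key: ref
    for ref in [
        LicenseRef('GPL-2.0', [], False),
        LicenseRef('MIT', [], False),
        LicenseRef('Classpath-exception', [], True),
    ]
}


def is_valid_with(expression, refs=REFS):
    """Check a single 'LICENSE with EXCEPTION' pair.

    Valid when the left side is a known license and the right side is a
    known key flagged as an exception. Which license the exception is
    meant for is deliberately not checked.
    """
    left, sep, right = expression.partition(' with ')
    if not sep:
        return False
    lic = refs.get(left.strip())
    exc = refs.get(right.strip())
    if lic is None or exc is None:
        return False
    return (not lic.exception) and exc.exception
```

Here `GPL-2.0 with Classpath-exception` and `MIT with Classpath-exception` both pass, while `Classpath-exception with GPL-2.0` is rejected.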

Let's have a meeting about all this early next week. I have not had time to go through all the notes thoroughly, but I generally am of the opinion that a very simple model is best. We need to discuss this in real time.

Fixed in master