jpeddicord/askalono

Does not detect MPLv2 header in source file


lib.rs

// This Source Code Form is subject to the terms of the Mozilla Public
// License, v. 2.0. If a copy of the MPL was not distributed with this
// file, You can obtain one at http://mozilla.org/MPL/2.0/.

struct X;

$ askalono --version
askalono 0.3.0

$ askalono id --optimize src/lib.rs
Error: Confidence threshold not high enough for any known license

The MPL 2.0 text includes the following exhibit, which confirms that the header is correct:

Exhibit A - Source Code Form License Notice
-------------------------------------------

  This Source Code Form is subject to the terms of the Mozilla Public
  License, v. 2.0. If a copy of the MPL was not distributed with this
  file, You can obtain one at http://mozilla.org/MPL/2.0/.

If it is not possible or desirable to put the notice in a particular
file, then You may include the notice in a location (such as a LICENSE
file in a relevant directory) where a recipient would be likely to look
for such a notice.

Interesting; I'd have thought the MPL header would be in the SPDX dataset, but it doesn't appear to be. This will need to be fixed on the SPDX end. I'm happy to get that going, but I admit I can be a bit slow on that front.

I think https://github.com/spdx/license-list-XML/blob/master/src/MPL-2.0.xml more or less needs a standardLicenseHeader block, and then the license-list-data repository needs to be regenerated.
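Why a missing header template can sink detection: a short header compared against a much longer license text shares only a small fraction of that text's word pairs, so the overall similarity stays low. The Rust sketch below illustrates this under stated assumptions. Text is normalized to lowercase words, scored with a Sørensen-Dice coefficient over word bigrams, and accepted at a hypothetical threshold of 0.9. None of these details is claimed to match askalono's internals. The second paragraph of Exhibit A stands in for the much longer full MPL-2.0 text.

```rust
use std::collections::HashSet;

/// Hypothetical confidence threshold for this sketch only; askalono's real value may differ.
const THRESHOLD: f64 = 0.9;

/// Assumed normalization: lowercase alphanumeric words. Comment markers (`//`)
/// and punctuation act as separators, so they disappear.
fn tokens(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|w| !w.is_empty())
        .map(str::to_lowercase)
        .collect()
}

/// Set of adjacent word pairs (bigrams).
fn bigrams(text: &str) -> HashSet<(String, String)> {
    tokens(text)
        .windows(2)
        .map(|w| (w[0].clone(), w[1].clone()))
        .collect()
}

/// Sørensen-Dice coefficient: 2|A ∩ B| / (|A| + |B|), in [0, 1].
fn dice(a: &str, b: &str) -> f64 {
    let (x, y) = (bigrams(a), bigrams(b));
    if x.is_empty() && y.is_empty() {
        return 0.0;
    }
    2.0 * x.intersection(&y).count() as f64 / (x.len() + y.len()) as f64
}

// The file from the report.
const SOURCE_FILE: &str = "// This Source Code Form is subject to the terms of the Mozilla Public
// License, v. 2.0. If a copy of the MPL was not distributed with this
// file, You can obtain one at http://mozilla.org/MPL/2.0/.

struct X;
";

// Candidate 1: the Exhibit A notice alone, i.e. what a header template would hold.
const HEADER: &str = "This Source Code Form is subject to the terms of the Mozilla Public
License, v. 2.0. If a copy of the MPL was not distributed with this
file, You can obtain one at http://mozilla.org/MPL/2.0/.";

// The rest of Exhibit A, used to build a longer stand-in text.
const EXHIBIT_REST: &str = "If it is not possible or desirable to put the notice in a particular
file, then You may include the notice in a location (such as a LICENSE
file in a relevant directory) where a recipient would be likely to look
for such a notice.";

fn main() {
    // Candidate 2: a longer text that merely contains the header.
    let longer = format!("{}\n\n{}", HEADER, EXHIBIT_REST);
    for (name, candidate) in [("header template", HEADER), ("longer text", longer.as_str())] {
        let score = dice(SOURCE_FILE, candidate);
        println!("{name}: {score:.3} (match: {})", score >= THRESHOLD);
    }
}
```

Dropping comment markers during normalization is what lets the `//`-prefixed file line up with the bare template. Against the full license text, the three-line header would be an even smaller fraction of the bigrams, so the score would presumably fall further below any reasonable threshold.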

You're not wrong; we should add the license header text to the definition of that license. But I'd also ask you to consider using SPDX Short Identifiers in source code, for a more succinct and more machine-friendly application of any license on the SPDX License List to your sources: https://spdx.org/ids

Up to you and your project, of course, but I mention because not a lot of people know about that option.
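For comparison, a file using an SPDX short identifier needs a single `SPDX-License-Identifier:` line, and a tool can read it with a plain string match instead of fuzzy text comparison. A minimal Rust sketch follows. `spdx_identifier` is a hypothetical helper, not part of askalono. It does no real SPDX license-expression parsing: an expression such as `MIT OR Apache-2.0` comes back as one raw string.

```rust
/// Return the license expression from the first `SPDX-License-Identifier:` tag
/// found in `source`, if any.
fn spdx_identifier(source: &str) -> Option<&str> {
    const TAG: &str = "SPDX-License-Identifier:";
    source.lines().find_map(|line| {
        // Skip lines without the tag; otherwise take everything after it.
        let start = line.find(TAG)? + TAG.len();
        // Trim whitespace and a trailing block-comment terminator such as `*/`.
        let expr = line[start..].trim().trim_end_matches("*/").trim();
        if expr.is_empty() { None } else { Some(expr) }
    })
}

fn main() {
    let source = "// SPDX-License-Identifier: MPL-2.0\n\nstruct X;\n";
    println!("{:?}", spdx_identifier(source)); // prints Some("MPL-2.0")
}
```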

@bradleeedmondson Absolutely, SPDX short identifiers are the way to go. In this case, though, askalono is a tool that tries to identify licenses from their text, and the problem here is that some of the text it should recognize is missing from its dataset.

spdx/license-list-XML#849 has been fixed.

Can we update askalono to include this header so this issue may be closed?

Working on pulling in new SPDX data now. Interestingly, it's causing a unit test to fail (a self-test to ensure MIT is detected as MIT). I suspect the format may have changed slightly. Digging into that.

Pulled in and verified:

❯❯❯ cat test.txt
// This Source Code Form is subject to the terms of the Mozilla Public
// License, v. 2.0. If a copy of the MPL was not distributed with this
// file, You can obtain one at http://mozilla.org/MPL/2.0/.

struct X;

❯❯❯ just cli id --optimize ./test.txt
 ...
./target/release/askalono id --optimize ./test.txt
License: MPL-2.0 (license header)
Score: 0.972
Containing:
  License: MPL-2.0 (license header)
  Score: 1.000
  Lines: 0 - 4
  Aliases: MPL-2.0-no-copyleft-exception
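The output above reports a higher score (1.000) for a sub-range of the file (`Lines: 0 - 4`) than for the file as a whole (0.972). One simple way to locate such a range is to score every contiguous window of lines against the template and keep the best one. The Rust sketch below does this by brute force. The word-bigram Dice scoring and the `best_window` helper are assumptions for illustration, not askalono's actual `--optimize` strategy, and the sketch is not expected to reproduce askalono's exact line numbers.

```rust
use std::collections::HashSet;

type Bigrams = HashSet<(String, String)>;

// Assumed normalization: lowercase alphanumeric words, then adjacent pairs.
fn bigrams(text: &str) -> Bigrams {
    let t: Vec<String> = text
        .split(|c: char| !c.is_alphanumeric())
        .filter(|w| !w.is_empty())
        .map(str::to_lowercase)
        .collect();
    t.windows(2).map(|w| (w[0].clone(), w[1].clone())).collect()
}

// Sørensen-Dice coefficient of two bigram sets.
fn dice(x: &Bigrams, y: &Bigrams) -> f64 {
    if x.is_empty() && y.is_empty() {
        return 0.0;
    }
    2.0 * x.intersection(y).count() as f64 / (x.len() + y.len()) as f64
}

/// Try every contiguous 0-based, inclusive line range `(start, end)` and return the
/// one whose text best matches `template`, with its score. Ties keep the earliest,
/// shortest range found.
fn best_window(text: &str, template: &str) -> Option<(usize, usize, f64)> {
    let lines: Vec<&str> = text.lines().collect();
    let target = bigrams(template);
    let mut best: Option<(usize, usize, f64)> = None;
    for start in 0..lines.len() {
        for end in start..lines.len() {
            let score = dice(&bigrams(&lines[start..=end].join("\n")), &target);
            if best.map_or(true, |(_, _, s)| score > s) {
                best = Some((start, end, score));
            }
        }
    }
    best
}

fn main() {
    let file = "// This Source Code Form is subject to the terms of the Mozilla Public
// License, v. 2.0. If a copy of the MPL was not distributed with this
// file, You can obtain one at http://mozilla.org/MPL/2.0/.

struct X;
";
    let header = "This Source Code Form is subject to the terms of the Mozilla Public
License, v. 2.0. If a copy of the MPL was not distributed with this
file, You can obtain one at http://mozilla.org/MPL/2.0/.";
    if let Some((start, end, score)) = best_window(file, header) {
        println!("lines {start}-{end}, score {score:.3}"); // prints "lines 0-2, score 1.000"
    }
}
```

Because it rescores every window from scratch, this grows roughly cubically with the number of lines and only suits short files; a real tool would need a cheaper search.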

This will go out in the next release, which I hope to prepare soon. :)