brainhubeu/license-auditor

Improved license name recognition

deevus opened this issue · 3 comments

License names seem to vary wildly across different projects. Here are 3 examples which use the same Apache 2.0 license:

    'Apache-2.0'
    'Apache 2.0'
    'Apache License, Version 2.0'

It might be possible to sanitise or tokenise these variations so that they can be recognised as the same license.
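One cautious way to illustrate the idea is a minimal sketch that uses an explicit, hand-maintained alias table instead of fuzzy matching. The `ALIASES` table and `normalise_license` function are hypothetical and are not part of license-auditor. The sketch only collapses whitespace and ignores case before an exact lookup, so an unrecognised name comes back unchanged rather than being guessed.

```python
import re

# Hypothetical alias table: normalised free-text name -> SPDX identifier.
# Only exact (post-normalisation) matches are mapped; nothing is guessed.
ALIASES = {
    "apache-2.0": "Apache-2.0",
    "apache 2.0": "Apache-2.0",
    "apache license, version 2.0": "Apache-2.0",
}


def normalise_license(name: str) -> str:
    """Map a known variant to its SPDX id; return unknown names unchanged."""
    # Collapse runs of whitespace and ignore case before the exact lookup.
    key = re.sub(r"\s+", " ", name.strip()).lower()
    return ALIASES.get(key, name)


if __name__ == "__main__":
    for raw in ["Apache-2.0", "Apache 2.0", "Apache License,  Version 2.0", "GPL"]:
        print(f"{raw!r} -> {normalise_license(raw)!r}")
```

Keeping unknown names unchanged means they would still need manual review, which limits the risk of mapping a look-alike name to the wrong license.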

There also seems to be a common pattern of listing multiple licenses in package.json.

For example:

    '(BSD-2-Clause OR MIT)',
    '(BSD-2-Clause OR MIT OR Apache-2.0)',
    '(BSD-3-Clause OR GPL-2.0)',
    '(CC-BY-4.0 AND MIT)',
    '(GPL-2.0 OR MIT)',
    '(MIT AND CC-BY-3.0)',
    '(MIT AND BSD-3-Clause)',
    '(MIT AND Zlib)',
    '(MIT OR Apache-2.0)',
    '(WTFPL OR MIT)',
    '(AFL-2.1 OR BSD-3-Clause)',

These sometimes appear with and sometimes without parentheses. The tool could potentially recognise the AND/OR conjunctions and use them to decide whether a module should be flagged, depending on the combination of licenses and the conjunction used.
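As a rough sketch of how the conjunctions could be handled, the following recursive-descent parser turns expressions like the ones listed above (with or without parentheses) into a small tree, with AND binding tighter than OR as in SPDX expressions. The `Parser` class, `is_allowed` function and the `ALLOWED` set are hypothetical illustrations, not license-auditor's API. The sketch only recognises uppercase `AND`/`OR` and does not handle `WITH` exceptions. Under OR a module passes if any alternative is allowed; under AND every listed license must be allowed.

```python
import re


def tokenize(expr: str) -> list[str]:
    # Split into "(", ")" and bare words such as "MIT", "AND", "OR".
    return re.findall(r"\(|\)|[^\s()]+", expr)


class Parser:
    """Recursive-descent parser; AND binds tighter than OR."""

    def __init__(self, expr: str):
        self.tokens = tokenize(expr)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        if tok is None:
            raise ValueError("unexpected end of expression")
        self.pos += 1
        return tok

    def parse(self):
        node = self.parse_or()
        if self.peek() is not None:
            raise ValueError(f"unexpected token {self.peek()!r}")
        return node

    def parse_or(self):
        parts = [self.parse_and()]
        while self.peek() == "OR":
            self.take()
            parts.append(self.parse_and())
        return parts[0] if len(parts) == 1 else ("OR", parts)

    def parse_and(self):
        parts = [self.parse_atom()]
        while self.peek() == "AND":
            self.take()
            parts.append(self.parse_atom())
        return parts[0] if len(parts) == 1 else ("AND", parts)

    def parse_atom(self):
        tok = self.take()
        if tok == "(":
            node = self.parse_or()
            if self.take() != ")":
                raise ValueError("missing closing parenthesis")
            return node
        if tok in ("AND", "OR", ")"):
            raise ValueError(f"unexpected token {tok!r}")
        return tok  # a single license identifier


def is_allowed(node, allowed: set[str]) -> bool:
    if isinstance(node, str):
        return node in allowed
    op, parts = node
    if op == "OR":
        # The user may choose any one of the alternatives.
        return any(is_allowed(p, allowed) for p in parts)
    # AND: every listed license applies at the same time.
    return all(is_allowed(p, allowed) for p in parts)


if __name__ == "__main__":
    # Hypothetical project policy.
    ALLOWED = {"MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause", "Zlib"}
    for expr in ["(GPL-2.0 OR MIT)", "(MIT AND CC-BY-3.0)", "BSD-3-Clause OR GPL-2.0"]:
        print(expr, "->", is_allowed(Parser(expr).parse(), ALLOWED))
```

With this policy, `(GPL-2.0 OR MIT)` would pass because MIT can be chosen, while `(MIT AND CC-BY-3.0)` would be flagged because CC-BY-3.0 also applies.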

Thanks for reporting, we will take a look.

@deevus Regarding tokenisation, we will not be doing it for security reasons. Someone might create their own custom license called e.g. "GPL" with unknown terms, and it would then be erroneously flagged as GPLv3. The preferable course of action when dealing with unusual license name formats is to persuade package maintainers to use SPDX identifiers.
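The exact-match policy described above could be sketched roughly as follows: a license name is only classified if it is an exact SPDX identifier, and anything else is reported as unknown for manual review instead of being matched to the closest-looking license. The `classify` function and the three sets are hypothetical; the small `SPDX_IDS` set stands in for the full SPDX License List.

```python
# Hypothetical policy lists; a real tool would load the full SPDX License List.
SPDX_IDS = {"MIT", "Apache-2.0", "BSD-3-Clause", "GPL-2.0-only", "GPL-3.0-only"}
ALLOWED = {"MIT", "Apache-2.0", "BSD-3-Clause"}
FORBIDDEN = {"GPL-2.0-only", "GPL-3.0-only"}


def classify(license_name: str) -> str:
    """Return 'allowed', 'forbidden' or 'unknown' using exact SPDX matches only."""
    if license_name not in SPDX_IDS:
        # No tokenisation or fuzzy matching: a custom "GPL" is not assumed to be GPLv3.
        return "unknown"
    if license_name in ALLOWED:
        return "allowed"
    if license_name in FORBIDDEN:
        return "forbidden"
    # A valid SPDX id that the policy does not mention still needs review.
    return "unknown"


if __name__ == "__main__":
    for name in ["MIT", "GPL-3.0-only", "GPL", "Apache 2.0"]:
        print(name, "->", classify(name))
```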

As for multiple licenses, we have added it to our internal board.