Consider switching to full SPDX license template matching
pmonks opened this issue · 2 comments
pmonks commented
Job Story
When the library detects licenses from license texts, I want it to follow SPDX's license matching guidelines, so that I can be confident its results are consistent with other tools in this space.
Potential Solutions:
SPDX publishes canonical license templates precisely for this purpose. Applying them is not necessarily trivial though, since:
- we'd probably want to cache all of the template files on local disk so that we're not re-downloading them from the internet on every invocation - there are over 500 of them and they total several MB in size
- it's computationally expensive, since every template has to be matched against every candidate license text in order to handle the case where multiple license texts have been concatenated into a single license file (yes, this does happen in the Java/Clojure ecosystem...). Note, however, that the current (non-SPDX) logic assumes each text contains only a single license, so handling concatenated texts would be a separate "side effect" improvement
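To make the two points above concrete, here is a rough Python sketch (not the library's code, and not a complete implementation of the SPDX matching guidelines). It assumes templates use the `<<var;...;match="...">>` and `<<beginOptional>>` / `<<endOptional>>` markup from the SPDX template format. The function names, the `.template.txt` cache file naming, the injected `fetch` callable, and the `Toy-A` / `Toy-B` templates are all hypothetical and exist only for illustration.

```python
import re
from pathlib import Path

# Template markup tokens: <<var;...>>, <<beginOptional...>>, <<endOptional>>
_TOKEN = re.compile(r"<<(var;.*?|beginOptional.*?|endOptional)>>", re.DOTALL)
_MATCH_ATTR = re.compile(r'match="(.*?)"', re.DOTALL)


def cached_template(license_id, cache_dir, fetch):
    """Return the template text, calling fetch(license_id) only on a cache miss."""
    path = Path(cache_dir) / f"{license_id}.template.txt"  # naming is an assumption
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(fetch(license_id), encoding="utf-8")
    return path.read_text(encoding="utf-8")


def _literal(text):
    """Escape literal template text so any whitespace run matches any other."""
    body = r"\s+".join(re.escape(word) for word in text.split())
    lead = r"\s*" if text[:1].isspace() else ""
    trail = r"\s*" if text[-1:].isspace() else ""
    return lead + body + trail


def template_to_regex(template):
    """Translate a template into a case-insensitive regex (simplified)."""
    out, pos = [], 0
    for tok in _TOKEN.finditer(template):
        out.append(_literal(template[pos:tok.start()]))
        kind = tok.group(1)
        if kind.startswith("beginOptional"):
            out.append("(?:")
        elif kind == "endOptional":
            out.append(")?")
        else:  # a <<var;...>> rule: reuse its match="..." pattern if present
            m = _MATCH_ATTR.search(kind)
            out.append("(?:" + (m.group(1) if m else ".*?") + ")")
        pos = tok.end()
    out.append(_literal(template[pos:]))
    return re.compile("".join(out), re.IGNORECASE | re.DOTALL)


def find_licenses(text, templates):
    """Try every template against the text; return (id, start, end) per hit."""
    hits = []
    for license_id, template in templates.items():
        for m in template_to_regex(template).finditer(text):
            hits.append((license_id, m.start(), m.end()))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical toy templates, NOT real SPDX templates.
TOY_TEMPLATES = {
    "Toy-A": 'Copyright <<var;name="holder";original="ACME";match="[^\\n]+">>\n'
             "Permission is granted<<beginOptional>>, free of charge,<<endOptional>>"
             " to use this software.",
    "Toy-B": "Redistribution is allowed with attribution.",
}

# Two license texts concatenated into one file.
SAMPLE = (
    "Copyright 2024 Jane Doe\n"
    "Permission is granted to use this software.\n\n---\n\n"
    "Redistribution is allowed\nwith attribution.\n"
)

if __name__ == "__main__":
    for license_id, start, end in find_licenses(SAMPLE, TOY_TEMPLATES):
        print(license_id, start, end)
```

Because `find_licenses` searches anywhere in the text rather than requiring a whole-file match, it reports both toy licenses in the concatenated sample. It also shows where the cost comes from: each template becomes its own regex, and every one of them is run over every candidate text. The real guidelines involve more normalization (punctuation, quotes, equivalent words, and so on) than this sketch attempts.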
pmonks commented
Template matching has recently been added to the SPDX Java library, so it is probably better to leverage that functionality than to write a greenfield implementation in Clojure.
pmonks commented
Fixed in v2.0