SPDX backend
maoo opened this issue · 16 comments
I like this project and I think it would be really helpful for our Software Foundation. Since we are increasingly adopting SPDX, I thought it would be cool to add a backend for it, which replaces the local licenses folder and also validates the license identifier passed by the user.
I've pushed some code to https://github.com/maoo/legit/tree/spdx-backend. Although it's not final, it runs locally without blowing up (hopefully; my Node skills are very humble). The README file explains how to use and configure it.
This is the way it works:
- A user runs the script passing a license with the `-l` option, as before; the only difference is that it must now be a valid SPDX identifier, otherwise it will fail. legit validates the identifier against SPDX using the spdx-licenses npm module.
- legit downloads and parses the license text from https://spdx.org/licenses/<Identifier>.html
- If a placeholder configuration is available for that license, legit tries to resolve those values from command-line options and replaces them in the license text.
Placeholder definitions are hosted on GitHub and can be extended by the community.
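The first two steps above can be sketched in Node roughly as follows. This is a minimal sketch, not the code in the branch: `KNOWN_IDS` is a hypothetical stand-in for the identifier list that the spdx-licenses npm module provides, and `validateSpdxId` / `licenseHtmlUrl` are hypothetical helper names.

```javascript
// Hypothetical stand-in for the SPDX identifier list; the branch uses the
// spdx-licenses npm module for this check instead.
const KNOWN_IDS = new Set(["MIT", "Apache-2.0", "BSD-3-Clause", "ISC"]);

// Throws if `id` is not a known SPDX identifier. Uses an exact,
// case-sensitive match here for simplicity.
function validateSpdxId(id) {
  if (!KNOWN_IDS.has(id)) {
    throw new Error(`"${id}" is not a valid SPDX license identifier`);
  }
  return id;
}

// Builds the spdx.org page URL that the license text is downloaded from.
function licenseHtmlUrl(id) {
  return `https://spdx.org/licenses/${encodeURIComponent(validateSpdxId(id))}.html`;
}
```

For example, `licenseHtmlUrl("MIT")` yields `https://spdx.org/licenses/MIT.html`, while an unknown identifier fails before any network request is made.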
The implementation is not complete; there are some known issues that I've also reported in the README:
- the placeholder list is hardcoded (`user`, `year`, `oneline`) and should be parametric
- add more items to `license-placeholders.yml`
- make the `license-placeholders.yml` URL configurable
- allow resolving `license-placeholders.yml` from a file-system path
- placeholders that include the `'` character don't work
- regexp support for license placeholders
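For the first and last items in that list, one possible direction is to drive substitution from a map instead of a hardcoded list, and to escape each placeholder literal before turning it into a regular expression. This is a sketch under assumptions: the map shape (placeholder literal mapped to a command-line option name) and the names `fillPlaceholders` and `escapeRegExp` are hypothetical and do not reflect the actual format of `license-placeholders.yml`.

```javascript
// Escapes regular-expression metacharacters so a placeholder literal such as
// "[year]" or "<copyright holders>" is matched verbatim.
function escapeRegExp(literal) {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replaces every occurrence of each placeholder literal with the value of the
// option it maps to. Placeholders whose option was not supplied are left as-is.
// `placeholders`: { "<literal>": "optionName" }  (hypothetical shape)
// `options`:      { optionName: value }, e.g. parsed command-line options
function fillPlaceholders(text, placeholders, options) {
  let result = text;
  for (const [literal, optionName] of Object.entries(placeholders)) {
    const value = options[optionName];
    if (value === undefined) continue;
    const pattern = new RegExp(escapeRegExp(literal), "g");
    // A replacer function keeps "$" sequences in the value from being
    // interpreted as replacement patterns.
    result = result.replace(pattern, () => String(value));
  }
  return result;
}
```

With `{ "<year>": "year", "<copyright holders>": "user" }` and options `{ year: 2017, user: "Jane Doe" }`, the line `Copyright (c) <year> <copyright holders>` becomes `Copyright (c) 2017 Jane Doe`. Adding a placeholder then means adding a map entry rather than changing code.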
I'm eager to know what others think about SPDX and this implementation; if you like the idea, I'd be happy to work on it further and send a Pull Request.
Thanks for sharing this work in the open!
As a heads-up, @maoo and I are trying to get organised and merge our respective forks so we can submit them as a PR. Apologies for not doing that from the get-go; we're in different timezones and don't have a lot of overlap in which to coordinate our efforts.
This is pretty much what I did in my sort-of-fork https://github.com/jacobmischka/papers, though it accepts any of the name, SPDX ID, or nickname of a license (as listed in github/choosealicense.com), using a quick JSON file I made of the licenses.
Edit: That sounds like an advertisement, which I didn't really mean it to be. I just mean you can take any bits you want or use that JSON file I threw together. I only created it because I wanted something for my own use that reads from package.json, which is out of scope for this package.
Thanks @jacobmischka! I'll definitely check out your implementation; it would be great to include some of your code's features in a Pull Request against this repo.
I'm eager to know what you (and others) think of PR #15, which combines my initial implementation with some additions from @pmonks (thank you!)
I was planning on implementing some of this over the weekend by leveraging GitHub's License API. The API is in dev preview mode right now but I think it does a good job of providing
I also like the fact that it provides the body of the license within the JSON payload so we don't have to worry about fetching from a URL.
Does SPDX have a similar JSON API that can be used to fetch licenses?
Hi @captainsafia, I see that GitHub's License API uses `spdx_id` in the payload, so we'd still support SPDX. I like the idea of adopting it as the main backend (definitely better than spdx.org, which does not provide an API and is not suited for that); let me know if/how I can help.
We'd probably still want to have a mechanism to replace tokens in the license body, such as `[year]` or `[owner]`; what do you think of the approach we've taken with `license-placeholders.yml`?
My gut tells me that going direct to the source (i.e. SPDX) is the right way to go, for a couple of reasons:
- GitHub have a bit of a history of not really grokking open source licensing (they don't mandate a license for repos they host, for example, and as a result the percentage of properly licensed GitHub repositories is poor)
- The GitHub License API looks like a thin veneer over a subset of SPDX - why not cut out the middleman?
- The GitHub License API doesn't appear to support SPDX 2.1 License Expressions. Admittedly these are probably rarely used (and right now `legit` doesn't support them either, though that could be remedied), but when you need them, you really, really need them.
- The GitHub License API is in preview, so there's a chance (probably a small chance, but non-zero) that it won't end up becoming official.
- It's also a possible turn-off for folks who want to use `legit` but don't use GitHub.
- Going directly to SPDX (a Linux Foundation Collaboration Project) may carry more weight than appealing to GitHub's lesser authority on the topic of open source licensing (after all, open source licensing is at the centre of what SPDX do, but it's peripheral to GitHub).
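To make the License Expressions point concrete, here is a minimal sketch of pulling the identifiers out of an expression such as `(MIT OR Apache-2.0) AND GPL-2.0 WITH Classpath-exception-2.0`, so each one could be validated against the SPDX lists. It only tokenizes: it assumes upper-case `AND` / `OR` / `WITH` operators and does not check the expression's grammar, so real support would need a proper parser. `tokenize` and `identifiersIn` are hypothetical names.

```javascript
// Splits an SPDX license expression into tokens: parentheses, the operators
// AND / OR / WITH, and identifiers (which may end in "+", e.g. "GPL-2.0+").
// Simplified sketch: it does not check that the expression is well-formed.
function tokenize(expression) {
  const tokens = expression.match(/\(|\)|[A-Za-z0-9.\-:+]+/g) || [];
  return tokens.map((text) => {
    if (text === "(" || text === ")") return { type: "paren", text };
    if (text === "AND" || text === "OR" || text === "WITH") {
      return { type: "operator", text };
    }
    return { type: "id", text };
  });
}

// Collects license identifiers and exception identifiers (the ones that
// directly follow WITH) so each can be checked against the right list.
function identifiersIn(expression) {
  const licenses = [];
  const exceptions = [];
  let afterWith = false;
  for (const token of tokenize(expression)) {
    if (token.type === "operator") {
      afterWith = token.text === "WITH";
    } else if (token.type === "id") {
      (afterWith ? exceptions : licenses).push(token.text);
      afterWith = false;
    }
  }
  return { licenses, exceptions };
}
```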
wdyt?
Without a dedicated API endpoint, I don't really like the idea of fetching from a random URL every time. If one were to go forward with the official SPDX source, then I think the licenses should all be downloaded as a dependency at install time.
It's also worth noting that GitHub includes licenses in addition to the official SPDX list, such as WTFPL.
@jacobmischka it's really not a "random URL" though - the SPDX project has a well-organised, comprehensive set of licenses in their license-list repository, and that's where our PR pulls the license texts from.
EDIT: the idea of pulling down the licenses at build time and "baking" them into the downloaded `legit` package is interesting, but it does create an avoidable coupling between SPDX releases and releases of `legit`. Pulling them at runtime (as our PR does) feels more scalable to me.
Oh, last I checked I thought it was just scraping their site. In any event, that makes it even easier to fork that repo, add a package.json and publish it on npm, instead of doing an HTTP GET every time someone wants to copy a text file. Being usable offline would be a big advantage imo.
Legit is not a binary. Using dependencies from npm is extremely common.
s/binary/package - my point remains.
Regarding offline usage - that was one of my thoughts too, but the reality is that I'm offline rarely enough that that wouldn't be a showstopper for me. That would be a worthwhile enhancement though, imho.
I should also point out that both the `spdx-licenses` and `spdx-license-list` npm modules our PR introduces have exactly the problem I mention above: they're both out of date with the latest SPDX release, in large part because they replicate the SPDX data instead of looking it up.
Interestingly, I just discovered that the SPDX project has published recommendations on how to programmatically access the SPDX license list, and by chance our PR mostly adheres to those recommendations.
Ah, that makes fetching from the URLs a lot more appealing but I still think forking their repository and publishing it on npm to use as a dependency would be better.
I don't think doing that is making it any more coupled than fetching it from their website is. Semantic versioning means that updates to the dependency don't rely on anyone updating legit, it's more aligned with the javascript ecosystem, and it makes legit no longer depend on an active internet connection, albeit at the cost of someone needing to maintain that npm package.
My big goal for this was for it to be network-independent. Instead of loading the license from the network on every command, it would be loaded at install time via a postinstall script in package.json. Most people have Internet connectivity and won't notice whether the command is using a locally stored copy of the licenses or fetching them from the network; those who don't will notice, so I think it's best to build for them. I'm OK with making a new release, since new releases of the list don't seem to happen that frequently. We can also always add a `legit update` command if need be.
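A minimal sketch of the offline-first behaviour being discussed: read a license from a local cache directory and go to the network only on a miss. A postinstall script could call it once per identifier to pre-populate the cache, and a `legit update` command could clear and refill it. `getLicenseText`, the `<id>.txt` cache layout and the injected `fetchText` function are assumptions for illustration; injecting the fetcher keeps the choice of network source (SPDX or GitHub) separate from the caching.

```javascript
const fs = require("fs");
const path = require("path");

// Returns the license text for `id`. Reads `<cacheDir>/<id>.txt` if it exists
// (no network needed); otherwise calls the injected `fetchText(id)` once and
// writes the result to the cache for next time.
async function getLicenseText(id, cacheDir, fetchText) {
  const file = path.join(cacheDir, `${id}.txt`);
  try {
    return await fs.promises.readFile(file, "utf8");
  } catch (err) {
    if (err.code !== "ENOENT") throw err; // only a missing file means "fetch"
  }
  const text = await fetchText(id);
  await fs.promises.mkdir(cacheDir, { recursive: true });
  await fs.promises.writeFile(file, text, "utf8");
  return text;
}
```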
Is there a JSON-based API for SPDX (as opposed to RDF)?
The SPDX license list is available in a variety of data formats, including JSON. The "API" (such as it is) is simply an HTTP GET of this resource.
From that resource you can then HTTP GET (by substituting the SPDX License Identifier into this URL) each of the individual license text files.
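The two-step lookup described above can be sketched as follows, with the network calls left out so the parsing can be tested on its own. The field names (`licenseListVersion`, `licenses`, `licenseId`, `detailsUrl`, `isDeprecatedLicenseId`, `licenseText`) and the `https://spdx.org/licenses/<id>.json` URL pattern are assumptions based on the published license-list-data JSON, so check them against the current files; the helper names are hypothetical.

```javascript
// Parses the text of the SPDX licenses.json list into a map of
// identifier -> details URL, skipping deprecated identifiers.
function parseLicenseList(jsonText) {
  const data = JSON.parse(jsonText);
  const byId = new Map();
  for (const entry of data.licenses) {
    if (entry.isDeprecatedLicenseId) continue;
    byId.set(entry.licenseId, entry.detailsUrl);
  }
  return { version: data.licenseListVersion, byId };
}

// Builds the per-license JSON URL by substituting the identifier
// (URL pattern assumed; prefer detailsUrl from the list when available).
function licenseDetailsUrl(id) {
  return `https://spdx.org/licenses/${encodeURIComponent(id)}.json`;
}

// Extracts the license body from a per-license JSON document.
function licenseTextFromDetails(jsonText) {
  const details = JSON.parse(jsonText);
  if (typeof details.licenseText !== "string") {
    throw new Error("licenseText missing from license details");
  }
  return details.licenseText;
}
```

The body of an HTTP GET on the list would go to `parseLicenseList`, and the body of each details URL to `licenseTextFromDetails`.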
@pmonks I was about to reply to you on the SPDX mailing list with exactly that.
You could also use the https://github.com/spdx/license-list-data repo as part of your npm build, with a clone and/or a Git submodule (.gitmodules), to avoid any fetching or network dependency at run time.
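A sketch of reading license texts from such a checkout, assuming it lives at a path of your choosing and keeps plain-text licenses under `text/<id>.txt` (check this against the repository's current layout). `readVendoredLicense` is a hypothetical helper; rejecting identifiers with unexpected characters also stops a value like `../x` from reading outside the checkout.

```javascript
const fs = require("fs");
const path = require("path");

// Identifiers are restricted to letters, digits, ".", "+" and "-". Since no
// path separator is allowed, a user-supplied value cannot leave the checkout.
const SAFE_ID = /^[A-Za-z0-9.+-]+$/;

// Reads a license text from a local clone or submodule of
// spdx/license-list-data (assumed layout: text/<id>.txt).
function readVendoredLicense(checkoutDir, id) {
  if (!SAFE_ID.test(id)) {
    throw new Error(`unexpected license identifier: ${id}`);
  }
  return fs.readFileSync(path.join(checkoutDir, "text", `${id}.txt`), "utf8");
}
```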