data-apis/governance

Licensing of code and documents

rgommers opened this issue · 12 comments

For code we have to decide what open source license to use. Most relevant projects that are community driven use the BSD-3 (or sometimes MIT) license, while most relevant projects that are driven by a company use the Apache 2.0 license.

MIT is typically preferred to BSD-3 because, while they're almost identical, there are multiple BSD licenses. See e.g. https://choosealicense.com/, which clearly recommends MIT.

So the main choices seem to be:

  • MIT
  • Apache 2.0

Community projects have typically avoided including Apache 2.0 code in the past because it would make the code base incompatible with GPLv2 (see this explanation). That said, there were never proposals for integrating a large amount of useful code that was Apache 2.0 licensed. My sense is that nowadays GPLv2 is less relevant, and if there were Apache 2.0 licensed code with a lot of value on offer, community projects would accept it.

Another option could be to dual-license as both MIT and Apache 2.0 to ease integration into other code bases, although that is a little unusual and may be more confusing than helpful.

For documents we also have to pick a license. Options include:

  • Use the same license as for code (see above). This is how documentation and website content in open source projects are typically treated.
  • Use a Creative Commons license
  • Use something like the Unlicense

If everyone could state their preference in a comment on this issue, that would be very useful.

Re code I vote Apache. I think the patent protection is a stronger argument than the GPLv2 issue.

Re documentation: Unlicense seems to be intended for code; I think CC-0 would be more appropriate and has the same intent. It might be tricky to determine what is documentation and what is code, for example for examples in notebooks, so maybe using the same license would be easiest?

MIT and Apache 2.0 are both fine to me. And I'd stick to the same license for all artifacts.

In my experience, the two licenses have different strengths for those of us who work inside a company but on OSS:

  • It is generally easier to contribute things under MIT license than ASL, as it has no patent implications.
  • On the flip side, software with an ASL is easier to consume, precisely because that software comes with patent licenses as well.

My vote is for Apache-2.0 for both code and documentation.

aregm commented

There is more than just a license (and I vote for MIT for the code). How do we ensure that the spec is closed for changes once released, but open to incorporate changes through the governing process? We definitely do not want forks from this work to go off. Better to be a free specification to be used by all interested parties, but closed to change and rebranding.

How do we ensure that the spec is closed for changes once released, but open to incorporate changes through the governing process? We definitely do not want forks from this work to go off.

Indeed, we will need to document how the spec can evolve. This includes a governance aspect, and a versioning and backwards compatibility aspect. It's already in the RFC outline as a topic. I'll open a separate issue for it now. EDIT: see https://github.com/pydata-apis/workgroup/issues/12

Better to be a free specification to be used by all interested parties, but closed to change and rebranding.

Rebranding seems to suggest using a trademark. I'm not too worried about this early on (it's hard to get people to put significant effort into this stuff), but it's indeed a conversation worth having now. I'll open another separate issue. EDIT: see gh-3

Happy to support BSD/MIT, but just out of curiosity @rgommers @maartenbreddels why is Apache trickier than BSD/MIT for you?

@amueller here are NumPy and SciPy issues with details on why Apache 2.0 has not been accepted in the past:

aregm commented

BSD can be problematic due to patent issues - it does not grant patent usage.

In Europe we don't have to worry about patents, which might bias me.

But I can read and understand MIT, and I think most people can. For Apache there is, simply put, more fluffy text, and my guess is most people don't like reading or trying to understand licenses.

@maartenbreddels I'm not sure that's true about patents. A lot of them hold in the EU as well, and many companies operate internationally. I used to work in computer vision where this is a big issue.
Though yes, it is indeed longer and harder to understand.

Adding the notes on the topic of choosing a license from the 4 June meeting here for completeness (PRs to add license files to follow):

  • Current choice between MIT and Apache-2.0. No current blockers either way.
  • Jack Pappas: How does licensing apply to specs?
    • Makes sense to license standardized testing and benchmark suite.
  • Adam Paszke: easiest for Google if everything is licensed the same. Unless, for spec, we license in the Public Domain.
  • Ralf Gommers: test and benchmark suite, documents which include code samples, maybe additional documents and tooling as well, such as Saul's tooling and API comparison.
  • Oleksandr Pavlyk: seems to make sense to license spec as Apache-2.0.
  • Carlo Curino: MIT is default at Microsoft. Either MIT or Apache-2.0 is fine. Preference is to license everything the same.
  • Ralf Gommers: seems we are leaning toward one license and keeping the documentation all under the same license.
  • Carlo Curino: do we need legal advice? If so, Microsoft is willing to lend the support of their legal department.
  • Adam Paszke: Creative Commons is a no-go for Google.
  • Ralf Gommers: unless CC0, Creative Commons is a no-go due to attribution requirements.
  • Oleksandr Pavlyk: what about spec modifications?
  • Jack Pappas: who would be modifying it? Does not seem applicable.
  • Ralf Gommers: the incentive is it's a spec, so modifying does not make sense.
  • Andreas Mueller: Apache-2.0 has patent protection.
  • Ralf Gommers: MIT is easier to produce, while Apache-2.0 is easier to consume.
  • Markus Weimer: Apache-2.0 is easier for companies to consume, but may make contributing more difficult.
  • Andreas Mueller: make it easier to use, rather than easier to contribute. So optimize for consumption.
  • Markus Weimer: for MS, Apache-2.0 is easiest to consume; everything else is something one has to think about, including MIT.
  • Ralf Gommers: for NumPy and SciPy, MIT is a no-brainer, while Apache-2.0 requires a conversation.
  • Adam Paszke: what are the artifacts that we are producing?
  • Ralf Gommers: tests are importable. Meaning the artifacts could be consumed. So a test suite could be pulled in, and, thus, licensing would be an issue.
  • Carlo Curino: bits and pieces of utility code might come out of this work. Anyone against MIT?
  • Ralf Gommers: MIT it is. People can comment on the issue if there are additional concerns, but we'll consider MIT the default license unless there are any show-stopping objections.

All repos have license files now, so closing. Thanks all.