License for public dataset
EgorBu opened this issue · 5 comments
Hi!
We need to decide which licenses we can (or have to) use for public datasets and add that to the guide.
Examples of licenses:
In PGA: https://github.com/src-d/datasets/blob/master/DCO
In another dataset built from GitHub data: https://www.kaggle.com/davidshinn/github-issues
IANAL, but if we include actual source code, we cannot use a traditional software license (e.g. Apache or GPL), since it would be incompatible with the licenses of the code itself. We can look into some database licenses instead: https://opendatacommons.org/licenses/
I had a quick chat with @eiso about this. I wonder if he could share his knowledge here.
@campoy, using the ODbL 1.0 license suggested by @smola is a good option, because we specify the individual content licenses to the best of our ability in the index file of the dataset. From the ODbL text:
Databases can contain a wide variety of types of content (images,
audiovisual material, and sounds all in the same database, for example),
and so the ODbL only governs the rights over the Database, and not the
contents of the Database individually. Licensors should use the ODbL
together with another license for the contents, if the contents have a
single set of rights that uniformly covers all of the contents. If the
contents have multiple sets of different rights, Licensors should
describe what rights govern what contents together in the individual
record or in some other way that clarifies what rights apply.
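To illustrate what "describe what rights govern what contents together in the individual record" could look like for us, here is a minimal sketch. It assumes a CSV index with `url` and `license` columns; those column names, the sample rows, and the helper functions are all hypothetical and are not the actual index format of the dataset.

```python
import csv
import io
from collections import Counter

# Hypothetical index contents: column names and rows are illustrative only,
# not the real index file of any published dataset.
SAMPLE_INDEX = """url,license
https://example.com/repo-a,Apache-2.0
https://example.com/repo-b,MIT
https://example.com/repo-c,Apache-2.0
"""


def license_per_record(index_text):
    """Map each record's URL to the license recorded for its contents."""
    reader = csv.DictReader(io.StringIO(index_text))
    return {row["url"]: row["license"] for row in reader}


def license_counts(index_text):
    """Count how many records fall under each content license."""
    return Counter(license_per_record(index_text).values())


if __name__ == "__main__":
    for url, content_license in license_per_record(SAMPLE_INDEX).items():
        print(url, content_license)
    print(dict(license_counts(SAMPLE_INDEX)))
```

With a layout like this, the database as a whole would carry the ODbL, while each row states the license of the content it points to.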
Another option is the Community Data License Agreement by the Linux Foundation, but it hasn't picked up much steam since launch. So let's go for ODbL.
I created an issue to track the change in the datasets repo, so I'll close this one.