Open Data Sources
- Availability and access: the data must be available as a whole and at no more than a reasonable reproduction cost, preferably by downloading over the internet. The data must also be available in a convenient and modifiable form.
- Reuse and redistribution: the data must be provided under terms that permit reuse and redistribution including the intermixing with other datasets. The data must be machine-readable.
- Universal participation: everyone must be able to use, reuse and redistribute — there should be no discrimination against fields of endeavour or against persons or groups. For example, ‘non-commercial’ restrictions that would prevent ‘commercial’ use, or restrictions of use for certain purposes (e.g. only in education), are not allowed.
-- Definition by the Open Knowledge Foundation
Lists of Data Sets
- Interesting Data Sets for Statisticians - an editorialized, entertaining collection of open data sets
Open Data
- List of Public Datasets - user-curated
- DBpedia - structured data extracted from Wikipedia, organized under a large multi-domain ontology
- Public Data Sets on AWS - Common Crawl web corpus, NASA satellite imagery, Human Genome, Google Books Ngrams, Wikipedia Traffic, Million Song Dataset, Federal Reserve Economic Data, PubChem, more.
Private Opened Data
- New York Times - controlled vocabulary of people, places, companies, etc., published as linked open data
Governmental Data
Compendium of Governmental Open Data Sources
- (USA)
- Africa Open Data
- US Census - Population Estimates and Projections, Nonemployer Statistics and County Business Patterns, Economic Indicators Time Series, more.
Non-Governmental Organization Data
- The World Bank - business regulation measures, company-level data in emerging markets, household consumption patterns, World Development Indicators, World Bank finances
- ^Pew Research Center's Internet Project
Academic Data
Inter-university Consortium for Political and Social Research Data Portal
- Surveys of Economic Attitudes and Behavior
- Continuing Series of Consumer Surveys
- Historical and Contemporary Economic Processes and Indicators
Truly Random Data
Open Data Resources
- reddit r/datasets
- Open Data - Stack Exchange (discussion)
^ License is not truly open; some limitations on use apply.