Big Data Datasets

A curated list of publicly available big data datasets. Uncompressed sizes are given in brackets. No blockchains.

Structured

Text

  • CommonCrawl (AWS) - A corpus of web crawl data composed of over 25 billion web pages.
    • Semi-Structured (includes Metadata): 250 TB
  • DBpedia - curated Wikipedia data
  • Freebase
    • Freebase: 22 GB (250 GB)
    • Freebase Deleted Triples: 2 GB (8 GB)
    • Freebase/Wikidata Mappings: 22 MB (243 MB)
  • StackOverflow Data (BigQuery) - 182 GB
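
The size notation used in the entries above can be read mechanically. The following is a minimal sketch, assuming the first figure in an entry is the download (compressed) size and the bracketed figure is the uncompressed size, with decimal units. The helper names `parse_sizes` and `expansion_ratio` are hypothetical and are not part of any dataset's tooling.

```python
import re

# Decimal units are an assumption; the list does not say whether GB means 10^9 or 2^30 bytes.
UNITS = {"MB": 10**6, "GB": 10**9, "TB": 10**12}
SIZE = re.compile(r"(\d+(?:\.\d+)?)\s*(MB|GB|TB)")


def parse_sizes(entry):
    """Return (download_bytes, uncompressed_bytes or None), or None if no size is found."""
    found = [float(number) * UNITS[unit] for number, unit in SIZE.findall(entry)]
    if not found:
        return None
    download = found[0]
    uncompressed = found[1] if len(found) > 1 else None
    return download, uncompressed


def expansion_ratio(entry):
    """Return uncompressed size divided by download size, or None if either is missing."""
    sizes = parse_sizes(entry)
    if sizes is None or sizes[1] is None:
        return None
    return sizes[1] / sizes[0]


if __name__ == "__main__":
    entries = [
        "Freebase: 22 GB (250 GB)",
        "Freebase Deleted Triples: 2 GB (8 GB)",
        "Freebase/Wikidata Mappings: 22 MB (243 MB)",
        "StackOverflow Data (BigQuery) - 182 GB",
    ]
    for entry in entries:
        ratio = expansion_ratio(entry)
        label = "n/a" if ratio is None else f"{ratio:.1f}x"
        print(f"{entry} -> expansion {label}")
```

A ratio like this is only a rough guide to how much disk space to reserve before unpacking a download. Entries with a single figure, such as the StackOverflow one, yield no ratio.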

Image

Audio

Bonus: API / Streamdata / "Self-Service"

Bonus: Opendata / Census / Government data

Meta / Lists / Sources

These pages might link to datasets that are already in this list.