This project uses Apache Spark to parse a corpus of Common Crawl web data, then pushes the parsed data to Elasticsearch for indexing and presentation.
triggan/cc-spark-elasticsearch
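The parse-then-index flow described for this project could look roughly like the sketch below. It is not taken from the repository. It assumes the input is Common Crawl WET-style records (WARC headers, a blank line, then extracted text) and that documents are sent to Elasticsearch through its bulk API, which expects newline-delimited JSON. The function names `parse_wet_record` and `to_bulk_lines`, the index name `commoncrawl`, and the sample record are all hypothetical, and the Spark and HTTP parts are left out so the snippet runs with the standard library alone.

```python
import json


def parse_wet_record(raw):
    """Split one WARC/WET-style record into a dict with its URL and text.

    Hypothetical helper: assumes headers and body are separated by the
    first blank line, as in the WARC format.
    """
    normalized = raw.replace("\r\n", "\n")
    head, _, body = normalized.partition("\n\n")
    headers = {}
    for line in head.split("\n"):
        # The version line ("WARC/1.0") has no colon and is skipped.
        if ":" in line:
            key, _, value = line.partition(":")
            headers[key.strip()] = value.strip()
    return {
        "url": headers.get("WARC-Target-URI"),
        "text": body.strip(),
    }


def to_bulk_lines(docs, index_name):
    """Render documents as bulk-API NDJSON: an action line, then a source line."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    # The bulk API requires the payload to end with a newline.
    return "\n".join(lines) + "\n"


# Made-up sample record, for illustration only.
sample = (
    "WARC/1.0\r\n"
    "WARC-Type: conversion\r\n"
    "WARC-Target-URI: http://example.com/\r\n"
    "\r\n"
    "Example Domain\r\n"
)

record = parse_wet_record(sample)
payload = to_bulk_lines([record], "commoncrawl")  # "commoncrawl" is an assumed index name
```

In a Spark job, a function like `parse_wet_record` would typically be applied per record, for example inside an RDD `map`, with the resulting payload posted to the cluster's `_bulk` endpoint. Whether this project does it that way is an assumption.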