archivespark
There are 3 repositories under the archivespark topic.
helgeho/ArchiveSpark
An Apache Spark framework for easy data processing, extraction, and derivation for web archives and archival collections, developed at the Internet Archive.
helgeho/ArchiveSpark2Triples
Convert web archives to RDF triples with ArchiveSpark
helgeho/Tempas2ArchiveSpark
ArchiveSpark DataSpec to analyze the Internet Archive's web archive through temporal search results returned by Tempas (v2)