archivespark

There are 3 repositories under the archivespark topic.

  • helgeho/ArchiveSpark

    An Apache Spark framework for easy data processing, extraction, and derivation for web archives and archival collections, developed at the Internet Archive.

    Language: Scala
  • helgeho/ArchiveSpark2Triples

    Convert web archives to RDF triples with ArchiveSpark

    Language: Jupyter Notebook
  • helgeho/Tempas2ArchiveSpark

    ArchiveSpark DataSpec to analyze the Internet Archive's Web archive through temporal search results returned by Tempas (v2)

    Language: Scala