Pinned Repositories
ArchiveSpark
An Apache Spark framework for easy data processing, extraction, and derivation for web archives and archival collections, developed at the Internet Archive.
heritrix3
Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.
ia-hadoop-tools
archive-analysis
Tools to analyze web archives
archive-web-graphs
Build and Analyze Archival Web Graphs
ars-docker-notebooks
Docker container for ARS workshop
ars-workshop
Archive Research Services Workshop
warc-hadoop
WARC (Web Archive) Input and Output Formats for Hadoop
wikipedia-ia-external-links-monitor
Extract links from Wikipedia edits
vinaygoel's Repositories
vinaygoel/ars-workshop
Archive Research Services Workshop
vinaygoel/archive-analysis
Tools to analyze web archives
vinaygoel/archive-web-graphs
Build and Analyze Archival Web Graphs
vinaygoel/ars-docker-notebooks
Docker container for ARS workshop
vinaygoel/warc-hadoop
WARC (Web Archive) Input and Output Formats for Hadoop
vinaygoel/wikipedia-ia-external-links-monitor
Extract links from Wikipedia edits