Pinned Repositories
dvenabler
Adds DocValues to Solr index fields without full re-index
heatmap
A GitHub-inspired graph for visualising activity
heritrix3-wrapper
Small wrapper to start/stop and communicate with Heritrix 3.
jwat
Java Web Archive Toolkit
jwat-tools
JWAT Tools
netarchivesuite
NetarchiveSuite 5.X development
netarchivesuite-svngit-migration
Git conversion of Subversion repository.
netsearch
Merged search-arctika and search-achon into a multi-module project
so-me
Social Media harvests
solrwayback
A search interface and wayback machine for the UKWA Solr-based warc-indexer framework.
netarchivesuite's Repositories
netarchivesuite/solrwayback
A search interface and wayback machine for the UKWA Solr-based warc-indexer framework.
netarchivesuite/heatmap
A GitHub-inspired graph for visualising activity
netarchivesuite/netarchivesuite
NetarchiveSuite 5.X development
netarchivesuite/netsearch
Merged search-arctika and search-achon into a multi-module project
netarchivesuite/dvenabler
Adds DocValues to Solr index fields without full re-index
netarchivesuite/so-me
Social Media harvests
netarchivesuite/jwat-tools
JWAT Tools
netarchivesuite/heritrix3-wrapper
Small wrapper to start/stop and communicate with Heritrix 3.
netarchivesuite/jwat
Java Web Archive Toolkit
netarchivesuite/netarchivesuite-docker-compose
Testbed for some netarchivesuite docker experiments
netarchivesuite/netarchivesuite-umbra-docker
netarchivesuite/jwarc-cdx-indexer-workflow
Processes all WARC files listed in a text file with JWARC and sends the results to a CDX server (Outback CDX etc.). If the process is stopped and restarted, it continues from where it left off.
netarchivesuite/logtrix
Java library/tool for parsing and summarising Heritrix crawl logs
netarchivesuite/openwayback-netarchivesuite
NetarchiveSuite fork of OpenWayback
netarchivesuite/solrwaybackrootproxy
Using the solrwaybackrootproxy improves playback; it can redirect and fix leaked resources.
netarchivesuite/webdanica
System for finding Danish webpages outside the .dk domain
netarchivesuite/heritrix3
Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.
netarchivesuite/browsertrix-cloud
Danish Royal Library customisations and modifications
netarchivesuite/cdx-summarize-warc-indexer
Summarize web archive holdings using an existing Solr index
netarchivesuite/crawlrss
Crawl RSS - Heritrix 3 add-on
netarchivesuite/fits-wrapper
Small FITS wrapper to run it using a custom classloader and provide some basic JAXB (un)marshalling of the XML output.
netarchivesuite/hadoopifications
Attempts to create Hadoop jobs from other processes
netarchivesuite/jwarc
Java library for reading and writing WARC files with a typed API
netarchivesuite/jwat-tools-gui
JWAT Tools minimal GUI version
netarchivesuite/jwat-wayback-resourcestore
Wayback resourcestore using JWAT
netarchivesuite/lap-writer-warc
WARC writer for INA's Live Archiving Proxy
netarchivesuite/openwayback-netarkivet-overlay
Project to create a customised OpenWayback for Netarkivet using Maven overlays.
netarchivesuite/umbra
A queue-controlled browser automation tool for improving web crawl quality
netarchivesuite/vagrant-hadoop-hive-spark
Vagrant project to spin up a single node VM running current versions of Hadoop, Hive and Spark
netarchivesuite/webarchive-discovery
WARC and ARC indexing and discovery tools.