Pinned Repositories
ArchiveBot
ArchiveBot, an IRC bot for archiving websites
grab-site
The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
IA.BAK
We back up a lot of stuff from around the web; now it's time to back up the Internet Archive, just in case.
parler-grab
Archiving Parler.
seesaw-kit
Making a reusable toolkit for writing seesaw scripts
Ubuntu-Warrior
Scripts to build and boot a Warrior virtual machine containing Docker
warrior-dockerfile
A Dockerfile for the ArchiveTeam Warrior
warrior4-vm
Warrior virtual machine appliance (version 4)
wget-lua
Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.
wpull
Wget-compatible web downloader and crawler.
Archive Team's Repositories
ArchiveTeam/warrior-dockerfile
A Dockerfile for the ArchiveTeam Warrior
ArchiveTeam/ArchiveBot
ArchiveBot, an IRC bot for archiving websites
ArchiveTeam/wget-lua
Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.
ArchiveTeam/terroroftinytown
URLTeam's second generation of URL shortener archiving tools
ArchiveTeam/imgur-grab
Archiving Imgur.
ArchiveTeam/ludios_wpull
wpull fork with fixes and faster parsing using html5-parser; used by grab-site; should go away when wpull is similarly improved
ArchiveTeam/usgovernment-grab
Archiving parts of the US government.
ArchiveTeam/youtube-grab
Archiving all metadata from YouTube (everything except videos themselves due to size)
ArchiveTeam/urls-grab
Archiving URLs (outlinks) from a variety of sources.
ArchiveTeam/telegram-grab
Archiving public Telegram messages.
ArchiveTeam/pastebin-grab
Archiving Pastebin.
ArchiveTeam/github-grab
Archiving GitHub.
ArchiveTeam/megawarc
Nondestructive WARC-in-tar to WARC conversion
ArchiveTeam/goo-gl-grab
Archiving goo.gl, the Google URL Shortener.
ArchiveTeam/mediafire-grab
Archiving mediafire.com URLs.
ArchiveTeam/grab-base-df
Base Dockerfile for warrior project grab scripts
ArchiveTeam/glitch-grab
Archiving Glitch.
ArchiveTeam/.github
ArchiveTeam/itchio-minimal-grab
Archiving at-risk data from itch.io.
ArchiveTeam/itchio-minimal-items
Managing items for itchio-minimal-grab.
ArchiveTeam/livestream-grab
Archiving livestream.com.
ArchiveTeam/oshietegoo-grab
Archiving 教えて!goo.
ArchiveTeam/oshietegoo-items
Managing items for oshietegoo-grab.
ArchiveTeam/peingquestionbox-grab
Archiving peing.net.
ArchiveTeam/peingquestionbox-items
Managing items for peingquestionbox-grab.
ArchiveTeam/tistory-grab
Archiving Tistory.
ArchiveTeam/tistory-items
Managing items for tistory-grab.
ArchiveTeam/twitch-items
Managing items for twitch-grab.
ArchiveTeam/typepad-grab
Archiving Typepad.
ArchiveTeam/typepad-items
Managing items for typepad-grab.