Pinned Repositories
ArchiveBot
ArchiveBot, an IRC bot for archiving websites
grab-site
The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
IA.BAK
We back up a lot of stuff from around the web; now it's time to back up the Internet Archive, just in case.
parler-grab
Archiving Parler.
seesaw-kit
Making a reusable toolkit for writing seesaw scripts
Ubuntu-Warrior
Scripts to build and boot a Warrior virtual machine containing Docker
warrior-dockerfile
A Dockerfile for the ArchiveTeam Warrior
warrior4-vm
Warrior virtual machine appliance (version 4)
wget-lua
Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.
wpull
Wget-compatible web downloader and crawler.
Archive Team's Repositories
ArchiveTeam/warrior-dockerfile
A Dockerfile for the ArchiveTeam Warrior
ArchiveTeam/terroroftinytown
URLTeam's second generation of URL shortener archiving tools
ArchiveTeam/warrior4-vm
Warrior virtual machine appliance (version 4)
ArchiveTeam/ludios_wpull
wpull fork with fixes and faster parsing using html5-parser; used by grab-site; should go away when wpull is similarly improved
ArchiveTeam/youtube-grab
Archiving all metadata from YouTube (everything except videos themselves due to size)
ArchiveTeam/urls-grab
Archiving URLs (outlinks) from a variety of sources.
ArchiveTeam/telegram-grab
Archiving public Telegram messages.
ArchiveTeam/urls-sources
Sources for urls-grab.
ArchiveTeam/blogger-grab
Archiving Blogger/Blogspot.
ArchiveTeam/megawarc
Nondestructive WARC-in-tar to WARC conversion
ArchiveTeam/grab-base-df
Base Dockerfile for warrior project grab scripts
ArchiveTeam/standalone-readme-template
README instructions template for manually running pipeline grab scripts outside the Warrior
ArchiveTeam/fc2-items
Managing items for fc2-grab.
ArchiveTeam/glitch-items
Managing items for glitch-grab.
ArchiveTeam/goo-gl-items
Managing items for goo-gl-grab.
ArchiveTeam/sourceforgedeveloperweb-grab
Archiving SourceForge Developer Web.
ArchiveTeam/.github
ArchiveTeam/archiwumallegro-grab
Archiving Archiwum Allegro.
ArchiveTeam/archiwumallegro-items
Managing items for archiwumallegro-grab.
ArchiveTeam/askfm-items
Managing items for askfm-grab.
ArchiveTeam/gooblog-grab
Archiving gooブログ (goo Blog).
ArchiveTeam/gooblog-items
Managing items for gooblog-grab.
ArchiveTeam/oshietegoo-grab
Archiving 教えて!goo (Oshiete! goo).
ArchiveTeam/oshietegoo-items
Managing items for oshietegoo-grab.
ArchiveTeam/sourceforgedeveloperweb-items
Managing items for sourceforgedeveloperweb-grab.
ArchiveTeam/tistory-grab
Archiving Tistory.
ArchiveTeam/tistory-items
Managing items for tistory-grab.
ArchiveTeam/twitch-items
Managing items for twitch-grab.
ArchiveTeam/typepad-grab
Archiving Typepad.
ArchiveTeam/typepad-items
Managing items for typepad-grab.