Internet Archive
The Internet Archive is "the library of the Internet", and a big supporter of Free Software.
San Francisco
Pinned Repositories
bookreader
The Internet Archive BookReader
brozzler
brozzler - distributed browser-based web crawler
cicd
Build and test using the GitHub registry; deploy to Nomad clusters
heritrix3
Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.
openlibrary
One webpage for every book ever published!
openlibrary-client
Python Client Library for the Archive.org OpenLibrary API
warcprox
WARC-writing MITM HTTP/S proxy
wayback
IA's public Wayback Machine (moved from SourceForge)
wayback-machine-webextension
A web browser extension for Chrome, Firefox, Edge, and Safari 14.
Zeno
State-of-the-art web crawler 🔱
Internet Archive's Repositories
internetarchive/openlibrary
One webpage for every book ever published!
internetarchive/heritrix3
Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.
internetarchive/bookreader
The Internet Archive BookReader
internetarchive/Zeno
State-of-the-art web crawler 🔱
internetarchive/internetarchivebot
internetarchive/archive-pdf-tools
Fast PDF generation and compression. Deals with millions of pages daily.
internetarchive/openlibrary-bots
A repository of cleanup bots implementing the openlibrary-client
internetarchive/iaux
Monorepo for Archive.org UX development and prototyping.
internetarchive/internet-archive-voice-apps
Voice Apps (Actions on Google, Alexa Skill) of Internet Archive. Just say: "Ok Google, Ask Internet Archive to Play Jazz" or "Alexa, Ask Internet Archive to play Instrumental Music"
internetarchive/infogami
internetarchive/surt
Sort-friendly URI Reordering Transform (SURT) Python module
internetarchive/gowarc
Read and write WARC files in Go
internetarchive/iiif
The official Internet Archive IIIF service
internetarchive/iari
Import workflows for the Wikipedia Citations Database
internetarchive/iaux-typescript-wc-template
IAUX TypeScript WebComponent Template
internetarchive/iaux-collection-browser
internetarchive/esbuild_es5
minify JS/TS files using `esbuild` and `swc` down to ES5 (uses `deno`)
internetarchive/openlibrary-api
API documentation for https://github.com/internetarchive/openlibrary
internetarchive/iare
An interactive IARI JSON viewer
internetarchive/iaux-search-service
internetarchive/dyno
internetarchive/iaux-modal-manager
A Modal Manager WebComponent
internetarchive/iaux-histogram-date-range
Internet Archive histogram-date-range picker
internetarchive/iaux-monthly-giving-circle
internetarchive/iaux-notification-toast
displays notifications and automatically clears them
internetarchive/iaux-recaptcha-manager
internetarchive/tocky
[WIP] Extract structured table of contents data from digitized books
internetarchive/wbm_seed_stream
Google Summer of Code (GSoC) 2025 Wayback Machine Seed URL Classification and Prioritization project
internetarchive/bergamot-translator
Cross-platform C++ library focusing on optimized machine translation on consumer-grade devices.
internetarchive/iaux-feature-feedback