ruebot's Stars
jdx/mise
dev tools, env vars, task runner
end-of-term/eot-parquet-workshop
Parquet workshop using the End of Term Web Archive
unt-libraries/etd-to-urls
A project for extracting URLs from Electronic Theses and Dissertations in PDF format in order to generate web archives of the referenced URLs.
harvard-lil/warc-gpt
WARC + AI - Experimental Retrieval Augmented Generation Pipeline for Web Archive Collections.
LDNOOBW/List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words
List of Dirty, Naughty, Obscene, and Otherwise Bad Words
crissyfield/troll-a
Drill into WARC web archives
Mafoelffen1/OpenZFS-Ubuntu-Admin
ZFS Systems Admin & Other ZFS Support Tutorials
commoncrawl/cc-crawl-statistics
Statistics of Common Crawl monthly archives mined from URL index files
edsu/memento-cli
A command line utility for listing and searching snapshots in web archives
commoncrawl/cc-index-table
Index Common Crawl archives in tabular format
dense-analysis/ale
Check syntax in Vim/Neovim asynchronously and fix files, with Language Server Protocol (LSP) support
unt-libraries/untl-digital-collections-handbook
UNT Libraries Digital Collections Handbook
Fudge/infowars
Transcripts of the Alex Jones Show
fhamborg/news-please
news-please - an integrated web crawler and information extractor for news that just works
vim-pandoc/vim-pandoc-syntax
pandoc markdown syntax, to be installed alongside vim-pandoc
pedrozath/coltrane
🎹🎸 A music theory library with a command-line interface
nlnwa/warchaeology
Command line tool for digging into WARC files
nativesintech/indigemoji
Indigenous emoji for your (Slack | Discord | Discourse)
nativesintech/endasfmascotry
A call to end Apache® Software Foundation's appropriation of Apache culture
markus-perl/ffmpeg-build-script
The FFmpeg build script provides an easy way to build a static FFmpeg on OSX and Linux with non-free codecs included.
wragge/gpt3-experiments
catalyst/cca_taxonomy_manager
Experimental views-based taxonomy manager including search, merge, and move terms
pola-rs/polars
Dataframes powered by a multithreaded, vectorized query engine, written in Rust
ross-spencer/sumfolder1
What is the checksum of a directory?
RedSiege/EyeWitness
EyeWitness is designed to take screenshots of websites, provide some server header info, and identify default credentials if possible.
kristianperkins/x_x
View Excel and CSV files from the command-line.
ArchiveTeam/grab-site
The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
nektos/act
Run your GitHub Actions locally 🚀
JustAnotherArchivist/snscrape
A social networking service scraper in Python
digitalutsc/arks-service-playbook
An Ansible playbook for the Ark Services.