This repository collects assorted scrapers I have developed over the years. These scripts are published with the intention of data preservation.
All scrapers in this repository, each with a brief description
- artic - Grabs art along with metadata from artic.edu
- bensound - Grabs royalty-free music from bensound.com
- betterttv - Grabs emotes from betterttv.tv
- calibre-server - Grabs ebooks from a calibre-server
- dafont - Grabs fonts uploaded to dafont.com
- iconmonstr - Grabs icons in all formats from iconmonstr.com
- ijsrd - Grabs papers published to ijsrd.com
- ijssb - Grabs papers published to ijssb.com
- impawards - Grabs all movie posters from impawards.com
- linfoxdomain - Grabs all flash games from linfoxdomain.com
- memoryoftheworld - Grabs all the books from memoryoftheworld.org
- open3dlab - Grabs all assets and metadata from Open3DLab and Smutbase (NSFW)
- oppetarkiv - Grabs programmes from oppetarkiv.se
- riksdagen - Grabs debates and documents from riksdagen.se
- shzm - Grabs the top 50 songs by city from shazam.com
- thecoverproject - Grabs all the game posters from thecoverproject.net
- theoatmeal - Grabs comics from theoatmeal.com
- unicode-emoji-chart - Grabs emoji and metadata from unicode.org
- vulnhub - Downloads items and all metadata on each item from vulnhub.com
- wad-archive - Grabs WADs with metadata from wad-archive.com
- wallhaven - Grabs all the images for a given search query from wallhaven.cc
- wallpaperflare - Grabs all or some wallpapers with search from wallpaperflare.com
- waset - Grabs papers published to waset.org
- xkcd - Grabs comics posted to xkcd.com
- zenpencil - Grabs all comics from zenpencils.com
Scrapers in other repositories
- acloud-dl by r0oth3x49 - A cross-platform Python-based utility to download courses from acloud.guru for personal offline use
- allitebooks-downloader by thomasbrueggemann - 📚 Scrapes and downloads all IT eBooks from http://allitebooks.com
- ArchiveBot by ArchiveTeam - ArchiveBot, an IRC bot for archiving websites
- archivebot-archives by nonPointer - This repository contains a list of files in the ArchiveBot Collection on the Internet Archive and the corresponding codes.
- archivers by nektro - A collection of scripts to mass download data from various sites
- ArchiveTools by recrm - A collection of tools for archiving and analysing the internet
- Automated-ISO-ripping by pascaldulieu - Short bash script to automatically rip ISOs
- awesome-dl by Kickball - This is a list of repositories and libraries that allow for scripted downloading of online content
- BBCSoundDownloader by FThompson - Bulk downloader for http://bbcsfx.acropolis.org.uk/.
- Bios-Archival-Standards by BiosPlus - A collection of gists and notes to help me [BiosPlus] standardize my archival efforts across the board
- comics-downloader by Girbons - Command-line tool to download comics and manga in pdf/epub/cbr/cbz from a website
- Discord-Channel-scraper by simon987 - Scrapes an arbitrary number of lines from a Discord channel
- download_scholar_pdfs by bozelosp - Batch .PDF downloading from a list of DOIs and/or titles. PDF files are retrieved/downloaded from libgen scholar archives.
- DownCloud by seru1us - Download SoundCloud tracks posted on targeted subreddits
- e621.net-file-downloader by OllieTails - Scrapes images from e621.net (NSFW)
- e621Crawler by fionera - Scrapes images from e621.net (NSFW)
- flickr-search-scraper by AlexOwen - Script to capture a JSON file of all search results along with the best version of each image in the search
- github-scraper - 🕷️ 🕸️ crawl GitHub web pages for insights we can't GET from the API... 💡
- GOGGames_Crawler by fionera - Hacky Crawler for GoodOldDownloads
- goodolddownloads-deobf by nabijaczleweli - Deobfuscate https://goggames.goodolddownloads.com/ links
- grab-site by ArchiveTeam - The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
- humblebundle-downloader by diogogmt - Download books from Humble Bundles
- instagram-scraper by rarcega - Scrapes an Instagram user's photos and videos
- InstaHoarder by beaston02 - Automatically saves Instagram users' stories and posts
- lazynlp by chiphuyen - Library to scrape and clean web pages to create massive datasets
- Lynx-Program-Downloader by spideyclick - This is a bash shell script that downloads programs from websites using Lynx.
- Mega.nz-IDM-downloader by CHEF-KOCH - How to download from Mega.nz with IDM - Unlimited
- Misc-Download-Scripts by simon987 - Miscellaneous download scripts
- Music-Hoarders-Bot by JPBotelho - Discord bot written in Python for the music hoarders server
- PixivDownloader by nonPointer - Simple batch tool to download one's images from Pixiv
- RedditImageBackup by LameLemon - Grabs all images, gifs, videos and text posts from Reddit
- redditPostArchiver by pl77 - Easily archive important Reddit post threads onto your computer
- scrapereplacementdocs by Itxaka - Scrapy spider to download pdfs from replacementdocs.com
- Tidown by Transcodes - A simple, yet efficient, Tidal downloader.
- tpget by 0x6a73 - Tutorialspoint downloader
- twitter-scraper by kennethreitz - Scrape the Twitter Frontend API without authentication.
- uDownloader by usbpc - 1Fichier Folder Downloader written in Kotlin with download rate limit and stuff
- wayback-machine-downloader by hartator - Download an entire website from the Wayback Machine