Scrapers

This is a repository for random scrapers I've developed over the years. These scripts have been published with the intention of data preservation.

Description

All scrapers in this repository with a brief description

artic - Grabs art along with metadata from artic.edu
bensound - Grabs royalty-free music from bensound.com
betterttv - Grabs emotes from betterttv.tv
calibre-server - Grabs ebooks from a calibre-server
dafont - Grabs fonts uploaded to dafont.com
iconmonstr - Grabs icons in all formats from iconmonstr.com
ijsrd - Grabs papers published to ijsrd.com
ijssb - Grabs papers published to ijssb.com
impawards - Grabs all movie posters from impawards.com
linfoxdomain - Grabs all flash games from linfoxdomain.com
memoryoftheworld - Grabs all the books from memoryoftheworld.org
open3dlab - Grabs all assets and metadata from Open3DLab and Smutbase (NSFW)
oppetarkiv - Grabs programmes from oppetarkiv.se
riksdagen - Grabs debates and documents from riksdagen.se
shzm - Grabs the top 50 songs by city from shazam.com
thecoverproject - Grabs all the game covers from thecoverproject.net
theoatmeal - Grabs comics from theoatmeal.com
unicode-emoji-chart - Grabs emoji and metadata from unicode.org
vulnhub - Downloads items and all metadata on each item from vulnhub.com
wad-archive - Grabs WADs with metadata from wad-archive.com
wallhaven - Grabs all the images for a given search query from wallhaven.cc
wallpaperflare - Grabs all wallpapers, or only those matching a search, from wallpaperflare.com
waset - Grabs papers published to waset.org
xkcd - Grabs comics posted to xkcd.com
zenpencil - Grabs all comics from zenpencils.com

Others

Scrapers in other repositories

acloud-dl by r0oth3x49 - A cross-platform Python-based utility to download courses from acloud.guru for personal offline use
allitebooks-downloader by thomasbrueggemann - 📚 Scrapes and downloads all IT eBooks from http://allitebooks.com
ArchiveBot by ArchiveTeam - ArchiveBot, an IRC bot for archiving websites
archivebot-archives by nonPointer - This repository contains a list of files in the ArchiveBot Collection on the Internet Archive and the corresponding codes.
archivers by nektro - A collection of scripts to mass download data from various sites
ArchiveTools by recrm - A collection of tools for archiving and analysing the internet
Automated-ISO-ripping by pascaldulieu - Short bash script to automatically rip ISOs
awesome-dl by Kickball - This is a list of repositories and libraries that allow for scripted downloading of online content
BBCSoundDownloader by FThompson - Bulk downloader for http://bbcsfx.acropolis.org.uk/.
Bios-Archival-Standards by BiosPlus - A collection of gists and notes to help me [BiosPlus] standardize my archival efforts across the board
comics-downloader by Girbons - Command-line tool to download comics and manga in pdf/epub/cbr/cbz from a website
Discord-Channel-scraper by simon987 - Scrapes an arbitrary number of lines from a Discord channel
download_scholar_pdfs by bozelosp - Batch PDF downloading from a list of DOIs and/or titles. PDF files are retrieved from libgen scholar archives.
DownCloud by seru1us - Download SoundCloud tracks posted on targeted subreddits
e621.net-file-downloader by OllieTails - Scrapes images from e621.net (NSFW)
e621Crawler by fionera - Scrapes images from e621.net (NSFW)
flickr-search-scraper by AlexOwen - Script to capture a JSON file of all search results along with the best version of each image in the search
github-scraper - 🕷️ 🕸️ crawl GitHub web pages for insights we can't GET from the API... 💡
GOGGames_Crawler by fionera - Hacky Crawler for GoodOldDownloads
goodolddownloads-deobf by nabijaczleweli - Deobfuscate https://goggames.goodolddownloads.com/ links
grab-site by ArchiveTeam - The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
humblebundle-downloader by diogogmt - Download books from Humble Bundles
instagram-scraper by rarcega - Scrapes an Instagram user's photos and videos
InstaHoarder by beaston02 - Automatically saves Instagram users' stories and posts
lazynlp by chiphuyen - Library to scrape and clean web pages to create massive datasets
Lynx-Program-Downloader by spideyclick - This is a bash shell script that downloads programs from websites using Lynx.
Mega.nz-IDM-downloader by CHEF-KOCH - How to download from Mega.nz with IDM - Unlimited
Misc-Download-Scripts by simon987 - Miscellaneous download scripts
Music-Hoarders-Bot by JPBotelho - Discord bot written in Python for the music hoarders server
PixivDownloader by nonPointer - Simple batch tool to download one's images from Pixiv
RedditImageBackup by LameLemon - Grabs all images, gifs, videos and text posts from Reddit
redditPostArchiver by pl77 - Easily archive important Reddit post threads onto your computer
scrapereplacementdocs by Itxaka - Scrapy spider to download pdfs from replacementdocs.com
Tidown by Transcodes - A simple, yet efficient, Tidal downloader.
tpget by 0x6a73 - Tutorialspoint downloader
twitter-scraper by kennethreitz - Scrape the Twitter Frontend API without authentication.
uDownloader by usbpc - 1Fichier Folder Downloader written in Kotlin with download rate limit and stuff
wayback-machine-downloader by hartator - Download an entire website from the Wayback Machine

PeskyPotato/scrapers
