scraping
There are 5742 repositories under the scraping topic.
Scweet
A simple and unlimited Twitter scraper: scrape tweets, likes, retweets, following, followers, user info, images, and more.
oj
Tools for various online judges: downloading sample cases, generating additional test cases, testing your code, and submitting it.
instagram-scraper
Scrape the Instagram frontend. Inspired by twitter-scraper by @kennethreitz.
clean-text
🧹 Python package for text cleaning
iiab
Internet-in-a-Box: build your own Library of Alexandria with a Raspberry Pi!
crawly
Crawly, a high-level web crawling & scraping framework for Elixir.
loconotion
Python tool to turn Notion.so pages into lightweight, customizable static websites
Lulu
[Unmaintained] A simple and clean video/music/image downloader
till
DataHen Till is a companion tool to your existing web scraper that instantly makes it scalable, maintainable, and more unblockable, with minimal code changes on your scraper. Integrates with any scraper in 5 minutes.
Edu-Mail-Generator
Generate Free Edu Mail(s) within minutes
twikit
Twitter API scraper using Twitter's internal API: free, no API key required, and usable for Twitter bots.
kuwala
Kuwala is the no-code data platform for BI analysts and engineers that enables you to build powerful analytics workflows. We set out to bring the state-of-the-art data engineering tools you love, such as Airbyte, dbt, and Great Expectations, together in one intuitive interface built with React Flow. In addition, we provide third-party data for data science models and products, with a focus on geospatial data. Currently, the following data connectors are available worldwide: a) high-resolution demographics data, b) points of interest from OpenStreetMap, and c) Google Popular Times.
easy-scraping-tutorial
Simple but useful Python web scraping tutorial code.
ImageScraper
✂️ High-performance, multi-threaded image scraper
fingerprint-suite
Browser fingerprinting tools for anonymizing your scrapers. Developed by Apify.
gazpacho
🥫 The simple, fast, and modern web scraping library
spider
The fastest web crawler written in Rust. Maintained by @a11ywatch.
linkedin
LinkedIn scraper using Selenium WebDriver, headless Chromium, Docker, and Scrapy
lookyloo
Lookyloo is a web interface that allows users to capture a website page and then display a tree of domains that call each other.
PulsarRPA
Automate web pages at scale and scrape web data completely and accurately with high-performance, distributed RPA.
dataflowkit
Extract structured data from websites. Website scraping.
secret-agent
The web scraper that's nearly impossible to block - now called @ulixee/hero
websurfx
🚀 An open-source alternative to searx that provides a modern-looking ✨, lightning-fast ⚡, privacy-respecting 🥸, secure 🔒 meta search engine
social-media-profiles-regexs
📇 Extract social media profiles and more with regular expressions
newcrawler
Free web scraping tool written in Java
pdf.tocgen
A CLI toolset to generate table of contents for PDF files automatically.
facebook_data_analyzer
Analyze your Facebook data export with Ruby. Download the ZIP file from Facebook and get statistics such as friend rankings by messages, vocabulary, contacts, friends added, and more.
hrequests
Web scraping for humans
dark-knowledge
A curated library of research papers and presentations for counter-detection and web privacy enthusiasts.
google-search-results-python
Google Search results via SERP API, as a pip-installable Python package
socialreaper
Social media scraping / data collection library for Facebook, Twitter, Reddit, YouTube, Pinterest, and Tumblr APIs
comic-dl
Comic-dl is a command-line tool to download manga and comics from various comic and manga sites. Supported sites: readcomiconline.to, mangafox.me, comic naver, and many more.
spidermon
Scrapy extension for monitoring spider execution.
jekyll
Jekyll-based static site for The Programming Historian
Smartproxy
HTTP(S)/SOCKS5 rotating residential proxies - code examples & general information.
nickjs
Web scraping library made by the Phantombuster team. Modern, simple & works on all websites. (Deprecated)