scraping

There are 5,742 repositories under the scraping topic.

Scweet
A simple and unlimited Twitter scraper: scrape tweets, likes, retweets, following, followers, user info, images...
Language: Python · 979 stars
oj
Tools for various online judges. Downloading sample cases, generating additional test cases, testing your code, and submitting it.
Language: Python · 973 stars
instagram-scraper
Scrape the Instagram frontend. Inspired by twitter-scraper by @kennethreitz.
Language: Python · 930 stars
clean-text
🧹 Python package for text cleaning
Language: Python · 929 stars
iiab
Internet-in-a-Box - Build your own LIBRARY OF ALEXANDRIA with a Raspberry Pi!
Language: Jinja · 886 stars
crawly
Crawly, a high-level web crawling & scraping framework for Elixir.
Language: Elixir · 852 stars
loconotion
📄 Python tool to turn Notion.so pages into lightweight, customizable static websites
Language: Python · 818 stars
Lulu
[Unmaintained] A simple and clean video/music/image downloader 👾
Language: Python · 817 stars
till
DataHen Till is a companion tool to your existing web scraper that instantly makes it scalable, maintainable, and more unblockable, with minimal code changes on your scraper. Integrates with any scraper in 5 minutes.
Language: Go · 809 stars
Edu-Mail-Generator
Generate Free Edu Mail(s) within minutes
Language: Python · 795 stars
twikit
Twitter API Scraper | Without an API key | Twitter Internal API | Free | Twitter scraper | Twitter Bot
Language: Python · 788 stars
kuwala
Kuwala is the no-code data platform for BI analysts and engineers, enabling you to build powerful analytics workflows. We set out to bring state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations, together in one intuitive interface built with React Flow. In addition, we provide third-party data for data science models and products, with a focus on geospatial data. Currently, the following data connectors are available worldwide: a) high-resolution demographics data, b) Points of Interest from OpenStreetMap, and c) Google Popular Times.
Language: JavaScript · 771 stars
easy-scraping-tutorial
Simple but useful Python web scraping tutorial code.
Language: Jupyter Notebook · 765 stars
ImageScraper
✂️ High-performance, multi-threaded image scraper
Language: Python · 750 stars
fingerprint-suite
Browser fingerprinting tools for anonymizing your scrapers. Developed by Apify.
Language: TypeScript · 749 stars
gazpacho
🥫 The simple, fast, and modern web scraping library
Language: Python · 736 stars
spider
The fastest web crawler written in Rust. Maintained by @a11ywatch.
Language: Rust · 694 stars
linkedin
LinkedIn scraper using Selenium WebDriver, headless Chromium, Docker, and Scrapy
Language: Python · 679 stars
lookyloo
Lookyloo is a web interface that allows users to capture a website page and then display a tree of domains that call each other.
Language: Python · 660 stars
PulsarRPA
Automate webpages at scale, scrape web data completely and accurately with high performance, distributed RPA.
Language: Kotlin · 657 stars
dataflowkit
Extract structured data from websites. Website scraping.
Language: Go · 646 stars
secret-agent
The web scraper that's nearly impossible to block - now called @ulixee/hero
Language: TypeScript · 645 stars
websurfx
🚀 An open-source alternative to searx that provides a modern-looking ✨, lightning-fast ⚡, privacy-respecting 🥸, secure 🔒 metasearch engine
Language: Rust · 636 stars
social-media-profiles-regexs
📇 Extract social media profiles and more with regular expressions
Language: Python · 594 stars
newcrawler
Free Web Scraping Tool with Java
Language: JavaScript · 584 stars
pdf.tocgen
A CLI toolset to automatically generate tables of contents for PDF files.
Language: Python · 569 stars
facebook_data_analyzer
Analyze your Facebook data export with Ruby. Download the ZIP file from Facebook and get info such as a ranking of friends by messages, vocabulary, contacts, statistics on friends added, and more.
Language: Ruby · 543 stars
hrequests
🚀 Web scraping for humans
Language: Python · 542 stars
dark-knowledge
😈📚 A curated library of research papers and presentations for counter-detection and web privacy enthusiasts.
Language: JavaScript · 534 stars
google-search-results-python
Google Search results via SerpApi, as a pip-installable Python package
Language: Python · 532 stars
socialreaper
Social media scraping / data collection library for Facebook, Twitter, Reddit, YouTube, Pinterest, and Tumblr APIs
Language: Python · 531 stars
comic-dl
Comic-dl is a command-line tool to download manga and comics from various comic and manga sites. Supported sites: readcomiconline.to, mangafox.me, comic naver, and many more.
Language: Python · 530 stars
spidermon
Scrapy extension for monitoring spider execution.
Language: Python · 514 stars
jekyll
Jekyll-based static site for The Programming Historian
Language: HTML · 509 stars
Smartproxy
HTTP(S)/SOCKS5 rotating residential proxies - code examples & general information.
Language: C# · 505 stars
nickjs
Web scraping library made by the Phantombuster team. Modern, simple & works on all websites. (Deprecated)
Language: JavaScript · 499 stars
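The social-media-profiles-regexs entry above describes extracting social media profiles with regular expressions. As a rough illustration of that technique only, the sketch below uses two simplified, hypothetical patterns to pull Twitter/X and GitHub usernames out of free text. These are not the repository's own expressions. The `PROFILE_PATTERNS` table and the `extract_profiles` helper are assumptions made for this example.

```python
import re

# Hypothetical, simplified patterns for illustration only; these are NOT the
# expressions shipped by the social-media-profiles-regexs repository.
# Each pattern captures the username and uses a negative lookahead so that
# names longer than the platform's length limit are rejected, not truncated.
PROFILE_PATTERNS = {
    "twitter": re.compile(
        r"https?://(?:www\.)?(?:twitter|x)\.com/([A-Za-z0-9_]{1,15})(?![A-Za-z0-9_])"
    ),
    "github": re.compile(
        r"https?://(?:www\.)?github\.com/([A-Za-z0-9-]{1,39})(?![A-Za-z0-9-])"
    ),
}


def extract_profiles(text: str) -> dict:
    """Return the usernames found for each platform, in order of appearance."""
    # findall returns the captured group (the username) for each match.
    return {name: pattern.findall(text) for name, pattern in PROFILE_PATTERNS.items()}


if __name__ == "__main__":
    sample = "Follow https://twitter.com/jack and https://github.com/torvalds/linux."
    print(extract_profiles(sample))
```

A table of compiled patterns keyed by platform keeps the extraction loop generic. Supporting another site means adding one entry rather than new code. Real-world profile URLs have many more variants (mobile subdomains, query strings, reserved paths such as `/login`) than this sketch handles.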