scraping

There are 6215 repositories under the scraping topic.

django-dynamic-scraper
Creating Scrapy scrapers via the Django admin interface
Language: Python | Stars: 1.2k
parsel
Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors
Language: Python | Stars: 1.1k
DataEngineeringProject
Example end-to-end data engineering project.
Language: Python | Stars: 1.1k
querido-diario
📰 Brazilian government gazettes, accessible to everyone.
Language: Python | Stars: 1.1k
Smartproxy
HTTP(S)/SOCKS5 rotating residential proxies - code examples & general information.
Language: Java | Stars: 1.1k
artoo
artoo.js - the client-side scraping companion.
Language: JavaScript | Stars: 1.1k
Scweet
A simple and unlimited Twitter scraper: scrape tweets, likes, retweets, following, followers, user info, images...
Language: Python | Stars: 1.1k
oj
Tools for various online judges. Downloading sample cases, generating additional test cases, testing your code, and submitting it.
Language: Python | Stars: 1k
fingerprint-suite
Browser fingerprinting tools for anonymizing your scrapers. Developed by Apify.
Language: TypeScript | Stars: 992
crawly
Crawly, a high-level web crawling & scraping framework for Elixir.
Language: Elixir | Stars: 988
iiab
Internet-in-a-Box - Build your own LIBRARY OF ALEXANDRIA with a Raspberry Pi!
Language: Jinja | Stars: 968
clean-text
🧹 Python package for text cleaning
Language: Python | Stars: 957
instagram-scraper
Scrape the Instagram frontend. Inspired by twitter-scraper by @kennethreitz.
Language: Python | Stars: 937
parsera
Lightweight library for scraping websites with LLMs
Language: Python | Stars: 895
loconotion
📄 Python tool to turn Notion.so pages into lightweight, customizable static websites
Language: Python | Stars: 839
Lulu
[Unmaintained] A simple and clean video/music/image downloader 👾
Language: Python | Stars: 816
till
DataHen Till is a companion tool to your existing web scraper that instantly makes it scalable, maintainable, and more unblockable, with minimal code changes on your scraper. Integrates with any scraper in 5 minutes.
Language: Go | Stars: 813
Edu-Mail-Generator
Generate Free Edu Mail(s) within minutes
Language: Python | Stars: 797
kuwala
Kuwala is the no-code data platform for BI analysts and engineers, enabling you to build powerful analytics workflows. We set out to bring the state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations, together in one intuitive interface built with React Flow. In addition, we provide third-party data for data science models and products, with a focus on geospatial data. Currently, the following data connectors are available worldwide: a) high-resolution demographics data, b) points of interest from OpenStreetMap, c) Google Popular Times.
Language: JavaScript | Stars: 788
easy-scraping-tutorial
Simple but useful Python web scraping tutorial code.
Language: Jupyter Notebook | Stars: 787
linkedin
LinkedIn scraper using Selenium WebDriver, headless Chromium, Docker and Scrapy
Language: Python | Stars: 783
PulsarRPA
Automate webpages at scale, scrape web data completely and accurately with high performance, distributed AI-RPA.
Language: Kotlin | Stars: 773
ImageScraper
✂️ High-performance, multi-threaded image scraper
Language: Python | Stars: 763
websurfx
🚀 An open-source alternative to searx which provides a modern-looking ✨, lightning-fast ⚡, privacy-respecting 🥸, secure 🔒 meta search engine
Language: Rust | Stars: 758
gazpacho
🥫 The simple, fast, and modern web scraping library
Language: Python | Stars: 746
mov-cli
Watch everything from your terminal.
Language: Python | Stars: 735
hrequests
🚀 Web scraping for humans
Language: Python | Stars: 699
OF-Scraper
A completely revamped and redesigned fork, reimagined from scratch based on the original onlyfans-scraper
Language: Python | Stars: 692
camoufox
🦊 Anti-detect browser
Language: C++ | Stars: 687
lookyloo
Lookyloo is a web interface that allows users to capture a website page and then display a tree of domains that call each other.
Language: Python | Stars: 684
secret-agent
The web scraper that's nearly impossible to block - now called @ulixee/hero
Language: TypeScript | Stars: 673
pdf.tocgen
A CLI toolset to automatically generate a table of contents for PDF files.
Language: Python | Stars: 664
dataflowkit
Extract structured data from websites. Website scraping.
Language: Go | Stars: 661
dark-knowledge
😈📚 A curated library of research papers and presentations for counter-detection and web privacy enthusiasts.
Language: JavaScript | Stars: 627
social-media-profiles-regexs
📇 Extract social media profiles and more with regular expressions
Language: Python | Stars: 608
google-search-results-python
Google Search Results via SERP API pip Python Package
Language: Python | Stars: 601