scraping
There are 6215 repositories under the scraping topic.
django-dynamic-scraper
Creating Scrapy scrapers via the Django admin interface
parsel
Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors
DataEngineeringProject
Example end to end data engineering project.
querido-diario
📰 Brazilian government gazettes, accessible to everyone.
Smartproxy
HTTP(S)/SOCKS5 rotating residential proxies - code examples & general information.
artoo
artoo.js - the client-side scraping companion.
Scweet
A simple and unlimited Twitter scraper: scrape tweets, likes, retweets, following, followers, user info, images...
oj
Tools for various online judges. Downloading sample cases, generating additional test cases, testing your code, and submitting it.
fingerprint-suite
Browser fingerprinting tools for anonymizing your scrapers. Developed by Apify.
crawly
Crawly, a high-level web crawling & scraping framework for Elixir.
iiab
Internet-in-a-Box - Build your own LIBRARY OF ALEXANDRIA with a Raspberry Pi!
clean-text
🧹 Python package for text cleaning
instagram-scraper
Scrape the Instagram frontend. Inspired by twitter-scraper by @kennethreitz.
parsera
Lightweight library for scraping websites with LLMs
loconotion
📄 Python tool to turn Notion.so pages into lightweight, customizable static websites
Lulu
[Unmaintained] A simple and clean video/music/image downloader 👾
till
DataHen Till is a companion tool to your existing web scraper that instantly makes it scalable, maintainable, and more unblockable, with minimal code changes on your scraper. Integrates with any scraper in 5 minutes.
Edu-Mail-Generator
Generate Free Edu Mail(s) within minutes
kuwala
Kuwala is the no-code data platform for BI analysts and engineers, enabling you to build powerful analytics workflows. We set out to bring the state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations, together in one intuitive interface built with React Flow. In addition, we provide third-party data for data science models and products, with a focus on geospatial data. Currently, the following data connectors are available worldwide: a) high-resolution demographics data, b) points of interest from OpenStreetMap, c) Google Popular Times.
easy-scraping-tutorial
Simple but useful Python web scraping tutorial code.
linkedin
LinkedIn scraper using Selenium WebDriver, headless Chromium, Docker and Scrapy
PulsarRPA
Automate webpages at scale and scrape web data completely and accurately with high-performance, distributed AI-RPA.
ImageScraper
✂️ High-performance, multi-threaded image scraper
websurfx
🚀 An open source alternative to searx which provides a modern-looking ✨, lightning-fast ⚡, privacy-respecting 🥸, secure 🔒 meta search engine
gazpacho
🥫 The simple, fast, and modern web scraping library
mov-cli
Watch everything from your terminal.
hrequests
🚀 Web scraping for humans
OF-Scraper
A completely revamped and redesigned fork, reimagined from scratch based on the original onlyfans-scraper
camoufox
🦊 Anti-detect browser
lookyloo
Lookyloo is a web interface that allows users to capture a website page and then display a tree of domains that call each other.
secret-agent
The web scraper that's nearly impossible to block - now called @ulixee/hero
pdf.tocgen
A CLI toolset to generate table of contents for PDF files automatically.
dataflowkit
Extract structured data from websites. Website scraping.
dark-knowledge
😈📚 A curated library of research papers and presentations for counter-detection and web privacy enthusiasts.
social-media-profiles-regexs
📇 Extract social media profiles and more with regular expressions
google-search-results-python
Python package (installable via pip) for Google Search results via SERP API