scraping

There are 6,215 repositories under the scraping topic.

  • django-dynamic-scraper

    Creating Scrapy scrapers via the Django admin interface

    Language: Python · 1.2k stars
  • parsel

    Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors

    Language: Python · 1.1k stars
  • DataEngineeringProject

    Example end-to-end data engineering project.

    Language: Python · 1.1k stars
  • querido-diario

    📰 Brazilian government gazettes, accessible to everyone.

    Language: Python · 1.1k stars
  • Smartproxy

    HTTP(S)/SOCKS5 rotating residential proxies - code examples & general information.

    Language: Java · 1.1k stars
  • artoo

    artoo.js - the client-side scraping companion.

    Language: JavaScript · 1.1k stars
  • Scweet

    A simple and unlimited Twitter scraper: scrape tweets, likes, retweets, following, followers, user info, images...

    Language: Python · 1.1k stars
  • oj

    Tools for various online judges. Downloading sample cases, generating additional test cases, testing your code, and submitting it.

    Language: Python · 1k stars
  • fingerprint-suite

    Browser fingerprinting tools for anonymizing your scrapers. Developed by Apify.

    Language: TypeScript · 992 stars
  • crawly

    Crawly, a high-level web crawling & scraping framework for Elixir.

    Language: Elixir · 988 stars
  • iiab

    Internet-in-a-Box - Build your own LIBRARY OF ALEXANDRIA with a Raspberry Pi!

    Language: Jinja · 968 stars
  • clean-text

    🧹 Python package for text cleaning

    Language: Python · 957 stars
  • instagram-scraper

    Scrape the Instagram frontend. Inspired by twitter-scraper by @kennethreitz.

    Language: Python · 937 stars
  • parsera

    Lightweight library for scraping websites with LLMs

    Language: Python · 895 stars
  • loconotion

    📄 Python tool to turn Notion.so pages into lightweight, customizable static websites

    Language: Python · 839 stars
  • Lulu

    [Unmaintained] A simple and clean video/music/image downloader 👾

    Language: Python · 816 stars
  • till

    DataHen Till is a companion tool to your existing web scraper that instantly makes it scalable, maintainable, and more unblockable, with minimal code changes on your scraper. Integrates with any scraper in 5 minutes.

    Language: Go · 813 stars
  • Edu-Mail-Generator

    Generate Free Edu Mail(s) within minutes

    Language: Python · 797 stars
  • kuwala

    Kuwala is a no-code data platform for BI analysts and engineers that lets you build powerful analytics workflows. It sets out to bring state-of-the-art data engineering tools such as Airbyte, dbt, and Great Expectations together in one intuitive interface built with React Flow. It also feeds third-party data into data science models and products, with a focus on geospatial data. Currently, the following data connectors are available worldwide: a) high-resolution demographics data, b) points of interest from OpenStreetMap, c) Google Popular Times.

    Language: JavaScript · 788 stars
  • easy-scraping-tutorial

    Simple but useful Python web scraping tutorial code.

    Language: Jupyter Notebook · 787 stars
  • linkedin

    LinkedIn scraper using Selenium WebDriver, headless Chromium, Docker, and Scrapy

    Language: Python · 783 stars
  • PulsarRPA

    Automate webpages at scale, scrape web data completely and accurately with high performance, distributed AI-RPA.

    Language: Kotlin · 773 stars
  • ImageScraper

    ✂️ High-performance, multi-threaded image scraper

    Language: Python · 763 stars
  • websurfx

    🚀 An open-source alternative to searx that provides a modern-looking ✨, lightning-fast ⚡, privacy-respecting 🥸, secure 🔒 meta search engine

    Language: Rust · 758 stars
  • gazpacho

    🥫 The simple, fast, and modern web scraping library

    Language: Python · 746 stars
  • mov-cli

    Watch everything from your terminal.

    Language: Python · 735 stars
  • hrequests

    🚀 Web scraping for humans

    Language: Python · 699 stars
  • OF-Scraper

    A completely revamped and redesigned fork, reimagined from scratch based on the original onlyfans-scraper

    Language: Python · 692 stars
  • camoufox

    🦊 Anti-detect browser

    Language: C++ · 687 stars
  • lookyloo

    Lookyloo is a web interface that allows users to capture a website page and then display a tree of domains that call each other.

    Language: Python · 684 stars
  • secret-agent

    The web scraper that's nearly impossible to block - now called @ulixee/hero

    Language: TypeScript · 673 stars
  • pdf.tocgen

    A CLI toolset to generate table of contents for PDF files automatically.

    Language: Python · 664 stars
  • dataflowkit

    Extract structured data from websites. Website scraping.

    Language: Go · 661 stars
  • dark-knowledge

    😈📚 A curated library of research papers and presentations for counter-detection and web privacy enthusiasts.

    Language: JavaScript · 627 stars
  • social-media-profiles-regexs

    📇 Extract social media profiles and more with regular expressions

    Language: Python · 608 stars
  • google-search-results-python

    Google Search Results via SERP API pip Python Package

    Language: Python · 601 stars