scraping

There are 5742 repositories under the scraping topic.

  • Scweet

    A simple and unlimited Twitter scraper: scrape tweets, likes, retweets, following, followers, user info, images...

    Language: Python | Stars: 979
  • oj

    Tools for various online judges. Downloading sample cases, generating additional test cases, testing your code, and submitting it.

    Language: Python | Stars: 973
  • instagram-scraper

    Scrape the Instagram frontend. Inspired by twitter-scraper by @kennethreitz.

    Language: Python | Stars: 930
  • clean-text

    🧹 Python package for text cleaning

    Language: Python | Stars: 929
  • iiab

    Internet-in-a-Box - Build your own LIBRARY OF ALEXANDRIA with a Raspberry Pi!

    Language: Jinja | Stars: 886
  • crawly

    Crawly, a high-level web crawling & scraping framework for Elixir.

    Language: Elixir | Stars: 852
  • loconotion

    📄 Python tool to turn Notion.so pages into lightweight, customizable static websites

    Language: Python | Stars: 818
  • Lulu

    [Unmaintained] A simple and clean video/music/image downloader 👾

    Language: Python | Stars: 817
  • till

    DataHen Till is a companion tool to your existing web scraper that instantly makes it scalable, maintainable, and more unblockable, with minimal code changes on your scraper. Integrates with any scraper in 5 minutes.

    Language: Go | Stars: 809
  • Edu-Mail-Generator

    Generate Free Edu Mail(s) within minutes

    Language: Python | Stars: 795
  • twikit

    Twitter API Scraper | Without an API key | Twitter Internal API | Free | Twitter scraper | Twitter Bot

    Language: Python | Stars: 788
  • kuwala

    Kuwala is the no-code data platform for BI analysts and engineers, enabling you to build powerful analytics workflows. We set out to bring the state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations, together in one intuitive interface built with React Flow. In addition, we provide third-party data for data science models and products, with a focus on geospatial data. Currently, the following data connectors are available worldwide: a) high-resolution demographics data, b) points of interest from OpenStreetMap, c) Google Popular Times.

    Language: JavaScript | Stars: 771
  • easy-scraping-tutorial

    Simple but useful Python web scraping tutorial code.

    Language: Jupyter Notebook | Stars: 765
  • ImageScraper

    ✂️ High performance, multi-threaded image scraper

    Language: Python | Stars: 750
  • fingerprint-suite

    Browser fingerprinting tools for anonymizing your scrapers. Developed by Apify.

    Language: TypeScript | Stars: 749
  • gazpacho

    🥫 The simple, fast, and modern web scraping library

    Language: Python | Stars: 736
  • spider

    The fastest web crawler written in Rust. Maintained by @a11ywatch.

    Language: Rust | Stars: 694
  • linkedin

    LinkedIn scraper using Selenium WebDriver, headless Chromium, Docker, and Scrapy

    Language: Python | Stars: 679
  • lookyloo

    Lookyloo is a web interface that allows users to capture a website page and then display a tree of domains that call each other.

    Language: Python | Stars: 660
  • PulsarRPA

    Automate webpages at scale, scrape web data completely and accurately with high performance, distributed RPA.

    Language: Kotlin | Stars: 657
  • dataflowkit

    Extract structured data from websites. Website scraping.

    Language: Go | Stars: 646
  • secret-agent

    The web scraper that's nearly impossible to block - now called @ulixee/hero

    Language: TypeScript | Stars: 645
  • websurfx

    🚀 An open source alternative to searx which provides a modern-looking ✨, lightning-fast ⚡, privacy-respecting 🥸, secure 🔒 meta search engine

    Language: Rust | Stars: 636
  • social-media-profiles-regexs

    📇 Extract social media profiles and more with regular expressions

    Language: Python | Stars: 594
  • newcrawler

    Free Web Scraping Tool with Java

    Language: JavaScript | Stars: 584
  • pdf.tocgen

    A CLI toolset to generate table of contents for PDF files automatically.

    Language: Python | Stars: 569
  • facebook_data_analyzer

    Analyze your copy of your Facebook data with Ruby. Download the zip file from Facebook and get info about friends ranked by messages, vocabulary, contacts, friends-added statistics, and more.

    Language: Ruby | Stars: 543
  • hrequests

    🚀 Web scraping for humans

    Language: Python | Stars: 542
  • dark-knowledge

    😈📚 A curated library of research papers and presentations for counter-detection and web privacy enthusiasts.

    Language: JavaScript | Stars: 534
  • google-search-results-python

    Google Search Results via SERP API pip Python Package

    Language: Python | Stars: 532
  • socialreaper

    Social media scraping / data collection library for Facebook, Twitter, Reddit, YouTube, Pinterest, and Tumblr APIs

    Language: Python | Stars: 531
  • comic-dl

    Comic-dl is a command line tool to download manga and comics from various comic and manga sites. Supported sites: readcomiconline.to, mangafox.me, comic naver, and many more.

    Language: Python | Stars: 530
  • spidermon

    Scrapy extension for monitoring spider execution.

    Language: Python | Stars: 514
  • jekyll

    Jekyll-based static site for The Programming Historian

    Language: HTML | Stars: 509
  • Smartproxy

    HTTP(S)/SOCKS5 rotating residential proxies - code examples & general information.

    Language: C# | Stars: 505
  • nickjs

    Web scraping library made by the Phantombuster team. Modern, simple & works on all websites. (Deprecated)

    Language: JavaScript | Stars: 499