scrape

There are 610 repositories under the scrape topic.

  • twintproject/twint

    An advanced Twitter scraping & OSINT tool written in Python that doesn't use Twitter's API, allowing you to scrape a user's followers, following, Tweets and more while evading most API limitations.

    Language:Python16.3k3291.2k2.8k
  • alirezamika/autoscraper

    A Smart, Automatic, Fast and Lightweight Web Scraper for Python

    Language:Python7k12167713
  • d60/twikit

    Twitter API Scraper | Without an API key | Twitter Internal API | Free | Twitter scraper | Twitter Bot

    Language:Python3.7k38294439
  • Anorov/cloudflare-scrape

    A Python module to bypass Cloudflare's anti-bot page.

    Language:Python3.5k125397452
  • microlinkhq/metascraper

    Get unified metadata from websites using Open Graph, Microdata, RDFa, Twitter Cards, JSON-LD, HTML, and more.

    Language:HTML2.6k15253182
  • any4ai/AnyCrawl

    AnyCrawl 🚀: A Node.js/TypeScript crawler that turns websites into LLM-ready data and extracts structured SERP results from Google/Bing/Baidu/etc. Native multi-threading for bulk processing.

    Language:TypeScript2.4k1525237
  • trevorhobenshield/twitter-api-client

    Implementation of X/Twitter v1, v2, and GraphQL APIs

    Language:Python1.9k28239249
  • Altimis/Scweet

    A simple and unlimited Twitter scraper: scrape tweets, likes, retweets, following, followers, user info, images...

    Language:Python1.2k17151245
  • glebarez/cero

    Scrape domain names from SSL certificates of arbitrary hosts

    Language:Go6868991
  • markowanga/stweet

    Advanced Python library to scrape Twitter (tweets, users) from the unofficial API

    Language:Python614135669
  • austinoboyle/scrape-linkedin-selenium

    `scrape_linkedin` is a Python package that allows you to scrape personal LinkedIn profiles and company pages, turning the data into structured JSON.

    Language:HTML5082590167
  • drudge/n8n-nodes-puppeteer

    n8n node for browser automation using Puppeteer

    Language:TypeScript43544971
  • unixfox/pupflare

    A webpage proxy that sends requests through Chromium (Puppeteer). It can be used to bypass Cloudflare anti-bot / anti-DDoS protection from any application (like curl).

    Language:JavaScript420143282
  • ScriptSmith/instamancer

    Scrape Instagram's API with Puppeteer

    Language:TypeScript408203761
  • danieldotnl/ha-multiscrape

    Home Assistant custom component for scraping (html, xml or json) multiple values (from a single HTTP request) with a separate sensor/attribute for each value. Support for (login) form-submit functionality.

    Language:Python378717718
  • yaroslaff/nudecrawler

    Crawl telegra.ph searching for nudes!

    Language:Python3418726
  • Anonyfox/elixir-scrape

    Scrape any website, article or RSS/Atom Feed with ease!

    Language:Elixir332152041
  • ultralytics/google-images-download

    Google/Bing Images Web Downloader

    Language:Python31441392
  • andrewstuart/goq

    A declarative struct-tag-based HTML unmarshaling or scraping package for Go built on top of the goquery library

    Language:Go26891221
  • JMousqueton/ransomware.live

    🏴‍☠️💰 Another Ransomware gang tracker

    Language:Python2651015859
  • JaredLGillespie/proxyscrape

    Python library for retrieving free proxies (HTTP, HTTPS, SOCKS4, SOCKS5).

    Language:Python261171655
  • evyatarmeged/Humanoid

    Node.js package to bypass CloudFlare's anti-bot JavaScript challenges

    Language:JavaScript23431127
  • essamamdani/search-result-scraper-markdown

    This project provides a powerful web scraping tool that fetches search results and converts them into Markdown format using FastAPI, SearXNG, and Browserless. It includes the capability to use proxies for web scraping and handles HTML content conversion to Markdown efficiently.

    Language:Python2261113
  • rocketlaunchr/google-search

    Scrape Google search results

    Language:Go18031233
  • oxylabs/scrape-google-python

    In this tutorial, we showcase how to scrape public Google data with Python and Oxylabs API.

  • tegridydev/auto-md

    Convert Files / Folders / GitHub Repos Into AI / LLM-ready Files

    Language:Python16210125
  • Jimut123/jimutmap

    API to get an enormous number of high-resolution satellite images from satellites.pro quickly through multi-threading! Create your own map dataset. Bringing data to Humans.

    Language:Python15551218
  • meetyan/raise

    A simple (and unofficial) GitHub Trending client that lives in your menubar.

    Language:JavaScript148201
  • html2rss/html2rss

    📰 Build RSS 2.0 feeds from websites (and JSON APIs) automatically or with a few CSS selectors.

    Language:Ruby13435410
  • luengwaiban/instagram-python-scraper

    An Instagram scraper written in Python. Similar to instagram-php-scraper. Usage examples are in example.py. Enjoy it!

    Language:Python1317512
  • DrKain/scrape-youtube

    A lightning fast package to scrape YouTube search results

    Language:JavaScript12175031
  • fefit/visdom

    A library that uses a jQuery-like API for HTML parsing, node selection, and node mutation, suitable for web scraping and HTML confusion.

    Language:Rust1132237
  • badoux/goscraper

    Golang pkg to quickly return a preview of a webpage (title/description/images)

    Language:Go1101442
  • jgravelle/groqcrawl

    GroqCrawl is a powerful and user-friendly web crawling and scraping application built with Streamlit and powered by PocketGroq. It provides an intuitive interface for extracting LLM-friendly, AI-consumable content from websites, with support for single-page scraping, multi-page crawling, and site mapping.

    Language:Python992130
  • ndgigliotti/shopify-spy

    Extract structured data from Shopify websites.

    Language:Python994350
  • SilentDemonSD/FZBypassBot

    An Elegant, Fast, Multi-Threaded Bypass Bot for Bigger Deeds. Try Now!!

    Language:Python96210166