webscraping

There are 9,841 repositories under the webscraping topic.

firecrawl/firecrawl
🔥 The Web Data API for AI - Turn entire websites into LLM-ready markdown or structured data
Language: TypeScript 67k 257 7535.2k
huginn/huginn
Create agents that monitor and act on your behalf. Your agents are standing by!
Language: Ruby 47.9k 739 2.2k 4.2k
assafelovic/gpt-researcher
An LLM agent that conducts deep research (local and web) on any given topic and generates a long report with citations.
Language: Python 24.1k 170 6173.2k
ScrapeGraphAI/Scrapegraph-ai
Python scraper based on AI
Language: Python 21.7k 135 4151.9k
getmaxun/maxun
⚡ Easiest no code web data extraction platform • Instantly turn any website into API or spreadsheet ⚡
Language: TypeScript 13.8k 78 2741.1k
pystardust/ani-cli
A cli tool to browse and play anime
Language: Shell 10.1k 59 833644
D4Vinci/Scrapling
🕷️ An undetectable, powerful, flexible, high-performance Python library to make Web Scraping Easy and Effortless as it should be!
Language: Python 8.1k 50 44463
lorien/awesome-web-scraping
List of libraries, tools and APIs for web scraping and data processing.
Language: Makefile 7.4k 232 10827
alirezamika/autoscraper
A Smart, Automatic, Fast and Lightweight Web Scraper for Python
Language: Python 7k 121 67713
niespodd/browser-fingerprinting
Analysis of Bot Protection systems with available countermeasures 🚿. How to defeat anti-bot system 👻 and get around browser fingerprinting scripts 🕵️‍♂️ when scraping the web?
Language: JavaScript 4.9k 69 11267
jaypyles/Scraperr
Self-hosted webscraper.
Language: TypeScript 4.7k 9 53239
daijro/camoufox
🦊 Anti-detect browser
Language: C++ 4k 54 281350
scrapoxy/scrapoxy
Scrapoxy is a super proxies manager that orchestrates all your proxies into one place, rather than spreading management across multiple scrapers. It manages IP rotation and fingerprinting, and smartly routes traffic to avoid bans.
Language: TypeScript 2.4k 54 0 262
anaskhan96/soup
Web Scraper in Go, similar to BeautifulSoup
Language: Go 2.2k 34 44169
itsOwen/CyberScraper-2077
A Powerful web scraper powered by LLM | OpenAI, Gemini & Ollama
Language: Python 1.9k 11 29176
Kaliiiiiiiiii-Vinyzu/patchright
Undetected version of the Playwright testing and automation library.
Language: JavaScript 1.8k 28 10768
reworkd/tarsier
Vision utilities for web interaction agents 👀
Language: Jupyter Notebook 1.7k 12 20114
TheWebScrapingClub/webscraping-from-0-to-hero
The web scraping open project repository aims to share knowledge and experiences about web scraping with Python
1.7k 32 0 105
requests-cache/requests-cache
Persistent HTTP cache for python requests
Language: Python 1.5k 17 455157
jamesturk/scrapeghost
👻 Experimental library for scraping websites using OpenAI's GPT API.
Language: Python 1.4k 17 0 87
m8sec/CrossLinked
LinkedIn enumeration tool to extract valid employee names from an organization through search engine scraping
Language: Python 1.4k 27 18211
raznem/parsera
Lightweight library for scraping web-sites with LLMs
Language: Python 1.2k 19 1769
holgerd77/django-dynamic-scraper
Creating Scrapy scrapers via the Django admin interface
Language: Python 1.2k 73 98307
mov-cli/mov-cli
Watch everything from your terminal.
Language: Python 1k 11 24353
GodsScion/Auto_job_applier_linkedIn
Make your job hunt easy by automating your application process with this Auto Applier
Language: Python 1k 17 33288
Kaliiiiiiiiii-Vinyzu/patchright-python
Undetected Python version of the Playwright testing and automation library.
Language: Python 945 17 6467
cdpdriver/zendriver
A blazing fast, async-first, undetectable webscraping/web automation framework based on ultrafunkamsterdam/nodriver. Now with Docker support!
Language: Python 862 22 15065
benibela/xidel
Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.
Language: Pascal 817 27 11946
Skallwar/suckit
Suck the InTernet
Language: Rust 791 7 8343
maxhumber/gazpacho
🥫 The simple, fast, and modern web scraping library
Language: Python 771 17 4956
scrapfly/scrapfly-scrapers
Scalable Python web scraping scripts for +40 popular domains
Language: Python 747 15 22161
z0m31en7/Uscrapper
Uscrapper Vanta: Dive deeper into the web with this powerful open-source tool. Extract valuable insights with ease and efficiency, from both surface and deep web sources. Empower your data mining and analysis with Vanta's advanced capabilities. Fast, reliable, and user-friendly, Uscrapper Vanta is the ultimate choice for researchers and analysts.
Language: Python 718 7 1180
wodsuz/EasyApplyJobsBot
A Python bot to automatically apply to all LinkedIn, Glassdoor, etc. Easy Apply jobs based on your preferences. Auto login, auto fill additional questions, apply automatically!
Language: Python 691 16 54205
chris-greening/instascrape
Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically
Language: Python 660 25 82115
openzim/zimit
Make a ZIM file from any Web site and surf offline!
Language: Python 638 19 38943
vil/H4X-Tools
Open source toolkit for scraping, OSINT and more.
Language: Python 595 11 2582
