scraper

There are 10,201 repositories under the scraper topic.

firecrawl/firecrawl
🔥 The Web Data API for AI - Turn entire websites into LLM-ready markdown or structured data
Language: TypeScript 66.9k 257 7535.2k
huginn/huginn
Create agents that monitor and act on your behalf. Your agents are standing by!
Language: Ruby 47.9k 739 2.2k 4.2k
NaiboWang/EasySpider
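A visual no-code web crawler/spider. EasySpider (易采集): visual software for browser automation testing, data collection, and crawling, which lets you design and run crawler tasks graphically without writing code. Also known as ServiceWrapper, an intelligent service-wrapping system for web applications.
Language: JavaScript 43.3k 251 7395.3k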
iawia002/lux
👾 Fast and simple video download library and CLI tool written in Go
Language: Go 30.6k 384 1.1k 3.2k
cheeriojs/cheerio
The fast, flexible, and elegant library for parsing and manipulating HTML and XML.
Language: TypeScript 29.9k 345 1.2k 1.7k
feder-cr/Jobs_Applier_AI_Agent_AIHawk
AIHawk aims to ease the job hunt by automating the job application process. Using artificial intelligence, it lets users apply for multiple jobs in a tailored way.
Language: Python 29k 197 6734.4k
gocolly/colly
Elegant Scraper and Crawler Framework for Golang
Language: Go 24.8k 322 5621.8k
apify/crawlee
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.
Language: TypeScript 20.5k 123 1k 1.1k
codelucas/newspaper
newspaper3k is a news, full-text, and article metadata extraction library in Python 3. Advanced docs:
Language: HTML 14.9k 373 6812.1k
Evil0ctal/Douyin_TikTok_Download_API
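🚀 "Douyin_TikTok_Download_API" is an out-of-the-box, high-performance asynchronous data scraping tool for Douyin, Kuaishou, TikTok, and Bilibili. It supports API calls as well as online batch parsing and downloading.
Language: Python 14.8k 96 5942.2k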
getmaxun/maxun
⚡ Easiest no code web data extraction platform • Instantly turn any website into API or spreadsheet ⚡
Language: TypeScript 13.8k 78 2741.1k
pwxcoo/chinese-xinhua
📙 Chinese Xinhua Dictionary database. Includes xiehouyu (two-part allegorical sayings), idioms, words, and Chinese characters.
Language: Python 11.4k 306 622.6k
guyueyingmu/avbook
AV movie management system; crawler for avmoo, javbus, and javlibrary; online AV video library; AV magnet link database. Japanese Adult Video Library, Adult Video Magnet Links - Japanese Adult Video Database
Language: PHP 9.8k 345 1392k
TeamWiseFlow/wiseflow
Use LLMs to track and extract websites, RSS feeds, and social media
Language: Python 7.9k 74 3291.4k
apify/crawlee-python
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.
Language: Python 7.1k 38 474513
alirezamika/autoscraper
A Smart, Automatic, Fast and Lightweight Web Scraper for Python
Language: Python 7k 121 67713
BruceDone/awesome-crawler
A collection of awesome web crawlers and spiders in different languages
7k 201 19733
go-rod/rod
A Chrome DevTools Protocol driver for web automation and scraping.
Language: Go 6.4k 49 977421
mishushakov/llm-scraper
Turn any webpage into structured data using LLMs
Language: TypeScript 6.1k 36 37363
MontFerret/ferret
Declarative web scraping
Language: Go 5.9k 92 300311
yujiosaka/headless-chrome-crawler
Distributed crawler powered by Headless Chrome
Language: JavaScript 5.6k 113 135409
JustAnotherArchivist/snscrape
A social networking service scraper in Python
Language: Python 5.2k 104 982769
niespodd/browser-fingerprinting
Analysis of Bot Protection systems with available countermeasures 🚿. How to defeat anti-bot system 👻 and get around browser fingerprinting scripts 🕵️‍♂️ when scraping the web?
Language: JavaScript 4.9k 69 11267
fent/node-ytdl-core
YouTube video downloader in JavaScript.
Language: JavaScript 4.7k 87 969866
myreader-io/myGPTReader
A community-driven way to read and chat with AI bots - powered by ChatGPT.
Language: Python 4.4k 48 35451
UltimaHoarder/UltimaScraper
Scrape all the media from an OnlyFans account - Updated regularly
Language: Python 4.2k 179 1.5k 617
IonicaBizau/scrape-it
🔮 A Node.js scraper for humans.
Language: JavaScript 4.1k 59 116218
bjesus/pipet
Swiss-army tool for scraping and extracting data from online assets, made for hackers
Language: Go 3.9k 16 2146
d60/twikit
Twitter API Scraper | Without an API key | Twitter Internal API | Free | Twitter scraper | Twitter Bot
Language: Python 3.7k 38 294439
JavScraper/Emby.Plugins.JavScraper
An Emby/Jellyfin scraper plugin for Japanese movies that can fetch film information from certain websites.
Language: C# 3.7k 49 331560
joeyism/linkedin_scraper
A library that scrapes LinkedIn for user data
Language: Python 3.5k 48 177816
aapatre/Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE
Do you want to LEARN NEW STUFF for FREE? Don't worry, with the power of web-scraping and automation, this script will find the necessary Udemy coupons & enroll you for PAID UDEMY COURSES, ABSOLUTELY FREE!
Language: Python 3.3k 86 187570
meetDeveloper/freeDictionaryAPI
There was no free Dictionary API on the web when I wanted one for my friend, so I created one.
Language: JavaScript 3.2k 37 192307
sqzw-x/mdcx
Movie metadata scraper
Language: Python 3.1k 12 548406
edoardottt/cariddi
Take a list of domains, crawl URLs, and scan for endpoints, secrets, API keys, file extensions, tokens, and more
Language: Go 2.8k 15 69256
geziyor/geziyor
Geziyor, blazing fast web crawling & scraping framework for Go. Supports JS rendering.
Language: Go 2.8k 45 59156
