webcrawler

There are 889 repositories under the webcrawler topic.

crawlab-team/crawlab
Distributed web crawler admin platform for managing spiders regardless of language or framework.
Language: Go 11.4k 214 9331.8k
ssssssss-team/spider-flow
A next-generation crawler platform that defines crawl flows graphically, so crawlers can be built without writing code.
Language: Java 9.6k 95 431.9k
GeneralNewsExtractor/GeneralNewsExtractor
General-purpose extractor for the main text of news web pages (beta version).
Language: Python 3.6k 86 82529
zorlan/skycaiji
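SkyCaiji (Blue Sky Collector) is a free, open-source crawler system that collects data with nothing more than point-and-click rule editing. It runs locally, on shared hosting, or on cloud servers, can collect almost every type of web page, integrates seamlessly with various CMS site builders, and publishes data in real time without login, fully automatically and with no manual intervention. Among web big-data collection software, it is a fully cross-platform cloud crawler system.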
Language: PHP 1.9k 78 43590
amirgamil/apollo
A Unix-style personal search engine and web crawler for your digital footprint.
Language: Go 1.4k 17 1051
scrapinghub/scrapyrt
HTTP API for Scrapy spiders
Language: Python 836 45 95162
3nock/SpiderSuite
Advanced web security spider/crawler
610 10 766
z0m31en7/Uscrapper
Uscrapper Vanta: Dive deeper into the web with this powerful open-source tool. Extract valuable insights with ease and efficiency, from both surface and deep web sources. Empower your data mining and analysis with Vanta's advanced capabilities. Fast, reliable, and user-friendly, Uscrapper Vanta is the ultimate choice for researchers and analysts.
Language: Python 513 5 652
jaeksoft/opensearchserver
Open-source Enterprise Grade Search Engine Software
Language: Java 500 77 551190
kingname/SourceCodeOfBook
Companion source code for the book 《Python爬虫开发从入门到实战》 (Python Crawler Development: From Beginner to Practice).
Language: Python 353 10 4123
salimk/Rcrawler
An R web crawler and scraper
Language: R 350 40 7592
adrianosferreira/afrodite.json
The largest cookbook of culinary recipes in the Portuguese language
188 4 243
mehmetozkaya/DotnetCrawler
DotnetCrawler is a straightforward, lightweight web crawling/scraping library based on .NET Core, with Entity Framework Core output. It is designed like other strong crawler libraries such as WebMagic and Scrapy, but is extensible to fit your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c
Language: C# 175 12 365
sushant10/HQ_Bot
📲 Bot to help solve HQ trivia
Language: Python 173 35 16292
codeudan/crawler-china-mainland-universities
Crawler for the list of universities in mainland China
Language: JavaScript 168 7 450
DedSecInside/gotor
This program provides efficient web scraping services for Tor and non-Tor sites. The program has both a CLI and REST API.
Language: Go 159 7 2744
hedii/php-crawler
A PHP crawler that finds email addresses on the internet
Language: PHP 134 23 3365
brianmadden/krawler
A web crawling framework written in Kotlin
Language: Kotlin 127 7 1516
voliveirajr/seleniumcrawler
An example using Selenium WebDriver for Python and the Scrapy framework to build a web scraper that crawls an ASP site
Language: Python 126 10 046
topiccrawler/jkcrawler
A JK crawler written with Scrapy; images are sourced from Bilibili, Tumblr, Instagram, Weibo, and Twitter
Language: Python 116 4 528
pavlovtech/WebReaper
Web scraper, crawler and parser in C#. Designed as a simple, declarative, and scalable web scraping solution.
Language: C# 111 6 1126
52ai/Crawler4Caida
Stick to doing something interesting and valuable.
98 8 027
makuto/Liked-Saved-Image-Downloader
Save content you enjoy!
Language: Python 89 8 8010
Aavache/LLMWebCrawler
A Web Crawler based on LLMs implemented with Ray and Huggingface. The embeddings are saved into a vector database for fast clustering and retrieval. Use it for your RAG.
Language: Python 76 1 09
Sarthakjain1206/Intelligent_Document_Finder
Document Search Engine Tool
Language: Python 71 5 314
shenxiangzhuang/PythonDataAnalysis
The data and code used in my book.
Language: Jupyter Notebook 69 8 1646
realdennis/igcloud
*UNSUPPORTED* Use igcloud to generate an Instagram word cloud! 🛫 🛫 ✈ 🔝
Language: Python 66 3 211
k4yt3x/konadl
Multithreaded Konachan / Yandere (moebooru-based site) bulk image downloader
Language: Python 63 5 316
hysios/coronavirus
2019-nCoV real-time tracking system based on Scrapy + InfluxDB + Grafana + NLTK + Stanford CoreNLP
Language: Python 61 2 28
Aravindha1234u/SocialScraper
Social Scraper is a Python tool for detecting child predators and cyber harassers on social media
Language: Python 57 4 513
hfreire/browser-as-a-service
A web browser 🌎 hosted as a service, to render your JavaScript web pages as HTML
Language: JavaScript 52 6 612
robsonbittencourt/gafanhoto
Bot for monitoring promotions on the Hardmob forum: http://www.hardmob.com.br/promocoes/
Language: Java 52 7 17
BitTigerInst/Pikachu
Yummy Recipe Crawler and Search
Language: JavaScript 51 39 018
DeuxHuitHuit/algolia-webcrawler
Simple Node worker that crawls sitemaps in order to keep an Algolia index up to date
Language: JavaScript 46 9 2818
Conso1eCowb0y/Deepminer
Deep web crawler and search engine
Language: Python 45 4 211
opencharles/charles
Java web crawling library
Language: Java 32 6 619
