crawlers

There are 156 repositories under the crawlers topic.

ai-robots-txt/ai.robots.txt
A list of AI agents and robots to block.
Language: Python 1.3k 20 1747
omrilotan/isbot
🤖/👨‍🦰 Detect bots/crawlers/spiders using the user agent string
Language: TypeScript 960 12 9179
flathunters/flathunter
A bot to help people with their rental real-estate search. 🏠🤖
Language: HTML 861 16 191183
salimk/Rcrawler
An R web crawler and scraper
Language: R 350 40 7592
StJudeWasHere/seonaut
Open source SEO auditing tool.
Language: Go 270 5 2544
Norconex/crawlers
Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or filesystem to various data repositories such as search engines.
Language: Java 184 33 82667
ArchiveTeam/wget-lua
Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.
Language: C 107 20 1815
narkhedesam/Proxy-List-Scrapper
Proxy List Scrapper
Language: Python 99 3 719
jonasjacek/robots.txt
Simple robots.txt template. Keeps unwanted robots out (disallow) and whitelists (allow) legitimate user agents. Useful for all websites.
87 7 038
behitek/social-scraper
Vietnamese text data crawler scripts for various sites (including YouTube, Facebook, 4rum, news, ...)
Language: Python 75 5 045
howie6879/hproxy
hproxy - Asynchronous IP proxy pool that aims to make getting a proxy as convenient as possible. (Asynchronous crawler proxy pool)
Language: Python 66 5 314
Potelo/laravel-block-bots
Block crawlers and high-traffic users on your site by IP using Redis
Language: PHP 46 5 817
joaopauloaramuni/python
Repo Python
Language: Python 45 1 00
Symbolexe/Raven
Raven is a powerful and customizable web crawler written in Go.
Language: Go 41 1 07
BaseMax/GooglePlayWebServiceAPI
Tiny PHP-based script to crawl information about a specific application in the Google Play store.
Language: PHP 37 6 79
flulemon/sneakpeek
Sneakpeek is a framework that helps to quickly and conveniently develop scrapers. It’s the best choice for scrapers that have some specific complex scraping logic that needs to be run on a constant basis.
Language: Python 37 2 00
peterbencze/serritor
Serritor is an open source web crawler framework built upon Selenium and written in Java. It can be used to crawl dynamic web pages that require JavaScript to render data.
Language: Java 31 3 1915
herrbischoff/user-agents
User agent database in JSON format of bots, crawlers, certain malware, automated software, scripts and uncommon ones.
Language: Shell 28 5 05
zcrawl/zcrawl
An open source web crawling platform
Language: Go 22 4 04
p0dalirius/crawlersuseragents
Python script to check whether an application's responses differ when the request comes from a search engine's crawler.
Language: Python 20 3 03
robertciotoiu/mobile-de-car-data-collector
Crawl, scrape and persist Mobile.de car listings data in a smart & responsible way
Language: Java 19 3 03
anapaulagomes/licitacoes-de-feira
Feira de Santana public tenders made easily accessible to citizens 🏦
Language: Python 18 2 04
delvelabs/htcap
htcap is a web application scanner able to crawl single-page applications (SPAs) recursively by intercepting Ajax calls and DOM changes.
Language: Python 18 11 34
ElektroStudios/Google-Search-URL-Crawler
Desktop app that crawls URLs from Google's search engine results
Language: Visual Basic .NET 16 2 02
shaoxiongdu/SkyEye
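A SpringBoot-based crawler project for trending topics across the web. Raw hot-search data is stored in a database, and word-segmentation statistics are stored in Redis, which makes later data analysis easier.
Language: Java 15 1 06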
arquejadalucy/jus_crawler
API that fetches data on a court case at all levels of the Courts of Justice of Alagoas (TJAL) and Ceará (TJCE).
Language: Python 14 2 04
BryanMorgan/isbot
Rust library to detect bots using a user-agent string
Language: Rust 13 2 01
Hsins/Daily-GitHub-Trending
📰 Fetch daily trending repository information from the GitHub Trending page with a script written in JavaScript and executed by GitHub Actions.
Language: JavaScript 13 4 13
solidusio-contrib/solidus_sitemap
Provide a sitemap of your Solidus store.
Language: Ruby 13 14 1527
tranlv/wiki-link
Scrapes wiki pages and finds the minimum number of links between two wiki pages
Language: Python 11 1 124
BaseMax/StackoverflowCrawler
A web crawler which crawls the stackoverflow website.
Language: Python 10 3 0
versioneye/crawl_r
VersionEye crawlers implemented in Ruby.
Language: Roff 10 5 266
romis2012/is-bot
Detect bots/crawlers/spiders via user-agent string
Language: Python 9 3 00
acidus99/Kennedy
Kennedy: Crawler and Search Engine for Gemini space. Leverages techniques and architecture from early WWW crawlers like Mercator, Archive.org, and GoogleBot
Language: C# 8 1 00
arthur3486/born2crawl
A highly performant and versatile crawling engine, designed with scalability and extensibility in mind.
Language: Kotlin 8 1 40
FEZIRO/wechat-miniprogram-spider-demo
WeChat Mini Program cloud development web crawler tutorial
Language: JavaScript 8 2 02
