/crawler

Distributed crawler written in PHP


About

A distributed crawler

Requirements

Installation

Via Composer:

composer require teamtnt/crawler

Configuration

Each instance needs an identifier. Set it in the .env file:

NODE_NAME="Instance 1"
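When several instances are provisioned, the identifier can be written to each .env file by a script rather than by hand. The following is a minimal sketch, not part of the package: `set_node_name` is a hypothetical helper, and it assumes .env is a plain `KEY=value` file with at most one `NODE_NAME` line.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the crawler):
# set_node_name FILE NAME writes or replaces NODE_NAME in a dotenv file.
set_node_name() {
    env_file="$1"
    node_name="$2"
    # make sure the file exists so grep has something to read
    touch "$env_file"
    # copy every line except an existing NODE_NAME entry;
    # grep exits non-zero when nothing is copied, which is fine here
    grep -v '^NODE_NAME=' "$env_file" > "$env_file.tmp" || true
    # append the new identifier, quoted so names with spaces survive
    printf 'NODE_NAME="%s"\n' "$node_name" >> "$env_file.tmp"
    mv "$env_file.tmp" "$env_file"
}
```

For example, `set_node_name .env "Instance 2"` leaves other settings in the file untouched and replaces any earlier `NODE_NAME` value.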

The domain feeder needs to start with a seed domain. After that, start the crawler by running:

php artisan crawler

To scrape a single URL:

php artisan url:frontier www.example.com/something
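To scrape a handful of URLs, the single-URL command above could be wrapped in a small loop. This is a sketch under assumptions: `scrape_urls` and the list file (for example `urls.txt`, one URL per line) are hypothetical, and it simply calls the documented command once per line.

```shell
#!/bin/sh
# Hypothetical wrapper: scrape_urls FILE runs the single-URL command
# once for each non-empty, non-comment line of FILE.
scrape_urls() {
    # the "|| [ -n ... ]" part also handles a last line without a newline
    while IFS= read -r url || [ -n "$url" ]; do
        case "$url" in
            # skip blank lines and lines starting with '#'
            ''|'#'*) continue ;;
        esac
        # redirect stdin so the command cannot consume the URL list
        php artisan url:frontier "$url" < /dev/null
    done < "$1"
}
```

Calling `scrape_urls urls.txt` from the project root would then process the URLs one at a time, in file order.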

Crawler Topology


Domain Feeder


Single Instance


URL Frontier
