The Scheduler Application schedules website checks by fetching configurations from PostgreSQL and producing tasks for Kafka.
- PostgreSQL Service: Fetches website configurations.
- Kafka Producer Service: Produces tasks for Kafka.
The Checker Application monitors website availability by consuming tasks from Kafka, checking websites via HTTP requests, and storing metrics in PostgreSQL.
- Kafka Consumer Service: Consumes tasks from Kafka.
- HTTP Service: Performs website checks.
- PostgreSQL Service: Stores metrics.
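As a rough illustration of the HTTP Service step, the sketch below times a single request and records its outcome. It is not the project's actual code: `CheckResult`, `check_site`, and the `opener` parameter are hypothetical names, and it uses only the standard library, whereas the real checker may use a different HTTP client.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CheckResult:
    """Hypothetical record of one website check."""
    url: str
    status_code: Optional[int]  # None when no HTTP response was received
    response_time: float        # seconds, measured with a monotonic clock
    error: Optional[str] = None


def check_site(url: str, timeout: float = 10.0,
               opener: Callable = urllib.request.urlopen) -> CheckResult:
    """Request `url` once and report the status code and elapsed time."""
    started = time.monotonic()
    status: Optional[int] = None
    error: Optional[str] = None
    try:
        with opener(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses still carry a status code worth storing.
        status = exc.code
    except OSError as exc:
        # Covers URLError, connection failures, and timeouts.
        error = str(exc)
    return CheckResult(url, status, time.monotonic() - started, error)
```

The `opener` argument exists only so the function can be exercised without network access; in normal use it can be left at its default.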
Python 3.10 or higher is required.
python -m venv .venv
source .venv/bin/activate
Or, with virtualenvwrapper:
pip install virtualenvwrapper
mkvirtualenv webmon
Then install the dependencies:
pip install -r requirements_dev.txt
cp scheduler/config-example.yaml scheduler/config.yaml
cp checker/config-example.yaml checker/config.yaml
Adjust each app's config to match your PostgreSQL and Kafka settings.
make scheduler-db-init
make checker-db-init
Put the ca.pem, service.cert, and service.key files in the ./certs/kafka/ folder. It is also possible to use Kafka without SSL: just delete the ssl section from config.yaml.
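To make the optional SSL setup concrete, here is a minimal sketch of how a config section could be turned into Kafka client arguments. Everything in it is an assumption: the helper `kafka_client_kwargs` and the config key `bootstrap_servers` are hypothetical, and the keyword names (`security_protocol`, `ssl_cafile`, `ssl_certfile`, `ssl_keyfile`) follow the kafka-python client, which may not be the library this project uses.

```python
from pathlib import Path
from typing import Any, Dict


def kafka_client_kwargs(kafka_config: Dict[str, Any],
                        certs_dir: str = "./certs/kafka") -> Dict[str, Any]:
    """Build client keyword arguments from a (hypothetical) Kafka config dict."""
    kwargs: Dict[str, Any] = {
        "bootstrap_servers": kafka_config["bootstrap_servers"],
    }
    # SSL is enabled only when the config contains an `ssl` section,
    # so deleting that section falls back to a plaintext connection.
    if "ssl" in kafka_config:
        certs = Path(certs_dir)
        kwargs.update(
            security_protocol="SSL",
            ssl_cafile=str(certs / "ca.pem"),
            ssl_certfile=str(certs / "service.cert"),
            ssl_keyfile=str(certs / "service.key"),
        )
    return kwargs
```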
Run the scheduler app:
make run-scheduler
And, in a separate shell, run the checker app:
make run-checker
make test-scheduler
make test-checker
The scheduler could be built differently, for example around a sorted in-memory queue of sites. Unlike the current implementation, however, such designs do not scale and need O(n) memory, where n is the number of sites.
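For comparison, the in-memory alternative could look roughly like the sketch below, which keeps every site in a heap ordered by next run time. It is an illustration of the rejected design, not code from this repository; `InMemorySchedule` and its methods are hypothetical names. The heap holds one entry per site, which is where the O(n) memory cost comes from.

```python
import heapq
from typing import List, Tuple


class InMemorySchedule:
    """Hypothetical sorted in-memory queue of sites, keyed by next run time."""

    def __init__(self) -> None:
        # Each entry is (next_run_time, url, interval_seconds).
        self._heap: List[Tuple[float, str, float]] = []

    def add(self, url: str, interval: float, now: float) -> None:
        """Register a site; its first check is due one interval from now."""
        heapq.heappush(self._heap, (now + interval, url, interval))

    def pop_due(self, now: float) -> List[str]:
        """Return every site that is due and reschedule it from `now`."""
        due: List[str] = []
        while self._heap and self._heap[0][0] <= now:
            _, url, interval = heapq.heappop(self._heap)
            due.append(url)
            # Rescheduling relative to `now` keeps the loop finite
            # for any positive interval.
            heapq.heappush(self._heap, (now + interval, url, interval))
        return due
```

Because the queue lives in one process, a second scheduler instance would need its own copy or some coordination, which is the scalability problem noted above.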
A single scheduler instance can create a few tens of thousands of tasks every 5 seconds. Implementing batch processing could raise that limit.
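One possible reading of "batch processing" is to handle due sites in fixed-size groups rather than one at a time. The helper below is a minimal sketch of that idea; `in_batches` is a hypothetical name, and how batches would map onto database queries or Kafka sends is left open. A hand-written helper is used because `itertools.batched` only exists from Python 3.12.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def in_batches(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items from `items`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch
```

A scheduler loop could then iterate over `in_batches(due_sites, 500)` and submit each group in one go, where both `due_sites` and the batch size of 500 are placeholders.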
To check millions of sites, we would need either an orchestrator for multiple schedulers or database shards with one scheduler instance per shard.
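If the sharding route were taken, each site would need a stable shard assignment. The sketch below shows one common way to do that, hashing the URL; `shard_for` is a hypothetical helper, and the project may well partition by a different key, such as a database ID. It uses `hashlib` rather than the built-in `hash()`, because string hashes are randomized per process by default and would not agree between scheduler instances.

```python
import hashlib


def shard_for(url: str, shard_count: int) -> int:
    """Map a URL to a shard index in range(shard_count), stable across runs."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    # The first 8 bytes give plenty of spread for a modulo assignment.
    return int.from_bytes(digest[:8], "big") % shard_count
```

Note that plain modulo assignment moves most sites to a different shard whenever `shard_count` changes, so resizing would need a migration step or a different scheme.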
The checker can be scaled as far as Kafka allows.
- separate venv and build configs for checker and scheduler
- split the checker into two parts: one checks sites, the other saves data
- use milliseconds for more precision
- better monorepo-like setup
- integration tests