owi Scheduler Overview

A distributed web crawling scheduler component that distributes URLs from the Frontier to multiple Fetchers.

Distributed Fetchers request new URL lists via REST API calls.
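
The request flow described above could look roughly like this from a Fetcher's side. This is only a sketch under stated assumptions: the `/urls` path, the `fetcher_id` and `batch_size` query parameters, and the `{"urls": [...]}` response shape are all hypothetical and not taken from this project. The actual routes are listed in the online docs of the running service.

```python
import json
from typing import List
from urllib.parse import urlencode, urljoin
from urllib.request import urlopen


def build_request_url(base_url: str, fetcher_id: str, batch_size: int) -> str:
    """Build the URL a Fetcher would call to ask for its next batch.

    The "/urls" path and both query parameters are hypothetical.
    """
    query = urlencode({"fetcher_id": fetcher_id, "batch_size": batch_size})
    return urljoin(base_url, "/urls") + "?" + query


def parse_url_list(payload: bytes) -> List[str]:
    """Extract the URLs from a JSON body shaped like {"urls": [...]} (assumed)."""
    data = json.loads(payload)
    return [str(url) for url in data.get("urls", [])]


def request_url_list(base_url: str, fetcher_id: str,
                     batch_size: int = 100, timeout: float = 10.0) -> List[str]:
    """Ask the scheduler for a new URL list with a single HTTP GET."""
    request_url = build_request_url(base_url, fetcher_id, batch_size)
    with urlopen(request_url, timeout=timeout) as response:
        return parse_url_list(response.read())
```

A Fetcher would then call something like `request_url_list("http://localhost:80", "fetcher-1")` in a loop, crawling each returned batch before asking for the next one.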

Python Packages

The project is built on the Python package FastAPI (MIT licensed, https://fastapi.tiangolo.com/). FastAPI itself is built on top of the following packages:

The project also imports parts of the following libraries and frameworks:

Docker Image

The Docker image provided by FastAPI is used as well:

  • tiangolo/uvicorn-gunicorn-fastapi:latest


The project is deployed on an AWS EC2 Ubuntu machine.

Link to online docs: http://ec2-18-185-96-23.eu-central-1.compute.amazonaws.com/docs


Re-Run local Docker-Image (Windows PowerShell)

docker ps -q | % { docker stop $_ }
docker pull dockerjens23/websch
docker build -t websch .
docker run -d -p 80:80 websch

Re-Run remote Docker-Image (Ubuntu)

sudo docker stop $(sudo docker ps -q)
sudo docker pull dockerjens23/websch
sudo docker run -d -p 80:80 dockerjens23/websch

Get Log Output of Running Container

sudo docker logs --follow $(sudo docker ps -q)

Linux Server Admin Commands

# disk free (human-readable)
df -h
# list all Docker containers (including stopped ones)
sudo docker ps -a

Start Docker with PostgreSQL Credentials as Environment Variables

sudo docker run --env-file ./env.list -p 80:80 dockerjens23/websch
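
Docker's `--env-file` option reads one `KEY=VALUE` pair per line and exposes each pair to the container as an ordinary environment variable. The sketch below shows one way the application might turn such variables into a PostgreSQL connection string. The variable names (`POSTGRES_USER`, `POSTGRES_PASSWORD`, `POSTGRES_HOST`, `POSTGRES_PORT`, `POSTGRES_DB`) are assumptions for illustration and may differ from the names this project actually uses.

```python
import os
from urllib.parse import quote

# Hypothetical variable names; adjust them to match the real env.list file.
REQUIRED = ("POSTGRES_USER", "POSTGRES_PASSWORD", "POSTGRES_HOST", "POSTGRES_DB")


def postgres_dsn(env=None) -> str:
    """Build a PostgreSQL connection URI from environment variables."""
    env = os.environ if env is None else env

    # Fail early with a clear message instead of connecting with empty values.
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))

    # Percent-encode credentials so characters like "@" or "/" stay valid in a URI.
    user = quote(env["POSTGRES_USER"], safe="")
    password = quote(env["POSTGRES_PASSWORD"], safe="")
    host = env["POSTGRES_HOST"]
    port = env.get("POSTGRES_PORT", "5432")  # default PostgreSQL port
    database = env["POSTGRES_DB"]
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"
```

Reading the credentials at startup and failing fast on missing values keeps secrets out of the image and makes a misconfigured `env.list` visible in the container logs right away.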

Environment Variables file