This repository features an example service consisting of multiple components working hand in hand to collect URLs mentioned on Twitter and create a hotlist of popular URLs.
It can be run with docker-compose or Kubernetes.
Check out the docker-compose.yml file for a more technical description of what this example service provides. Compare it with the Kubernetes manifests in the kubernetes folder.
This component consumes the Twitter Stream API, looking for tweets containing the strings http or https to fetch all tweets with links. The tweets are then parsed for contained URLs.
The URLs found are stored in the inbox Redis database.
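The extraction step described above can be sketched as follows. This is a minimal illustration, not the tracker's actual code: the regular expression, the function names extract_urls and track, and the in-memory deque standing in for the inbox Redis database are all assumptions.

```python
import re
from collections import deque

# Hypothetical pattern: "http://" or "https://" followed by non-whitespace characters.
URL_PATTERN = re.compile(r"https?://\S+")


def extract_urls(tweet_text):
    """Return all http/https URLs found in a tweet's text, in order of appearance."""
    urls = []
    for match in URL_PATTERN.findall(tweet_text):
        # Strip trailing punctuation that commonly follows a URL in prose.
        urls.append(match.rstrip(".,;:!?)\"'"))
    return urls


# Stand-in for the inbox Redis database (assumption: a simple FIFO queue).
inbox = deque()


def track(tweet_text, queue=inbox):
    """Append every URL found in the tweet text to the queue."""
    for url in extract_urls(tweet_text):
        queue.append(url)
```

In a real deployment the append would be replaced by a write to the inbox Redis database; that part is left out so the sketch runs with the standard library only.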
This component is a simple Redis database that receives all found URLs from the tracker component. It makes use of the official Redis Docker image.
This component deliberately does not provide a volume, which means that the database content is lost whenever the component is restarted.
The script inside this component reads URLs from the inbox Redis database and sends requests to those URLs in order to resolve redirects and reveal the actual target URL. The resulting URL is stored in the hotlist Redis database.
To avoid requesting the same URL several times, a cache is maintained in the hotlist Redis database.
The resolver component can be thought of as a worker, processing jobs from a queue. Since resolving URLs is in many cases a time-consuming job, there can be multiple instances of this component working in parallel.
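The worker pattern described above might look roughly like this. It is a sketch under stated assumptions, not the resolver's actual implementation: the function names are hypothetical, a deque and two dicts stand in for the inbox and hotlist Redis databases, and counting mentions is only one possible scoring scheme.

```python
import urllib.request
from collections import deque


def resolve_redirects(url, timeout=10):
    """Follow HTTP redirects and return the final URL (performs a network request)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.geturl()


def process_inbox(inbox, hotlist, cache, resolve=resolve_redirects):
    """Drain the inbox, resolving each distinct URL at most once via the cache.

    inbox   -- deque of unresolved URLs (stand-in for the inbox Redis)
    hotlist -- dict of resolved URL -> mention count (stand-in for the hotlist Redis)
    cache   -- dict of original URL -> resolved URL
    """
    while inbox:
        url = inbox.popleft()
        if url not in cache:
            try:
                cache[url] = resolve(url)
            except OSError:
                # Unreachable or failing URL: skip it in this sketch.
                continue
        target = cache[url]
        hotlist[target] = hotlist.get(target, 0) + 1
```

Passing the resolve function as a parameter keeps the queue logic testable without network access. Several such workers could run in parallel as long as the queue and cache live in a shared store, which is the role Redis plays here.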
This component contains a little script that watches the size of the inbox Redis database to find out whether it remains constant. If the inbox is growing, the script logs this and suggests that more resolver instances should be started to prevent the inbox from growing too big.
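The growth check can be illustrated with a small sketch. The sampling strategy (a list of periodically measured sizes) and the rule "every sample is larger than the previous one" are assumptions; the actual script may use a different criterion.

```python
def inbox_is_growing(sizes):
    """Return True if each sampled inbox size is larger than the one before it."""
    if len(sizes) < 2:
        return False
    return all(later > earlier for earlier, later in zip(sizes, sizes[1:]))


def scaling_advice(sizes):
    """Return a log message when the inbox is growing, otherwise None."""
    if inbox_is_growing(sizes):
        return "inbox grew from %d to %d entries; consider adding resolver instances" % (
            sizes[0],
            sizes[-1],
        )
    return None
```

For example, scaling_advice([10, 20, 35]) yields a message, while scaling_advice([10, 10, 10]) yields None.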
As a future improvement, the resolver-scaler can be modified to actually initiate the scaling of the resolver component via the Giant Swarm API.
This second Redis database component stores all resolved URLs together with scoring information. It also contains the cache for the resolver. Just like the inbox component, it uses the official Redis Docker image.
In contrast to the inbox component, the hotlist provides a volume to persist the database throughout restarts.
This component contains a little helper that periodically removes outdated information from the hotlist Redis database.
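One way such a cleanup pass could work is sketched below. The data layout (a dict mapping each URL to the timestamp at which it was last seen) and the function name remove_outdated are hypothetical; the source does not specify how the helper decides what is outdated.

```python
import time


def remove_outdated(last_seen, max_age_seconds, now=None):
    """Delete entries last seen more than max_age_seconds ago; return the removed URLs."""
    if now is None:
        now = time.time()
    cutoff = now - max_age_seconds
    # Collect first, then delete, to avoid mutating the dict while iterating over it.
    outdated = [url for url, seen in last_seen.items() if seen < cutoff]
    for url in outdated:
        del last_seen[url]
    return outdated
```

Running this function on a timer would give the periodic behaviour described above.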
This is a Python/Flask web application that offers a JSON API to fetch the resulting URL hotlist.
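The core of such an endpoint, turning stored scores into a ranked JSON list, could be sketched like this. The response shape (a list of objects with url and score keys), the limit parameter, and the function name are assumptions; the Flask route wiring is left out so the sketch runs with the standard library only.

```python
import json


def hotlist_as_json(scores, limit=10):
    """Serialize the highest-scoring URLs as a JSON array, best first.

    scores -- dict of URL -> numeric score (stand-in for the hotlist Redis)
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    payload = [{"url": url, "score": score} for url, score in ranked[:limit]]
    return json.dumps(payload)
```

In a Flask application, a route handler would return this string with a JSON content type.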
The rebrow component offers a web-based user interface ("rebrow" stands for "redis browser") to debug the content of both Redis databases. It makes use of a third-party Docker image.
To access the Twitter streaming API, a personal Twitter account is needed, along with app-specific credentials created in Twitter Application Management.
For example:
Name: thux
Description: Tracks URLs mentioned on Twitter and creates a ranked list
Website: https://github.com/giantswarm/twitter-hot-urls-example
Callback URL: <leave this field blank>
Additionally, an Access Token needs to be generated under "Keys and Access Tokens". In the end, four secrets or tokens need to be entered in secrets/twitter-api-secret.env for the docker-compose setup and in secrets/twitter-api-secret.yaml to run the Kubernetes example. For Kubernetes, these values need to be base64-encoded; please see the Kubernetes documentation about secrets.
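The base64 encoding needed for the Kubernetes secret values can be produced with a few lines of Python. This is a small helper sketch; the function names are hypothetical, and the placeholder value in the usage note is not a real credential.

```python
import base64


def encode_secret(value):
    """Base64-encode a secret string, as required for values in a Kubernetes Secret's data field."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def decode_secret(encoded):
    """Reverse encode_secret, useful for checking a value before applying the manifest."""
    return base64.b64decode(encoded).decode("utf-8")
```

Calling encode_secret("my-consumer-key") returns the string to paste into secrets/twitter-api-secret.yaml. Make sure the input carries no trailing newline, since it would become part of the encoded secret.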
docker-compose up -d
docker-compose ps
docker-compose logs
docker-compose stop tracker
docker network ls
docker network inspect thux_default