
GitHub Events API consumer

This repository provides a tool that quickly fetches data from the GitHub Events API, streams it into a MongoDB database, and exposes related statistics for analysis via several dedicated API endpoints.

It is a Docker setup with 2 main containers based on Celery (including Beat), Spark, asyncio, FastAPI and Mongo. Monitoring containers such as Mongo Express and Flower can be spun up as well.

The repository itself is based on the 'biggie' project.


Installation

Environment

You have to create the .env environment file and add a GitHub token to it (create one if needed). Optionally, tweak the schedule parameter for the cleaning task (see the "Data streaming" section below). An illustrative sketch of the file follows the note below.

If you plan to use the same GitHub Actions CI file, you need to create the same secrets as in the jobs > env section of .github/workflows/docker-ci.yml (see line 31).

NB:

  • For every file embedding secrets, you'll find a corresponding <file>.example ready to adapt.
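
As a minimal sketch, the .env file could look like the following; the variable name here is hypothetical, so check the provided .env.example for the actual ones.

# .env (illustrative only; variable names are hypothetical, see .env.example)
GITHUB_TOKEN=<your-github-token>   # personal access token for the GitHub Events API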

Build

The main docker-compose file is structured so that the test containers build the image which the prod containers then reuse. Hence the need to run one of the following commands on the very first run:

docker-compose up api_test celery_test
OR
docker-compose --profile test up

Run

Data streaming from the GitHub API into Mongo

docker-compose up celery_prod

This command will spin up the Celery container and run a chain of tasks (sketched after this list) to:

  • download paginated data and save them as files locally
  • read these files and load the relevant data into Mongo
  • delete all local files once their data is successfully in Mongo
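
For illustration only, such a chain could be wired as follows; the task names are hypothetical, not the repository's actual ones.

from celery import Celery, chain

app = Celery("github_events")  # hypothetical app name

@app.task
def download_events():
    """Download paginated data from the GitHub Events API and save it as local files."""

@app.task
def load_into_mongo(_):
    """Read the local files and insert the relevant data into Mongo."""

@app.task
def clean_local_files(_):
    """Delete the local files once their data is safely stored in Mongo."""

# Each step starts only after the previous one has succeeded
pipeline = chain(download_events.s(), load_into_mongo.s(), clean_local_files.s())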

These tasks are scheduled every minute with a crontab setting, and a custom parameter makes it possible to schedule the cleaning step separately while keeping it in sync with the rest of the chain.

See kwargs={"wait_minutes": 30} in the beat_schedule parameter in celery_app/config.py.
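
As an illustration, such a beat_schedule could look like the following; only the wait_minutes kwarg comes from the repository, the task names are hypothetical.

# celery_app/config.py (illustrative sketch; task names are hypothetical)
from celery.schedules import crontab

beat_schedule = {
    "fetch-and-load-github-events": {
        "task": "tasks.fetch_and_load",        # hypothetical task name
        "schedule": crontab(),                 # every minute
    },
    "clean-local-files": {
        "task": "tasks.clean_local_files",     # hypothetical task name
        "schedule": crontab(),                 # every minute
        "kwargs": {"wait_minutes": 30},        # keeps the cleaning step in sync with the chain
    },
}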


Data streaming with monitoring

Spin up the monitoring containers to access the Mongo-Express and Flower UIs alongside the Celery production container.

docker-compose \
  -f docker-compose.yml \
  -f docker-compose.monitoring.yml \
  --profile monitoring \
  up

API container

To spin up the FastAPI container only:

docker-compose up api_prod

Monitoring and Production containers

To spin up both production containers as well as both monitoring containers:

docker-compose \
  -f docker-compose.yml \
  -f docker-compose.monitoring.yml \
  --profile prod --profile monitoring \
  up

Nginx deployment (API container only)

In this configuration, you need to have the necessary sub-domains set up on your domain provider's side. You also need:

  • Nginx installed on the host machine
  • a certificate generated by certbot without any changes to the nginx configuration (see documentation)
    sudo certbot certonly --nginx    # example command for Ubuntu 20
    

Then create the required files and adjust the volumes paths accordingly in the compose files. The nginx configuration files are:

  • conf/nginx/certificate.json
  • conf/nginx/app_docker.conf
  • conf/nginx/monitor_docker.conf
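
As a rough sketch, assuming a standard reverse-proxy setup, app_docker.conf could look like this; the domain, certificate paths and port are placeholders:

# conf/nginx/app_docker.conf (illustrative sketch; domain, paths and port are placeholders)
server {
    listen 443 ssl;
    server_name api.example.com;

    ssl_certificate     /etc/letsencrypt/live/example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;

    location / {
        proxy_pass http://127.0.0.1:8000;   # placeholder port of the FastAPI container
        proxy_set_header Host $host;
    }
}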

Finally, run the docker-compose command with the live_prod profile to expose it all to the world:

docker-compose \
  -f docker-compose.yml \
  -f docker-compose.monitoring.yml \
  --profile prod --profile monitoring --profile live_prod \
  up

Local URLs

API docs

API

NB: For the last 2 endpoints, the repository name parameter takes the full repository name including the actor name, e.g. pierrz/biggie.
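
For instance, a call to one of these endpoints could look like the following; the host, port and route are hypothetical, so check the API docs page for the actual ones.

# illustrative request; the route and port are hypothetical
import requests

response = requests.get(
    "http://localhost:8000/events/repo",    # hypothetical route
    params={"repo_name": "pierrz/biggie"},  # full name, including the actor
)
print(response.json())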

Monitoring


Local development

If you want to make changes in this repo while following the same environment tooling, you can run the following commands from the root directory:

poetry config virtualenvs.in-project true
poetry install && poetry shell
pre-commit install

To change the code of the core containers, you need to cd into the related directory and either:

  • run poetry update to simply install the required dependencies
  • rerun the commands above to create a dedicated in-project virtualenv