This repository is a tool to quickly fetch data from the GitHub Events API, stream it into a Mongo database, and expose/analyse it via several dedicated APIs.
It is a Docker setup with 2 main containers based on Celery (including Beat), Spark, asyncio, FastAPI and Mongo. Monitoring containers can be spun up as well, e.g. Mongo Express and Flower.
The repository itself is based on the 'biggie' project.
You have to create the .env environment file and create/use a GitHub token for it.
If needed, tweak the schedule parameter of the cleaning task (see the "Data streaming" section below).
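As a minimal sketch (the variable names below are assumptions, not the actual keys used by the project — adapt them from the provided <file>.example templates), the .env file could look like:

```shell
# .env — hypothetical keys, adapt to the project's <file>.example template
GITHUB_TOKEN=ghp_xxxxxxxxxxxxxxxxxxxx   # personal access token for the Events API
MONGO_INITDB_ROOT_USERNAME=admin
MONGO_INITDB_ROOT_PASSWORD=change-me
```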
If you plan to use the same GitHub Actions CI file, you need to create the same secrets as in the jobs > env section of .github/workflows/docker-ci.yml (see line 31).
NB:
- For all files embedding secrets, you'll find a <file>.example template ready to adapt.
The docker-compose.main file is structured so that the test containers build the image used by the prod containers. Hence the need to run one of the following commands on the very first run:
docker-compose up api_test celery_test
OR
docker-compose --profile test up
docker-compose up celery_prod
This command will spin up the Celery container and:
- download paginated data and save them as files locally
- read these files and load Mongo with relevant data
- delete all local files once their data is successfully in Mongo
These tasks are scheduled every minute with a crontab setting, and a custom parameter is implemented to separately schedule the cleaning step while keeping it in sync with the rest of the chain.
See kwargs={"wait_minutes": 30} in the beat_schedule parameter in celery_app/config.py.
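As a hedged sketch of what such a schedule could look like in celery_app/config.py (the task names and module paths are assumptions; only the kwargs={"wait_minutes": 30} parameter comes from the repository):

```python
# Sketch of a Celery beat schedule with a custom delay kwarg.
# Task names/paths are hypothetical; only wait_minutes mirrors the repo.
from celery.schedules import crontab

beat_schedule = {
    "harvest-and-load": {
        "task": "tasks.harvest_and_load",   # hypothetical task path
        "schedule": crontab(minute="*"),    # every minute
    },
    "clean-local-files": {
        "task": "tasks.clean_local_files",  # hypothetical task path
        "schedule": crontab(minute="*"),
        "kwargs": {"wait_minutes": 30},     # delay cleaning, stay in sync with the chain
    },
}
```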
Spin up the monitoring containers to access the Mongo Express and Flower UIs alongside the Celery production container.
docker-compose \
-f docker-compose.yml \
-f docker-compose.monitoring.yml \
--profile monitoring \
up
To spin up just the FastAPI container:
docker-compose up api_prod
To spin up both production containers as well as both monitoring containers:
docker-compose \
-f docker-compose.yml \
-f docker-compose.monitoring.yml \
--profile prod --profile monitoring \
up
In this configuration, you need to have the necessary sub-domains set up on your domain provider's side. You also need:
- Nginx installed on the host machine
- a certificate generated by certbot without any changes to the Nginx configuration (see documentation):
sudo certbot certonly --nginx  # example command for Ubuntu 20
Then create the required files and change the volumes paths accordingly in the compose files.
The nginx configuration files are:
conf/nginx/certificate.json
conf/nginx/app_docker.conf
conf/nginx/monitor_docker.conf
Finally, run the docker-compose command with the live_prod profile to expose everything to the world:
docker-compose \
-f docker-compose.yml \
-f docker-compose.monitoring.yml \
--profile prod --profile monitoring --profile live_prod \
up
API
- Count per type with a given time offset in minutes
- PR average delta for a given repository
- Timeline of PR deltas for a given repository (dataviz)
NB: For the last 2 endpoints, the repository name parameter takes the full repository name, including the actor name, e.g. pierrz/biggie.
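As an illustration only (the actual endpoint paths are not documented here, so the routes and port below are assumptions), building query URLs for these endpoints from Python could look like:

```python
# Hypothetical endpoint paths — adapt to the actual FastAPI routes.
from urllib.parse import quote, urlencode

BASE = "http://localhost:8000"  # assumed local FastAPI port

def count_per_type_url(offset_minutes: int) -> str:
    """Count of events per type within the given time offset (minutes)."""
    return f"{BASE}/events/count?{urlencode({'offset': offset_minutes})}"

def pr_average_delta_url(repo: str) -> str:
    """PR average delta for a full repository name, e.g. 'pierrz/biggie'."""
    return f"{BASE}/pr/average-delta/{quote(repo, safe='')}"

print(count_per_type_url(10))
print(pr_average_delta_url("pierrz/biggie"))
```

Note that the full repository name is percent-encoded (the slash becomes %2F) so it fits in a single path segment.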
Development
If you want to make some changes in this repo while following the same environment tooling, run the following commands from the root directory:
poetry config virtualenvs.in-project true
poetry install && poetry shell
pre-commit install
To change the code of the core containers, you need to cd into the related directory and either:
- run poetry update to simply install the required dependencies
- run the previous commands to create a dedicated virtualenv