This application retrieves Git repositories from a GitLab server, scans the commits in each repository, and stores the data in a PostgreSQL database for analysis. It is written in Python, uses Django Admin as the main interface, and uses Celery as the engine that executes jobs in the background.
To run this application, external PostgreSQL and Redis servers are required.
step 1: Create a new database and application user. e.g.
create user gitcrawler with encrypted password 'gitcrawler';
create database gitcrawler;
grant all privileges on database gitcrawler to gitcrawler;
step 2: Create an access token for accessing GitLab private repositories.
step 3: Create a ".env" file with the following content:
# one way to generate the secret is to run a command like
# on OS X
# LC_CTYPE=C tr -dc 'a-z0-9!@#$%^&*(-_=+)' < /dev/urandom | head -c50
# on Linux
# tr -dc 'a-z0-9!@#$%^&*(-_=+)' < /dev/urandom | head -c50
DJANGO_SECRET="some_random_stuff"
PG_HOST=your_database_host
PG_PORT=your_database_port
PG_USERNAME=database_user
PG_PASSWORD=database_password
PG_DATABASE=database_name
DEBUG_MODE=0
REDIS_URI=redis://<redis_host>:<redis_port>
KEY_PASSWORD=dtWqGZiVWi964R
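As a portable alternative to the `tr` commands in the comments above, the secret can also be generated with Python's standard `secrets` module. This is a minimal sketch: the function name `generate_secret` is hypothetical, and the alphabet is the character list from the `tr` commands taken literally (`tr` itself reads `(-_` as a range, so its output set is wider).

```python
import secrets

# The characters listed in the tr commands, taken literally (an assumption;
# any sufficiently large alphabet works for a Django secret).
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789!@#$%^&*(-_=+)"


def generate_secret(length: int = 50) -> str:
    """Return a random string drawn from ALPHABET using the OS CSPRNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


if __name__ == "__main__":
    # Paste the printed value into DJANGO_SECRET in the .env file.
    print(generate_secret())
```

The same function with `length=16` could stand in for the `KEY_PASSWORD` generation command.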
step 4: Run this application from an existing docker image
./run
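Before starting the container, it can help to confirm that the ".env" file from step 3 defines every variable listed there. The sketch below is a hypothetical helper, not part of this repository; it assumes simple `KEY=VALUE` lines and treats all nine variables as required.

```python
# Variable names taken from the .env example in step 3; treating all of
# them as mandatory is an assumption.
REQUIRED_KEYS = (
    "DJANGO_SECRET", "PG_HOST", "PG_PORT", "PG_USERNAME", "PG_PASSWORD",
    "PG_DATABASE", "DEBUG_MODE", "REDIS_URI", "KEY_PASSWORD",
)


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding double quotes such as DJANGO_SECRET="...".
        values[key.strip()] = value.strip().strip('"')
    return values


def missing_keys(text: str) -> list:
    """Return the required keys that are absent or empty, in listed order."""
    values = parse_env(text)
    return [key for key in REQUIRED_KEYS if not values.get(key)]


if __name__ == "__main__":
    with open(".env", encoding="utf-8") as handle:
        print(missing_keys(handle.read()) or "all required keys are set")
```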
step 5: Log in and look around at http://127.0.0.1:8000
step 6: Before indexing any repository, add the public key file root/ssh/id_rsa to the Git server. HTTP authentication is not supported at the moment.
Install Python 3.8, create a virtualenv, and install dependencies
# assume you have python 3.8 and pip already
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements-dev.txt
# use dev server or gunicorn to run the application
# consult Procfile to see how to start flower and celery worker
python manage.py runserver
The encrypted private key in the root/ssh directory is copied into the container image at build time. When the container runs, the entrypoint.sh script decrypts it using the value of the KEY_PASSWORD environment variable and saves it to id_rsa. As a result, this private key is accessible to anyone who can get into the container at runtime.
Follow the steps below to generate a new key pair:
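The decryption step described above can be sketched in Python as follows. The real entrypoint.sh is a shell script and is not reproduced here, so the file names, the function names, and the `-pass env:KEY_PASSWORD` form are assumptions. The cipher options mirror the `openssl enc -aes-256-cbc -pbkdf2` command used to encrypt the key.

```python
import os
import subprocess


def decrypt_command(encrypted_path: str, output_path: str) -> list:
    """Build the openssl argv that reverses the encryption command."""
    return [
        "openssl", "enc", "-d", "-aes-256-cbc", "-pbkdf2",
        "-in", encrypted_path,
        "-out", output_path,
        # Let openssl read the password from the environment so it does
        # not appear in the process list (an alternative to -k).
        "-pass", "env:KEY_PASSWORD",
    ]


def decrypt_key(encrypted_path: str = "id_rsa.enc",
                output_path: str = "id_rsa") -> None:
    """Decrypt the key file and restrict its permissions."""
    if not os.environ.get("KEY_PASSWORD"):
        raise RuntimeError("KEY_PASSWORD is not set")
    subprocess.run(decrypt_command(encrypted_path, output_path), check=True)
    # ssh refuses private keys that other users can read.
    os.chmod(output_path, 0o600)
```

Anyone with shell access to the running container can still read the decrypted id_rsa, so this does not change the exposure noted above.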
# generate new keypair
ssh-keygen -t rsa -b 2048
# create a new key password, don't lose it once it's generated
export KEY_PASSWORD=$(tr -dc 'a-z0-9!@#$%^&*(-_=+)' < /dev/urandom | head -c16)
# quote the variable: the password may contain shell metacharacters
openssl enc -aes-256-cbc -pbkdf2 -in id_rsa -out id_rsa.enc -k "$KEY_PASSWORD"
# the encrypted key will be in id_rsa.enc file
# it will be copied into container image during build
docker build -t <your image name>:<your_tag> .
# don't forget to update your .env file
Follow the instructions here to deploy the processes using systemd user mode.
Install the editor and tools first.
- Install a Python runtime; either 3.7 or 3.8 should work.
- Visual Studio Code, with the extensions ms-python, pylance and python-test-adapter
- Poetry environment manager
Then clone this repo and run the tests
git clone <this_repo_url>
poetry install
pytest -s
# open vscode and start hacking
code .
The application requires PostgreSQL and Redis to run. The test cases depend only on SQLite, so for local development experiments SQLite should do the trick. Just make sure to modify settings.py to change the database adapter.
Below are links to the frameworks and libraries used.
- The Django framework, especially the Django Admin Site features.
- Pytest, the most popular Python test runner
- Background job executor Celery
- Flake8, a Python linter
- Python API binding for GitLab
- Python API binding for GitHub
- Python API binding for Atlassian products
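For the SQLite switch mentioned above for local development, the database section of settings.py might look like the sketch below. This is an illustration only: the actual settings.py is not shown in this document, and the function name `database_settings`, the `USE_SQLITE` variable, and the `db.sqlite3` file name are all hypothetical. The `PG_*` names come from the ".env" example.

```python
import os


def database_settings(env=os.environ, use_sqlite=False):
    """Build a Django DATABASES dict for PostgreSQL, or SQLite for local work."""
    if use_sqlite:
        return {
            "default": {
                "ENGINE": "django.db.backends.sqlite3",
                "NAME": "db.sqlite3",  # hypothetical file name
            }
        }
    return {
        "default": {
            "ENGINE": "django.db.backends.postgresql",
            "HOST": env["PG_HOST"],
            "PORT": env["PG_PORT"],
            "USER": env["PG_USERNAME"],
            "PASSWORD": env["PG_PASSWORD"],
            "NAME": env["PG_DATABASE"],
        }
    }


# In settings.py this could be wired up as (USE_SQLITE is an assumed name):
# DATABASES = database_settings(use_sqlite=os.environ.get("USE_SQLITE") == "1")
```

Keeping the choice in one function means the PostgreSQL variables are only read when they are actually needed, so a local SQLite run does not require a complete ".env" file.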