This is the LAGOON (acronym...) project source code. Note that `./lagoon_cli.py` is a CLI for running common LAGOON functions.
- Run `pip install -r requirements.txt` to ensure your Python environment has LAGOON's dependencies.
- Also ensure you have Docker installed.
- Run `./lagoon_cli.py dev up` to launch an appropriately configured Postgres DB (and any other services required by LAGOON).
- Either use a pre-populated database or build one from scratch (see the two sections below).
- Run `./lagoon_cli.py ui` to browse around visually.
- Run `./lagoon_cli.py shell` to interact with the database in a CLI.
- If running machine learning experiments is desired:
  - Run `pip install -r requirements-ml.txt`.
  - Clone the `lagoon-artifacts` repository as a sibling to this repository.
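Taken together, a first-time setup might look like the following shell session. This is a sketch assembled from the steps above; the clone URL placeholder is not given in this document, and the ML lines are only needed for machine learning experiments.

```shell
# Install LAGOON's Python dependencies
pip install -r requirements.txt

# Launch the configured Postgres DB and supporting services (requires Docker)
./lagoon_cli.py dev up

# ...populate the database (see the sections below), then browse it visually
./lagoon_cli.py ui

# Optional: machine learning extras
pip install -r requirements-ml.txt
# Clone lagoon-artifacts as a sibling of this repository (URL not given here)
git clone <url-of-lagoon-artifacts> ../lagoon-artifacts
```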
This method is preferred, as it saves a lot of time.
- Retrieve a backup of the database, named like `lagoon-db-backup-DATE`, in Google Drive.
- Run `./lagoon_cli.py dev backup-restore path/to/backup` to restore the database.
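Put together, restoring from a backup might look like this. This is a sketch; the download location is illustrative, and `DATE` stands in for the actual date in the backup's filename.

```shell
# With the dev services already running (./lagoon_cli.py dev up),
# restore a backup previously downloaded from Google Drive
./lagoon_cli.py dev backup-restore ~/Downloads/lagoon-db-backup-DATE

# Verify by browsing the restored data
./lagoon_cli.py ui
```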
- Run `./lagoon_cli.py db reset` to delete / create / set up the database.
- Clone e.g. the CPython repository somewhere.
- Run `./lagoon_cli.py ingest git load <path/to/cpython>` to extract information from git into the LAGOON database. This took just under two hours on my laptop.
- Run `./lagoon_cli.py ingest ocean_pickle load ~/Downloads/python.pck` to extract information from OCEAN data.
- Run `./lagoon_cli.py ingest python_peps load` to extract information regarding Python PEPs into the LAGOON database.
- Run `./lagoon_cli.py ingest toxicity_badwords compute` to compute bad-word-based toxicity on messages and git commits, and put that information in the LAGOON database.
- Run `./lagoon_cli.py ingest toxicity_nlp compute` to compute toxicity scores from natural language processing models on messages and git commits, and put that information in the LAGOON database. This step requires the following:
  - Run `pip install -r requirements-ml.txt`.
  - Download pre-trained NLP models from Google Drive and place them inside `ml/nlp_models/`.
- Run `./lagoon_cli.py ingest hibp load-breaches` (and, optionally, `./lagoon_cli.py ingest hibp load-pastes`) to extract the number of breaches (and pastes) from Have I Been Pwned for emails in the LAGOON database.
- Run `./lagoon_cli.py fusion run` to fuse entities and re-compute caches.
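The full from-scratch pipeline can be sketched as one script. The order follows the list above; the clone path and downloaded-pickle path are illustrative, and the NLP toxicity step assumes `requirements-ml.txt` is installed and the models are in `ml/nlp_models/`.

```shell
# Start from an empty, freshly set-up database
./lagoon_cli.py db reset

# Ingest git history (clone CPython first; this step can take ~2 hours)
git clone https://github.com/python/cpython ~/src/cpython
./lagoon_cli.py ingest git load ~/src/cpython

# Ingest OCEAN data and Python PEPs
./lagoon_cli.py ingest ocean_pickle load ~/Downloads/python.pck
./lagoon_cli.py ingest python_peps load

# Toxicity annotations (toxicity_nlp needs the ML extras and downloaded models)
./lagoon_cli.py ingest toxicity_badwords compute
./lagoon_cli.py ingest toxicity_nlp compute

# Finally, fuse entities and re-compute caches
./lagoon_cli.py fusion run
```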
For development, after any change which affects attributes in the database, `./lagoon_cli.py fusion recache` must be run to re-cache the latest attribute set.
Building the documentation requires a few additional packages, which may be installed with `pip install -r requirements-dev.txt`.
System documentation may be built with the following commands:
```shell
$ cd docs
$ make html
$ open _build/html/index.html
```
If the Postgres docker container holding the database crashes, no worries. The actual database files are stored in the folder `../deploy/dev/db`, so as long as that folder still exists, the database is not truly deleted. If the container crashes, run `docker stop <container_id>`, and then `./lagoon_cli.py dev up` again. You may also want to restart VSCode.
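A recovery after a container crash might therefore look like the following. This is a sketch; the container id is whatever `docker ps` reports for the Postgres container.

```shell
# List containers (including stopped ones) to find the Postgres one
docker ps -a

# Stop it by the id shown in the first column, then relaunch the dev services
docker stop <container_id>
./lagoon_cli.py dev up
```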
Sometimes, the database might get upgraded. To upgrade your database to the latest version, run:
```shell
$ ./lagoon_cli.py alembic -- upgrade head
```
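Since arguments after `--` appear to be forwarded to alembic, the standard alembic subcommands should also work for inspecting migration state before upgrading. This is an assumption based on the command above; `current` and `history` are standard alembic subcommands, not commands documented here.

```shell
$ ./lagoon_cli.py alembic -- current    # show the revision the database is on
$ ./lagoon_cli.py alembic -- history    # list known revisions
```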
pgadmin is a popular tool for investigating PostgreSQL installations. To launch an instance of it pointing at the development database, call:
```shell
$ ./lagoon_cli.py db pgadmin
```
It may take up to a minute to actually open a browser tab.