The Beast

The Beast is an experimental, flexible, declaratively oriented toolkit that reads machine-readable data from various sources and transforms it into FollowTheMoney entities.

Do not rely on it until it is out of alpha. Everything is still very volatile.

Current status

High priority

Ingest from databases (MongoDB, PostgreSQL) using SQLAlchemy or Peewee
Tests for the databases ingest
Basic CLI
Signals on exceptions, and a policy for incorrectly parsed entity values (drop, drop all, drop entity, reraise)
Tests for the signals
Stats collector (number of signals of each type, number of invalid entities, etc.)
Packaging (partially done in packaging_and_spark_integration branch)
Documentation (@legless, your notes will be very valuable)
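
The planned policy for incorrectly parsed entity values (drop, drop all, drop entity, reraise) is not implemented yet, so the following is only a rough sketch of how it might behave. Everything in it is hypothetical: the Policy enum, the apply_policy function, the plain-dict entity shape, and in particular the meaning assigned to each option (drop = discard the bad value, drop all = discard every value of that property, drop entity = discard the whole entity) are assumptions, not Beast's actual API.

```python
from enum import Enum


class Policy(Enum):
    # Hypothetical names; the meaning of each option is an assumption.
    DROP = "drop"                # discard only the value that failed to parse
    DROP_ALL = "drop all"        # discard every value of the affected property
    DROP_ENTITY = "drop entity"  # discard the whole entity
    RERAISE = "reraise"          # let the original exception propagate


def apply_policy(entity, prop, parse, policy):
    """Parse all values of `prop` and handle failures according to `policy`.

    `entity` is assumed to be a dict mapping property names to lists of raw
    values. Returns a new dict, or None if the entity is dropped.
    """
    kept = []
    for value in entity.get(prop, []):
        try:
            kept.append(parse(value))
        except ValueError:
            if policy is Policy.RERAISE:
                raise
            if policy is Policy.DROP_ENTITY:
                return None
            if policy is Policy.DROP_ALL:
                kept = []
                break
            # Policy.DROP: skip this value and keep going
    result = dict(entity)
    result[prop] = kept
    return result


raw = {"name": ["Acme"], "year": ["2001", "n/a"]}
cleaned = apply_policy(raw, "year", int, Policy.DROP)
# cleaned == {"name": ["Acme"], "year": [2001]}
```

A stats collector could then count how often each branch is taken, but that is left out here to keep the sketch short.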

Low priority

Advanced ingest routines: regex validation to discard values that do not pass the test?
Tests for the resolver wrappers

Done

Running tests

pip install -r requirements.txt
python -m pytest

Run using Docker

The /bin/ directory contains scripts to run Beast inside a Docker container.

Use /bin/run data/mapping.yaml to run Beast with the selected mapping. Note: the mapping and source file(s) must be inside the Beast root directory or one of its subdirectories, e.g. ./data/mapping.yaml. You can't point Beast to a file outside its root directory.
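
If you want to check a path before handing it to /bin/run, the root-directory restriction can be tested with a few lines of Python. This is a minimal sketch and not part of Beast: is_inside_root is a hypothetical helper, and /srv/beast is only a placeholder for wherever your checkout lives.

```python
from pathlib import Path


def is_inside_root(candidate, root):
    """Return True if `candidate` resolves to a location below `root`.

    Hypothetical helper, not part of Beast. Relative paths are interpreted
    relative to `root`; absolute paths are used as given.
    """
    root_path = Path(root).resolve()
    candidate_path = (root_path / candidate).resolve()
    return root_path in candidate_path.parents


# "/srv/beast" is a placeholder for the Beast checkout directory.
print(is_inside_root("data/mapping.yaml", "/srv/beast"))
print(is_inside_root("../elsewhere/mapping.yaml", "/srv/beast"))
```

The first call reports a path inside the root, the second one a path that escapes it through "..".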

Use /bin/tests to run tests.

Use /bin/black to format source files with black before opening a pull request.