/bemi-django

Automatic data change tracking for Django

Primary LanguagePythonGNU Lesser General Public License v3.0LGPL-3.0

bemi-banner

Website · Docs · Example · Report Bug · Request Feature · Discord · X · LinkedIn

Bemi Django

Bemi plugs into Django and your existing PostgreSQL database to track data changes automatically. It unlocks robust context-aware audit trails and time travel querying inside your application.

Designed with simplicity and non-invasiveness in mind, Bemi doesn't require any alterations to your existing database structure. It operates in the background, empowering you with data change tracking features.

This library is an optional Django integration, enabling you to pass application-specific context when performing database changes. This can include context such as the 'where' (API endpoint, worker, etc.), 'who' (user, cron job, etc.), and 'how' behind a change, thereby enriching the information captured by Bemi.

Contents

Highlights

  • Automatic and secure database change tracking with application-specific context in a structured form
  • 100% reliability in capturing data changes, even if executed through direct SQL outside the application
  • High performance without affecting code runtime execution and database workload
  • Easy-to-use without changing table structures and rewriting the code
  • Time travel querying and ability to easily group and filter changes
  • Scalability with an automatically provisioned cloud infrastructure
  • Full ownership of your data

See an example repo for Django that automatically tracks all changes.

Use cases

There's a wide range of use cases that Bemi is built for! The tech was initially built as a compliance engineering system for fintech that supported $15B worth of assets under management, but has since been extracted into a general-purpose utility. Some use cases include:

  • Audit Trails: Use logs for compliance purposes or surface them to customer support and external customers.
  • Change Reversion: Revert changes made by a user or rollback all data changes within an API request.
  • Time Travel: Retrieve historical data without implementing event sourcing.
  • Troubleshooting: Identify the root cause of application issues.
  • Distributed Tracing: Track changes across distributed systems.
  • Testing: Rollback or roll-forward to different application test states.
  • Analyzing Trends: Gain insights into historical trends and changes for informed decision-making.

Quickstart

Install the Python package

pip install bemi-django

Add the bemi app to your Django project's INSTALLED_APPS

INSTALLED_APPS = [
    # Your other apps
    'bemi',
]

Run the migration to add lightweight PostgreSQL triggers for passing application context with all data changes into PostgreSQL's replication log

python manage.py migrate bemi

Add the Bemi middleware to your Django project app and add the path to a get_bemi_context function to automatically pass application context with all tracked database changes made within an HTTP request:

# settings.py
MIDDLEWARE = [
    # Your other middlewares
    'bemi.BemiMiddleware',
]

BEMI_CONTEXT_FUNCTION = 'your_project.utils.get_bemi_context'
# utils.py
def get_bemi_context(request):
    # Return any custom context dict
    return {
      'user_id': request.user.id,
      'method': request.method,
      'path': request.path,
    }

Application context:

  • Is bound to the current execution thread within an HTTP request.
  • Is used only with INSERT, UPDATE, DELETE SQL queries performed via Django. Otherwise, it is a no-op.
  • Is passed directly into PG Write-Ahead Log with data changes without affecting the structure of the database and SQL queries.

Application context will automatically include the original SQL query that performed data changes, which is generally useful for troubleshooting purposes.

Data change tracking

Local database

To test data change tracking and the Django integration with a locally connected PostgreSQL, you need to set up your local PostgreSQL.

First, make sure your database has SHOW wal_level; returning logical. Otherwise, you need to run the following SQL command:

-- Don't forget to restart your PostgreSQL server after running this command
ALTER SYSTEM SET wal_level = logical;

To track both the "before" and "after" states on data changes, please run the following SQL command:

ALTER TABLE [YOUR_TABLE_NAME] REPLICA IDENTITY FULL;

Then, run a Docker container that connects to your local PostgreSQL database and starts tracking all data changes:

docker run \
  -e DB_HOST=host.docker.internal \
  -e DB_PORT=5432 \
  -e DB_NAME=[YOUR_DATABASE] \
  -e DB_USER=postgres \
  -e DB_PASSWORD=postgres \
  public.ecr.aws/bemi/dev:latest

Replace DB_NAME with your local database name. Note that DB_HOST pointing to host.docker.internal allows accessing 127.0.0.1 on your host machine if you run PostgreSQL outside Docker. Customize DB_USER and DB_PASSWORD with your PostgreSQL credentials if needed.

Now try making some database changes. This will add a new record in the changes table within the same local database after a few seconds:

psql postgres://postgres:postgres@127.0.0.1:5432/[YOUR_DATABASE] -c \
  'SELECT "primary_key", "table", "operation", "before", "after", "context", "committed_at" FROM changes;'

 primary_key | table | operation |                       before                       |                       after                         |                        context                                                            |      committed_at
-------------+-------+-----------+----------------------------------------------------+-----------------------------------------------------+-------------------------------------------------------------------------------------------+------------------------
 26          | todo  | CREATE    | {}                                                 | {"id": 26, "task": "Sleep", "is_completed": false}  | {"user_id": 187234, "endpoint": "/todo", "method": "POST", "SQL": "INSERT INTO ..."}      | 2023-12-11 17:09:09+00
 27          | todo  | CREATE    | {}                                                 | {"id": 27, "task": "Eat", "is_completed": false}    | {"user_id": 187234, "endpoint": "/todo", "method": "POST", "SQL": "INSERT INTO ..."}      | 2023-12-11 17:09:11+00
 28          | todo  | CREATE    | {}                                                 | {"id": 28, "task": "Repeat", "is_completed": false} | {"user_id": 187234, "endpoint": "/todo", "method": "POST", "SQL": "INSERT INTO ..."}      | 2023-12-11 17:09:13+00
 26          | todo  | UPDATE    | {"id": 26, "task": "Sleep", "is_completed": false} | {"id": 26, "task": "Sleep", "is_completed": true}   | {"user_id": 187234, "endpoint": "/todo/complete", "method": "PUT", "SQL": "UPDATE ..."}   | 2023-12-11 17:09:15+00
 27          | todo  | DELETE    | {"id": 27, "task": "Eat", "is_completed": false}   | {}                                                  | {"user_id": 187234, "endpoint": "/todo/27", "method": "DELETE", "SQL": "DELETE FROM ..."} | 2023-12-11 17:09:18+00

Check out our Django docs for more details.

Architecture overview

Bemi is designed to be lightweight and secure. It takes a practical approach to achieving the benefits of event sourcing without requiring rearchitecting existing code, switching to highly specialized databases, or using unnecessary git-like abstractions on top of databases. We want your system to work the way it already does with your existing database to allow keeping things as simple as possible.

Bemi plugs into both the database and application levels, ensuring 100% reliability and a comprehensive understanding of every change.

On the database level, Bemi securely connects to PostgreSQL's Write-Ahead Log and implements Change Data Capture. This allows tracking even the changes that get triggered via direct SQL.

On the application level, this Python package automatically passes application context to the replication logs to enhance the low-level database changes. For example, information about a user who made a change, an API endpoint where the change was triggered, a worker name that automatically triggered database changes, etc.

Bemi workers then stitch the low-level data with the application context and store this information in a structured easily queryable format, as depicted below:

bemi-architechture

The cloud solution includes worker ingesters, queues for fault tolerance, and a serverless PostgreSQL. If you are interested in running a self-hosted version yourself, see our self-hosting docs.

License

Distributed under the terms of the LGPL-3.0 License. If you need to modify and distribute the code, please release it to contribute back to the open-source community.