Venus

Centralised logging server

https://raw.githubusercontent.com/cjrh/venus/master/venus.jpg

Development Setup

The easy way to install for development is the following:

$ pip install -e '.[all]'

This installs venus in dev mode (an editable install), along with all the required development packages.

Querying data

Logging data is stored with the following schema:

time: timestamp with time zone
message: text
correlation_id: uuid
data: jsonb
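The schema can be sketched as DDL like the following. The table name logs matches the queries later in this document; the index name is an assumption:

```sql
-- Sketch of the documented schema; venus's actual DDL may differ.
create table logs (
    time            timestamp with time zone,
    message         text,
    correlation_id  uuid,
    data            jsonb
);

-- GIN index so that JSONB containment queries on data are fast.
create index logs_data_gin on logs using gin (data);
```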

When applications send their logging data over, each message is a JSON document that looks something like this:

{
    "name": "root",
    "msg": "blah blah blah",
    "args": [],
    "levelname": "INFO",
    "levelno": 20,
    "pathname": "tests/sender.py",
    "filename": "sender.py",
    "module": "sender",
    "exc_text": null,
    "stack_info": null,
    "lineno": 59,
    "funcName": "app_items",
    "created": 1554635562.8368905,
    "msecs": 836.890459060669,
    "relativeCreated": 1485.8589172363281,
    "thread": 15368,
    "threadName": "MainThread",
    "processName": "MainProcess",
    "process": 11604,
    "correlation_id": "8e820a74-ef80-4fbe-a4f7-692f6352b6be",
    "random_timing_data": 1.23,
    "message": "blah blah blah",
    "created_iso": "2019-04-07T11:12:42.836890+00:00"
}
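Most of the fields in this example are the standard attributes of a Python logging.LogRecord; correlation_id, created_iso, and random_timing_data would be added by the sending application. A minimal sketch of producing such a payload (record_to_payload is a hypothetical helper, not part of venus):

```python
import json
import logging
import uuid
from datetime import datetime, timezone

def record_to_payload(record: logging.LogRecord) -> str:
    """Serialise a LogRecord into a JSON payload like the example above."""
    payload = dict(record.__dict__)
    # The "message" field is the fully-interpolated msg.
    payload["message"] = record.getMessage()
    # created_iso is derived from the standard "created" timestamp.
    payload["created_iso"] = datetime.fromtimestamp(
        record.created, tz=timezone.utc
    ).isoformat()
    # Assumed behaviour: the sender attaches a correlation id if missing.
    payload.setdefault("correlation_id", str(uuid.uuid4()))
    return json.dumps(payload, default=str)

record = logging.LogRecord(
    name="root", level=logging.INFO, pathname="tests/sender.py",
    lineno=59, msg="blah blah blah", args=(), exc_info=None,
)
print(record_to_payload(record))
```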

When a message arrives, venus will:

  1. write the created_iso message field to the time DB field;
  2. write the message message field to the message DB field;
  3. write the correlation_id message field to the correlation_id DB field;
  4. remove some fields from the JSON blob (based on a configurable ignore list);
  5. write the remaining JSON blob to the data DB field.

This is how the logging data gets into the database.
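The five steps above can be sketched as a single transformation. The field names come from the schema; the ignore list here is illustrative only, since the real one is configurable:

```python
import json

# Illustrative ignore list; venus's actual list is configurable.
IGNORE_FIELDS = {"args", "msecs", "relativeCreated", "thread", "process"}

def message_to_row(raw: str) -> dict:
    """Turn an incoming JSON log message into a DB row (steps 1-5)."""
    blob = json.loads(raw)
    row = {
        "time": blob["created_iso"],                # step 1
        "message": blob["message"],                 # step 2
        "correlation_id": blob["correlation_id"],   # step 3
    }
    for field in IGNORE_FIELDS:                     # step 4
        blob.pop(field, None)
    row["data"] = blob                              # step 5
    return row

row = message_to_row(json.dumps({
    "created_iso": "2019-04-07T11:12:42.836890+00:00",
    "message": "blah blah blah",
    "correlation_id": "8e820a74-ef80-4fbe-a4f7-692f6352b6be",
    "levelname": "INFO",
    "msecs": 836.89,
}))
print(row["time"])
```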

When querying the database, you would typically specify a correlation id and perhaps also a time constraint. If you want to get back specific fields from the JSON blob, you must use the JSONB operators and functions for that. However, there is also a shortcut for decomposing the JSONB blob into dedicated columns, provided you know what type each one should be.

Here is an example of querying such data:

select
    time,
    created_iso,
    *,
    logs.data->>'filename',  -- This is the normal way to access JSONB
    logs.data#>>'(unknown)'
from
     logs,
     LATERAL jsonb_to_record(logs.data) as x(
         msg text,  -- Need to know which names you're looking for
         filename text,
         pathname text,
         levelno int,
         lineno int,
         random_timing_data double precision,
         created_iso text
     )
where x.filename = 'sender.py'  -- Can constrain on the new fields
order by time desc
limit 10;

Of course, if you need to constrain JSONB subfields in the WHERE clause it'll be more efficient to use the JSONB operators directly so that the GIN index on the data field can be used.
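For example, a containment query like the following can use a GIN index on the data column (the correlation id here is the one from the earlier example):

```sql
-- The @> containment operator is supported by GIN indexes on jsonb.
select time, message
from logs
where data @> '{"filename": "sender.py"}'
  and correlation_id = '8e820a74-ef80-4fbe-a4f7-692f6352b6be'
order by time desc
limit 10;
```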

Docker workflow

Building the image:

$ docker build -f Dockerfile -t venus:$(cat VERSION) --build-arg VERSION=$(cat VERSION) .
$ docker images
REPOSITORY                   TAG                    IMAGE ID            CREATED             SIZE
venus                        0.0.6                  41003c1f6c29        10 seconds ago      195MB
<...snip...>
$

Running the image:

Use either --net=host or publish the port with -p 5049:5049, but not both. VENUS_PORT is the port on which applications connect to venus.

$ docker run --rm -it \
    -p 5049:5049 \
    -e VENUS_PORT=5049 \
    -e DB_HOST=postgres.hostname.com \
    -e DB_PORT=5432 \
    -e DB_NAME=venus \
    -e DB_USERNAME='postgres' \
    -e DB_PASSWORD='password' \
    venus:0.0.6