mattgodbolt/zindex

Support JSON-based indexing

mattgodbolt opened this issue · 6 comments

Many logs (well, ones I care about) are in JSON format. Ideally, zindex should understand JSON and be able to index fields. That said: JSON querying is complex (see, for example, the excellent jq).

One possibility is to support indexes via an external program, unless jq happens to support a library-style use of its query language.

This is an awesome idea (not so much for logs, but for large JSON docs in my use case), but agreed: it's extremely complex due to all the corner cases and recursion. Such a complex task might be better split into a higher-level jsonzindex package that uses zindex underneath. Or perhaps zindex could perform a top-level interpretation of JSON-able output that could then be queried by jq?

(Btw, which logs are in JSON? Unless they're lines of independent JSON, that wouldn't make much sense to me. Encapsulating something like an appendable log file in one huge JSON doc doesn't seem logical.)

In my day job I work on systems where each input, its intermediate states and its outputs are treated as a single giant audit object. This is serialised as JSON and written out one per line. One of the fields in the JSON is a unique ID, which is also logged elsewhere from time to time (for audit entries of note). Later on, the 10+GB of logs are gzipped and archived. Much later I'll get asked "Hey, what happened on this day for this record?" and it's handy to grab the line from the archived log by its unique ID and look at the entries either side (hence the -C option).
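To make the use case concrete, here is a minimal Python sketch of the same lookup done without any index: a linear scan of a gzipped file of one-JSON-object-per-line records that returns the matching line plus its neighbours. The function name `find_with_context`, the field name passed in, and the `context` parameter are all illustrative assumptions, not part of zindex. The point is only to show what an index saves: this scan has to decompress everything before the match.

```python
import gzip
import json
from collections import deque


def find_with_context(path, field, value, context=2):
    """Scan a gzipped JSON-lines file for the first record whose `field`
    equals `value`; return that line plus up to `context` lines either side.

    Hypothetical helper for illustration only. It decompresses the file
    from the start, which is the cost an index is meant to avoid.
    """
    before = deque(maxlen=context)  # rolling window of preceding lines
    result = []
    remaining_after = 0
    found = False
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.rstrip("\n")
            if found:
                # Collect trailing context, then stop reading.
                if remaining_after == 0:
                    break
                result.append(line)
                remaining_after -= 1
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                record = None  # tolerate non-JSON lines in the log
            if isinstance(record, dict) and record.get(field) == value:
                result = list(before) + [line]
                remaining_after = context
                found = True
            else:
                before.append(line)
    return result
```

On a 10+GB archive this kind of scan is slow for every query, which is why looking the line up by its ID in a prebuilt index is attractive.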

Cool, so basically a lot of lines of independent JSON dictionaries. (You can do something similar with rsyslog templates, e.g. http://untergeek.com/2012/10/11/using-rsyslog-to-send-pre-formatted-json-to-logstash/). This use case sounds really useful and probably a lot easier to code than trying to build an in-memory tree of a really huge JSON doc. ;)

As of 5aab30b there's support for piping through a command, e.g. jq. Something like:

zindex ~/audit.log.gz -p 'jq --raw-output --unbuffered .eventId'

works for my use case, though I'm going to add support for multiple indices per line before it covers all my use cases.
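For anyone without jq to hand, the external program can be sketched in Python. This is only a rough stand-in for the jq filter above, under two assumptions that the thread does not confirm: that the piped command should emit exactly one output line per input line, and that it should flush after each line (inferred from the use of --unbuffered). The function names and the default field name are illustrative. Unlike jq, which prints null for a missing field, this sketch prints an empty line; how zindex treats either is not something stated here.

```python
import json
import sys


def extract_key(line, field="eventId"):
    """Return the value of `field` as a raw string, or '' if the line is
    not a JSON object or lacks the field (illustrative helper)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return ""
    if not isinstance(record, dict):
        return ""
    value = record.get(field)
    if value is None:
        return ""
    # Strings are emitted unquoted, like jq --raw-output; other types as JSON.
    return value if isinstance(value, str) else json.dumps(value)


def run(stream_in, stream_out, field="eventId"):
    """Emit one key line per input line, flushing each time so the reader
    on the other end of the pipe is never left waiting (assumed protocol)."""
    for line in stream_in:
        stream_out.write(extract_key(line, field) + "\n")
        stream_out.flush()
```

Saved as a script (say, a hypothetical extract_key.py) with a final `run(sys.stdin, sys.stdout)` call, it could be passed to -p in place of the jq command.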

Closing for now as I hope the external indexer (--pipe) is good enough. Failing that, I will use libjq and bring the whole thing inside zindex.