tstack/lnav

Ability to use `line-format` for non-JSON encoded logs


Is your feature request related to a problem? Please describe.
I'm dealing with non-JSON log files that contain variable-width source file identifiers.
I would like to align log entries using line-format's min-width, max-width, align and overflow settings.

Describe the solution you'd like
I'd like to be able to use field names defined in the `value` object as the `field` property of `line-format` elements.

Describe alternatives you've considered
I saw that values have a `rewriter` field, but I could not work out from the documentation how to achieve this with it. Also, I'd rather not re-implement the abbrev logic when it is already available elsewhere.

Additional context

Example of the input logs:

poc  2024-09-20T07:01:20.111540Z INFO    1:Microsoft.Extensions.Hosting.Internal.ApplicationLifetime:0   Content root path: /foo/bar
poc  2024-09-20T07:01:20.111615Z DEBUG   1:Microsoft.Extensions.Hosting.Internal.Host:0   Hosting started
poc  2024-09-20T07:01:30.083486Z INFO    7:Foo.Bar.BarService:34   Worker running at: 09/20/2024 07:01:30 +00:00

Desired output:

poc  2024-09-20T07:01:20.111540Z INFO    1:    M.E.H.I.ApplicationLifetime:0   Content root path: /foo/bar
poc  2024-09-20T07:01:20.111615Z DEBUG   1:                   M.E.H.I.Host:0   Hosting started
poc  2024-09-20T07:01:30.083486Z INFO    7:                F.B.BarService:34   Worker running at: 09/20/2024 07:01:30 +00:00
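
To make the requested transformation concrete, here is a minimal standalone sketch in Python. It is not lnav's implementation, and lnav's own abbrev overflow may behave differently. The helper names `abbreviate` and `render` and the column width `SOURCE_WIDTH = 31` are assumptions made for illustration; the regex is the format file's pattern written with Python's `?P<name>` group syntax.

```python
import re

# Python spelling of the format file's pattern (named groups use ?P<...>).
LINE_RE = re.compile(
    r"^\s*(?P<prod>\w+)\s+"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{0,6}Z)\s+"
    r"(?P<lvl>\w+)\s+(?P<pid>\d+):(?P<source>.+?):(?P<line>\d+)\s+(?P<msg>.+)$"
)

SOURCE_WIDTH = 31  # hypothetical column width, chosen for this example only


def abbreviate(source: str) -> str:
    """Shorten every dotted component except the last to its first letter."""
    *namespaces, name = source.split(".")
    return ".".join([part[:1] for part in namespaces] + [name])


def render(raw: str) -> str:
    """Re-render one log line with an abbreviated, right-aligned source."""
    m = LINE_RE.match(raw)
    if m is None:
        return raw  # leave lines that do not match the format untouched
    source = abbreviate(m["source"]).rjust(SOURCE_WIDTH)
    return (
        f"{m['prod']}  {m['timestamp']} {m['lvl']:<7} "
        f"{m['pid']}:{source}:{m['line']}   {m['msg']}"
    )
```

Calling `render` on each of the input lines above yields lines in which the source column is padded to the same width, so the line number and message start at a fixed offset.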

Here is the custom format file that I'm working with:

{
    "$schema": "https://lnav.org/schemas/format-v1.schema.json",
    "indigital": {
        "description": "Format file generated from regex101 entry",
        "regex": {
            "std": {
                "pattern": "^\\s*(?<prod>\\w+)\\s+(?<timestamp>\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}\\.\\d{0,6}Z)\\s+(?<lvl>\\w+)\\s+(?<pid>\\d+):(?<source>.+?):(?<line>\\d+)\\s+(?<msg>.+)$"
            }
        },
        "value": {
            "line": {
                "kind": "integer"
            },
            "lvl": {
                "kind": "string"
            },
            "msg": {
                "kind": "string"
            },
            "pid": {
                "kind": "integer"
            },
            "prod": {
                "kind": "string"
            },
            "source": {
                "kind": "string"
            },
            "timestamp": {
                "kind": "string"
            }
        },
        "level-field": "lvl",
        "body-field": "msg",
        "sample": [
            {
                "line": " poc  2024-09-19T15:33:52.428593Z INFO  5:Foo.Bar.BarService:34   Worker running at: 09/20/2024 07:01:30 +00:00"
            }
        ]
    }
}

The challenge with doing this is that searches and some other operations need to be done on the rendered text and not on the original file, which would affect performance quite a bit.

Personally, I'd be fine with search working on the original file/stream and highlighting being limited (for example, highlighting the entire entry when a match is found in text that is not shown in the UI), as I'm not going to remove, edit, or rewrite any crucial information that I'd be searching on. The goal is only to make the logs easier on the eye by stripping some excess detail and improving the indentation/alignment.
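
The fallback described above could look roughly like the following Python sketch. It only illustrates the idea and says nothing about how lnav's search is actually structured; `highlight_spans` is a hypothetical helper, and both the raw and the rendered text of an entry are assumed to be available.

```python
import re


def highlight_spans(raw: str, rendered: str, pattern: str) -> list[tuple[int, int]]:
    """Return (start, end) spans to highlight in the rendered entry."""
    rx = re.compile(pattern)
    if rx.search(raw) is None:
        return []  # the original entry does not match at all
    # Prefer precise highlights where the match is still visible.
    spans = [m.span() for m in rx.finditer(rendered) if m.end() > m.start()]
    # The match only exists in text that was abbreviated away:
    # fall back to marking the whole entry.
    return spans or [(0, len(rendered))]
```

With this approach the decision whether an entry matches is made on the original text only, so the rendered text is needed just for entries that are actually displayed.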