loggie-io/loggie

Regex in transformer loses the body key and its value


sentoz commented

What version of Loggie?

v1.4.1

Expected Behavior

I expect the log to be parsed according to the regex pattern, with the body key and its value preserved, and published correctly to Loki, as happens with the normalize regex processor.

Actual Behavior

I have an incoming log line from an nginx container:

10.100.73.139 - - [16/Jan/2024:12:58:59 +0000] "GET / HTTP/1.1" 200 45196 "-" "curl/7.81.0" "127.0.0.1"

I parse it with this regex pattern:

^(?<host>\S+) - (?<user>\S+) \[(?<time>.*)\] "(?<method>\S+) (?<request_url>\S+) (?<request_http_protocol>\S+)" (?<status>\S+) (?<bytes_out>\S+) "(?<http_referer>[^"]*)" "(?<user_agent>[^"]*)"? "(?<ip>[^"]*)"?
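To check the pattern on its own, outside loggie, here is a minimal Go sketch. It assumes the transformer evaluates patterns with Go's regexp package (an assumption, based only on loggie being written in Go). The named groups are rewritten as (?P<name>...), which every Go version accepts; Go 1.22 and later also accept the (?<name>...) form used above. parseNginx is a hypothetical helper, not part of loggie.

```go
package main

import (
	"fmt"
	"regexp"
)

// nginxPattern is the pattern from the issue, rewritten with (?P<name>...) groups
// so that it compiles on Go versions older than 1.22 as well.
var nginxPattern = regexp.MustCompile(`^(?P<host>\S+) - (?P<user>\S+) \[(?P<time>.*)\] "(?P<method>\S+) (?P<request_url>\S+) (?P<request_http_protocol>\S+)" (?P<status>\S+) (?P<bytes_out>\S+) "(?P<http_referer>[^"]*)" "(?P<user_agent>[^"]*)"? "(?P<ip>[^"]*)"?`)

// parseNginx (hypothetical helper) returns the named capture groups of line,
// keyed by group name, or nil if the line does not match the pattern.
func parseNginx(line string) map[string]string {
	m := nginxPattern.FindStringSubmatch(line)
	if m == nil {
		return nil
	}
	fields := make(map[string]string)
	for i, name := range nginxPattern.SubexpNames() {
		if name != "" { // index 0 is the whole match and has no name
			fields[name] = m[i]
		}
	}
	return fields
}

func main() {
	line := `10.100.73.139 - - [16/Jan/2024:12:58:59 +0000] "GET / HTTP/1.1" 200 45196 "-" "curl/7.81.0" "127.0.0.1"`
	fields := parseNginx(line)
	fmt.Println(fields["host"], fields["status"], fields["user_agent"], fields["ip"])
}
```

Run as-is, it prints the host, status, user agent and ip extracted from the sample line, which confirms the pattern itself matches.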

My Interceptor CRD looks like this:

apiVersion: loggie.io/v1beta1
kind: Interceptor
metadata:
  name: nginx
spec:
  interceptors: |
    - type: transformer
      actions:
      - action: regex(message)
        pattern: ^(?<host>\S+) - (?<user>\S+) \[(?<time>.*)\] "(?<method>\S+) (?<request_url>\S+) (?<request_http_protocol>\S+)" (?<status>\S+) (?<bytes_out>\S+) "(?<http_referer>[^"]*)" "(?<user_agent>[^"]*)"? "(?<ip>[^"]*)"?
        ignoreError: false

After switching loggie to debug logging and adding a print() action, I can see that my log message is parsed correctly, but the body key and its value disappear:

{"level":"info","time":"2024-01-16 13:31:54","caller":"/pkg/pipeline/pipeline.go:1139","message":"source ui-app-shell-8b56755d8-vmr5b/ui-app-shell/default interceptor chain: source->interceptor/maxbytes->interceptor/transformer->queue"}
{"level":"info","time":"2024-01-16 13:33:29","caller":"/pkg/interceptor/transformer/action/print.go:67","message":"event: {\"bytes_out\":\"45196\",\"user_agent\":\"curl/7.81.0\",\"host\":\"10.100.167.188\",\"time\":\"16/Jan/2024:13:33:29 +0000\",\"user\":\"-\",\"request_url\":\"/\",\"status\":\"200\",\"http_referer\":\"-\",\"ip\":\"127.0.0.1\",\"fields\":{\"namespace\":\"default\",\"nodeip\":\"10.10.12.22\",\"podid\":\"9e7a2e06-9b02-46a5-89b3-f4e20b3ba829\",\"podname\":\"ui-app-shell-8b56755d8-vmr5b\",\"logconfig\":\"ui-app-shell\",\"workloadname\":\"ui-app-shell\",\"cluster\":\"k8s-test\",\"workloadkind\":\"Deployment\",\"containername\":\"ui-app-shell\",\"nodename\":\"k8s-test-worker-a-2\"},\"request_http_protocol\":\"HTTP/1.1\",\"method\":\"GET\"}"}

As a result, nothing reaches my Loki.
If I first copy the contents of body into message and then parse message, the log is parsed and published to Loki correctly.

I also noticed that the deprecated normalize regex processor has no such problem: the log is parsed by the pattern, body is kept, and the log is published to Loki.

{"level":"info","time":"2024-01-16 13:54:44","caller":"/pkg/pipeline/pipeline.go:1139","message":"source ui-app-shell-8b56755d8-vmr5b/ui-app-shell/default interceptor chain: source->interceptor/maxbytes->interceptor/normalize->interceptor/transformer->queue"}
{"level":"info","time":"2024-01-16 13:54:54","caller":"/pkg/interceptor/transformer/action/print.go:67","message":"event: {\"request_http_protocol\":\"HTTP/1.1\",\"user\":\"-\",\"time\":\"16/Jan/2024:13:54:53 +0000\",\"user_agent\":\"curl/7.81.0\",\"method\":\"GET\",\"status\":\"200\",\"fields\":{\"workloadname\":\"ui-app-shell\",\"nodeip\":\"10.10.12.22\",\"nodename\":\"k8s-test-worker-a-2\",\"logconfig\":\"ui-app-shell\",\"cluster\":\"k8s-test\",\"containername\":\"ui-app-shell\",\"podname\":\"ui-app-shell-8b56755d8-vmr5b\",\"workloadkind\":\"Deployment\",\"namespace\":\"default\",\"podid\":\"9e7a2e06-9b02-46a5-89b3-f4e20b3ba829\"},\"ip\":\"127.0.0.1\",\"request_url\":\"/\",\"body\":\"10.100.167.188 - - [16/Jan/2024:13:54:53 +0000] \\\"GET / HTTP/1.1\\\" 200 45196 \\\"-\\\" \\\"curl/7.81.0\\\" \\\"127.0.0.1\\\"\",\"bytes_out\":\"45196\",\"http_referer\":\"-\",\"host\":\"10.100.167.188\"}"}
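The behaviour I expect from transformer is what the normalize output shows: every named group becomes a top-level key and body stays in the event. A minimal Go sketch of that behaviour follows. regexKeepSource is a hypothetical helper that only models the expected result; it is not loggie's implementation, and the event is simplified to a plain map.

```go
package main

import (
	"fmt"
	"regexp"
)

// Same pattern as in the issue, using (?P<name>...) groups.
var nginxPattern = regexp.MustCompile(`^(?P<host>\S+) - (?P<user>\S+) \[(?P<time>.*)\] "(?P<method>\S+) (?P<request_url>\S+) (?P<request_http_protocol>\S+)" (?P<status>\S+) (?P<bytes_out>\S+) "(?P<http_referer>[^"]*)" "(?P<user_agent>[^"]*)"? "(?P<ip>[^"]*)"?`)

// regexKeepSource (hypothetical) parses event[key] with re and writes each named
// group as a top-level key, leaving event[key] itself untouched.
// It reports whether the key existed as a string and the pattern matched.
func regexKeepSource(event map[string]interface{}, key string, re *regexp.Regexp) bool {
	src, ok := event[key].(string)
	if !ok {
		return false
	}
	m := re.FindStringSubmatch(src)
	if m == nil {
		return false
	}
	for i, name := range re.SubexpNames() {
		if name != "" {
			event[name] = m[i]
		}
	}
	return true
}

func main() {
	event := map[string]interface{}{
		"body":   `10.100.167.188 - - [16/Jan/2024:13:54:53 +0000] "GET / HTTP/1.1" 200 45196 "-" "curl/7.81.0" "127.0.0.1"`,
		"fields": map[string]interface{}{"namespace": "default"},
	}
	regexKeepSource(event, "body", nginxPattern)
	_, hasBody := event["body"]
	// body is still present next to the extracted keys.
	fmt.Println(hasBody, event["host"], event["method"])
}
```

In the transformer output above, by contrast, the extracted keys are present but body is gone.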

Steps to Reproduce the Problem

  1. Deploy loggie v1.4.1 with this configuration:
loggie:
  http:
    enabled: true
    port: 9196
  monitor:
    listeners:
      filesource:
        period: 10s
      filewatcher:
        period: 5m
      pipeline:
        period: 10s
      queue:
        period: 10s
      reload:
        period: 10s
      sink:
        period: 10s
    logger:
      enabled: true
      period: 30s
  reload:
    enabled: true
    period: 10s
  discovery:
    enabled: true
    kubernetes:
      cluster: k8s-test
      containerRuntime: containerd
      dynamicContainerLog: false
      parseStdout: true
      rootFsCollectionEnabled: false
      podLogDirPrefix: /var/log/pods
      typePodFields:
        logconfig: "${_k8s.logconfig}"
        namespace: "${_k8s.pod.namespace}"
        nodename: "${_k8s.node.name}"
        nodeip: "${_k8s.node.ip}"
        podname: "${_k8s.pod.name}"
        podid: "${_k8s.pod.uid}"
        containername: "${_k8s.pod.container.name}"
        containerimage: "${_k8s.pod.container.image}"
        workloadkind: "${_k8s.workload.kind}"
        workloadname: "${_k8s.workload.name}"
        cluster: "k8s-test"
  2. Deploy an nginx image that serves some static content, using this log format:
    log_format main
        '$remote_addr - $remote_user [$time_local] "$request" $status'
        ' $body_bytes_sent "$http_referer" "$http_user_agent"'
        ' "$http_x_forwarded_for"';
  3. Send requests to this container and collect its log with the transformer interceptor.
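To reproduce without deploying nginx, sample lines can be generated in the field order of the log_format main directive above and checked against the pattern. A minimal Go sketch, assuming plain ASCII values with no embedded quotes; formatMain and its parameters are hypothetical stand-ins for the nginx variables, not an nginx or loggie API.

```go
package main

import (
	"fmt"
	"regexp"
)

// Same pattern as in the issue, using (?P<name>...) groups.
var nginxPattern = regexp.MustCompile(`^(?P<host>\S+) - (?P<user>\S+) \[(?P<time>.*)\] "(?P<method>\S+) (?P<request_url>\S+) (?P<request_http_protocol>\S+)" (?P<status>\S+) (?P<bytes_out>\S+) "(?P<http_referer>[^"]*)" "(?P<user_agent>[^"]*)"? "(?P<ip>[^"]*)"?`)

// formatMain (hypothetical) renders one line in the order of log_format main:
// $remote_addr - $remote_user [$time_local] "$request" $status
// $body_bytes_sent "$http_referer" "$http_user_agent" "$http_x_forwarded_for"
func formatMain(remoteAddr, remoteUser, timeLocal, request string, status, bytesSent int, referer, userAgent, forwardedFor string) string {
	return fmt.Sprintf(`%s - %s [%s] "%s" %d %d "%s" "%s" "%s"`,
		remoteAddr, remoteUser, timeLocal, request, status, bytesSent, referer, userAgent, forwardedFor)
}

func main() {
	lines := []string{
		formatMain("10.100.73.139", "-", "16/Jan/2024:12:58:59 +0000", "GET / HTTP/1.1", 200, 45196, "-", "curl/7.81.0", "127.0.0.1"),
		// No X-Forwarded-For header: nginx logs "-" for the last field.
		formatMain("10.100.73.140", "-", "16/Jan/2024:12:59:00 +0000", "GET /index.html HTTP/1.1", 404, 153, "-", "curl/7.81.0", "-"),
	}
	for _, l := range lines {
		fmt.Println(nginxPattern.MatchString(l), l)
	}
}
```

Both generated lines match, so the pattern itself is not what drops body; only the transformer result differs from normalize.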