grafana/k8s-monitoring-helm

Parse Json Logs?

tcborg opened this issue · 3 comments

Hey folks,

I have a nodejs backend producing json logs with pino. But when I check the logs in grafana using the loki datasource, the json is split on multiple lines.

Tried a lot of different combinations of the json stage, but could make it work:

This is my last attempt
stage.json { expressions = { output="log"} }

When I check the pod logs, I see this json, and I wanted the req piece to be whole on the grafana entry

req: {
      "id": 503,
      "method": "GET",
      "url": "/api/v3/home-feed?currentDateTime=06-09-2024-13&includeVideos=1&limit=35&teamId=CAN&trendingLimit=3",
      "query": {
        "currentDateTime": "06-09-2024-13",
        "includeVideos": "1",
        "limit": "35",
        "teamId": "CAN",
        "trendingLimit": "3"
      },
      "params": {},
      "headers": {
        "x-forwarded-for": "174.231.57.83",
        "x-forwarded-proto": "https",
        "x-forwarded-port": "443",
        "host": "api.mysite.com",
        "devicetype": "ios",
        "buildnumber": "3.02",
        "teamid": "CAN",
        "accept": "*/*",
        "authsource": "ios",
        "accept-language": "en-US;q=1.0",
        "includevideos": "true",
     
      "remoteAddress": "::ffff:10.133.11.205",
      "remotePort": 36070
    }

@tcborg can you provide a complete raw log line from the pod logs?

@bentonam

This is pretty much it

[11-06-2024 02:57:50] INFO (1): request completed
    req: {
      "id": 19256,
      "method": "GET",
      "url": "/api/v3/feed-by-id?contentType=video&currentDateTime=06-10-2024-22&feedId=zO3KRoos",
      "query": {
        "contentType": "video",
        "currentDateTime": "06-10-2024-22",
        "feedId": "zO3KRoos"
      },
      "params": {},
      "headers": {
        "x-forwarded-for": "70.17.106.134",
        "x-forwarded-proto": "https",
        "x-forwarded-port": "443",
        "host": "api.mysite.com",
        "x-amzn-trace-id": "Root=1-6667bd2e-53f0a50e72c3454803129de6",
        "devicetype": "ios",
        "buildnumber": "3.02",
        "authorization": "[Redacted]",
        "accept": "*/*",
        "authsource": "ios",
        "userid": "HpqsEKs2uvV7RFt395zpJHCTqH62",
        "contenttype": "video",
        "accept-language": "en-US;q=1.0",
        "accept-encoding": "br;q=1.0, gzip;q=0.9, deflate;q=0.8",
        "anonymoususerid": "e2b4rBuiVUauna6mURDPqT",
        "user-agent": "iOS",
        "feedid": "zO3KRoos",
        "cookie": "<redacted>"
      },
      "remoteAddress": "::ffff:10.133.31.185",
      "remotePort": 63618
    }
    res: {
      "statusCode": 200,
      "headers": {
        "x-powered-by": "Express",
        "access-control-allow-origin": "*",
        "content-type": "application/json; charset=utf-8",
        "content-length": "1541",
        "etag": "W/\"605-4irVFR88+TK3HnJKbrREhgDSyLU\"",
        "cache-control": "max-age=300"
      }
    }
    responseTime: 6

@tcborg I'm not that familiar with pino, but according to their documentation https://github.com/pinojs/pino/blob/main/docs/pretty.md pino uses NDJSON (new-line delimited JSON). You posted what I believe to be pretty-json, which spans multiple new lines. Typically new lines should be avoided in production logs if at all possible. As it leads to unnecessarily complex log parsing. However Alloy/Promtail supports multiline, you have to provide a firstline selector.

If my understanding is correct you want to:

  1. Only have the json body written to the loki
  2. There is still other useful information written to the log i.e. timestamp, level, message, responseTime
  3. The timestamp of the log is reset to the extracted value instead of the collection time
  4. level is used as a label
  5. message and responseTime values are extracted and set as structured metadata, this way they are still part of the log but not part of the log line and are not good candidates
  6. The log line is reset to be just the req json with all new lines removed.

Here is an example promtail configuration file, the reason I'm using promtail is you can read from stdin:

Promtail Config:

---
clients:
  - url: https://loki.example.com/loki/api/v1/push
    basic_auth:
      username: test
      password: test

scrape_configs:
  - job_name: pino
    static_configs:
      - targets:
          - localhost
        labels:
          __path__: pino.log
    pipeline_stages:
      # strip ascii color codes
      - decolorize:
      # RE2 regular expression, if matched will start a new multiline block.
      - multiline:
          firstline: '^\[\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}\]\s+\w+(?: \(\d+\))?:'
          max_wait_time: 3s
      # assign a default log-level of unknown
      - static_labels:
          level: unknown
      # extract the timestamp, level, req and responseTime
      - regex:
          expression: '\[(?P<timestamp>\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2})\] (?P<level>\w+)(?: \(\d+\))?:\s+(?P<message>(?:\n|.)+)req:\n*\s*(?P<json_log>(?:.|\n)+)\n*\s*(?P<responseTime>responseTime:\s+\d+)\n*$'
      # set the extracted level to be a label
      - labels:
          level:
      # set the message and responseTime as structured metadata
      - structured_metadata:
          message:
          responseTime:
      # set the metadata timestamp to be that of the extracted timestamp from the log line
      - timestamp:
          source: timestamp
          format: "02-01-2006 15:04:05"
      # remove all new lines and whitespace from the req
      - replace:
          expression: '(\n\s+)'
          source: json_log
          replace: ''
      # reset the log line to be that of just the json object
      - output:
          source: json_log

Command:

cat pino.log | promtail -stdin -config.expand-env -config.file promtail.yaml -dry-run

Output:

image

Example Input:

[11-06-2024 02:57:50] INFO (1): request completed
    req: {
      "id": 19256,
      "method": "GET",
      "url": "/api/v3/feed-by-id?contentType=video&currentDateTime=06-10-2024-22&feedId=zO3KRoos",
      "query": {
        "contentType": "video",
        "currentDateTime": "06-10-2024-22",
        "feedId": "zO3KRoos"
      },
      "params": {},
      "headers": {
        "x-forwarded-for": "70.17.106.134",
        "x-forwarded-proto": "https",
        "x-forwarded-port": "443",
        "host": "api.mysite.com",
        "x-amzn-trace-id": "Root=1-6667bd2e-53f0a50e72c3454803129de6",
        "devicetype": "ios",
        "buildnumber": "3.02",
        "authorization": "[Redacted]",
        "accept": "*/*",
        "authsource": "ios",
        "userid": "HpqsEKs2uvV7RFt395zpJHCTqH62",
        "contenttype": "video",
        "accept-language": "en-US;q=1.0",
        "accept-encoding": "br;q=1.0, gzip;q=0.9, deflate;q=0.8",
        "anonymoususerid": "e2b4rBuiVUauna6mURDPqT",
        "user-agent": "iOS",
        "feedid": "zO3KRoos",
        "cookie": "<redacted>"
      },
      "remoteAddress": "::ffff:10.133.31.185",
      "remotePort": 63618
    }
    res: {
      "statusCode": 200,
      "headers": {
        "x-powered-by": "Express",
        "access-control-allow-origin": "*",
        "content-type": "application/json; charset=utf-8",
        "content-length": "1541",
        "etag": "W/\"605-4irVFR88+TK3HnJKbrREhgDSyLU\"",
        "cache-control": "max-age=300"
      }
    }
    responseTime: 6

Example Promtail Output

{"id": 19256,"method": "GET","url": "/api/v3/feed-by-id?contentType=video&currentDateTime=06-10-2024-22&feedId=zO3KRoos","query": {"contentType": "video","currentDateTime": "06-10-2024-22","feedId": "zO3KRoos"},"params": {},"headers": {"x-forwarded-for": "70.17.106.134","x-forwarded-proto": "https","x-forwarded-port": "443","host": "api.mysite.com","x-amzn-trace-id": "Root=1-6667bd2e-53f0a50e72c3454803129de6","devicetype": "ios","buildnumber": "3.02","authorization": "[Redacted]","accept": "*/*","authsource": "ios","userid": "HpqsEKs2uvV7RFt395zpJHCTqH62","contenttype": "video","accept-language": "en-US;q=1.0","accept-encoding": "br;q=1.0, gzip;q=0.9, deflate;q=0.8","anonymoususerid": "e2b4rBuiVUauna6mURDPqT","user-agent": "iOS","feedid": "zO3KRoos","cookie": "<redacted>"},"remoteAddress": "::ffff:10.133.31.185","remotePort": 63618}res: {"statusCode": 200,"headers": {"x-powered-by": "Express","access-control-allow-origin": "*","content-type": "application/json; charset=utf-8","content-length": "1541","etag": "W/\"605-4irVFR88+TK3HnJKbrREhgDSyLU\"","cache-control": "max-age=300"}}

As you can see this works when accounting for multiline log messages. Now promtail will not help you with the k8s-monitoring helm chart as it is not supported, Alloy is used. Alloy provides a conversion command, if we convert the promtail file to alloy, the steps needed will be provided for you.

Alloy Convert:

alloy convert --output pino.alloy --source-format promtail promtail.yaml
local.file_match "pino" {
	path_targets = [{
		__address__ = "localhost",
		__path__    = "pino.log",
	}]
}

loki.process "pino" {
	forward_to = [loki.write.default.receiver]

	stage.decolorize { }

	stage.multiline {
		firstline = "^\\[\\d{2}-\\d{2}-\\d{4} \\d{2}:\\d{2}:\\d{2}\\]\\s+\\w+(?: \\(\\d+\\))?:"
		max_lines = 0
	}

	stage.static_labels {
		values = {
			level = "unknown",
		}
	}

	stage.regex {
		expression = "\\[(?P<timestamp>\\d{2}-\\d{2}-\\d{4} \\d{2}:\\d{2}:\\d{2})\\] (?P<level>\\w+)(?: \\(\\d+\\))?:\\s+(?P<message>(?:\\n|.)+)req:\\n*\\s*(?P<json_log>(?:.|\\n)+)\\n*\\s*(?P<responseTime>responseTime:\\s+\\d+)\\n*$"
	}

	stage.labels {
		values = {
			level = null,
		}
	}

	stage.structured_metadata {
		values = {
			message      = null,
			responseTime = null,
		}
	}

	stage.timestamp {
		source = "timestamp"
		format = "02-01-2006 15:04:05"
	}

	stage.replace {
		expression = "(\\n\\s+)"
		source     = "json_log"
	}

	stage.output {
		source = "json_log"
	}
}

loki.source.file "pino" {
	targets               = local.file_match.pino.targets
	forward_to            = [loki.process.pino.receiver]
	legacy_positions_file = "/var/log/positions.yaml"
}

loki.write "default" {
	endpoint {
		url = "https://loki.example.com/loki/api/v1/push"

		basic_auth {
			username = "test"
			password = "test"
		}
	}
	external_labels = {}
}

I hope this is helpful for you and helps to solve your issues.

Download Files: pino-example.zip