postgresml/pgcat

Structured logging (JSON)

Closed this issue · 4 comments

jbub commented

Is your feature request related to a problem? Please describe.
It's pretty common these days to emit structured logs instead of plain text. They're easier to parse, and some cloud environments already have support for parsing and indexing them. My problem is mainly with GKE: textual logs written to stderr/stdout are logged with a severity of ERROR by default, which is not great, as it brings a lot of noise.

Describe the solution you'd like
The ability to switch to JSON logging, where a severity field carries the level of the logged message. Example log entry:

{
  "severity":"INFO",
  "message":"Hello world.",
  "time":"2020-10-12T07:20:50.52Z"
}

https://cloud.google.com/logging/docs/structured-logging

@levkk do we need to keep using env_logger? This is a good use case for tracing-subscriber. As the examples below show, we could display logs using its existing formatters. Both the text and pretty formatters have color support.

Let me know your thoughts so I can send the PR.
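For concreteness, here is a minimal sketch of how the subscriber could be initialized; the log_format config value is hypothetical, not an existing pgcat setting. It assumes tracing-subscriber's env-filter and json features, plus its tracing-log compatibility layer so records emitted through the log macros are still captured:

use tracing_subscriber::EnvFilter;

// Sketch only: `log_format` is a hypothetical config value.
fn init_logging(log_format: &str) {
    // Keep honoring RUST_LOG for level filtering.
    let builder = tracing_subscriber::fmt()
        .with_env_filter(EnvFilter::from_default_env());

    match log_format {
        // Structured output: one JSON object per line.
        "json" => builder.json().init(),
        // Default: the human-readable "full" formatter shown below.
        _ => builder.init(),
    }
}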

text => current values - using full formatter

2023-07-19T05:06:43.430838Z  INFO pgcat: Welcome to PgCat! Meow. (Version 1.0.2-alpha3)    
2023-07-19T05:06:43.447918Z  INFO pgcat: Running on 0.0.0.0:6432    
2023-07-19T05:06:43.447940Z  INFO pgcat::config: Ban time: 60s    
2023-07-19T05:06:43.447945Z  INFO pgcat::config: Idle client in transaction timeout: 0ms    
2023-07-19T05:06:43.447950Z  INFO pgcat::config: Worker threads: 5   

We could write a custom formatter to reproduce the current format if needed; a rough sketch follows.
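For completeness, here is what such a custom event formatter might look like using the FormatEvent trait from tracing-subscriber's fmt module. This is an assumption about the approach, not code from the PR, and timestamp rendering is left out to keep it short:

use std::fmt;
use tracing::{Event, Subscriber};
use tracing_subscriber::fmt::{format::Writer, FmtContext, FormatEvent, FormatFields};
use tracing_subscriber::registry::LookupSpan;

// Reproduces the bracketed "[LEVEL target] message" shape of the
// current env_logger output.
struct EnvLoggerLike;

impl<S, N> FormatEvent<S, N> for EnvLoggerLike
where
    S: Subscriber + for<'a> LookupSpan<'a>,
    N: for<'writer> FormatFields<'writer> + 'static,
{
    fn format_event(
        &self,
        ctx: &FmtContext<'_, S, N>,
        mut writer: Writer<'_>,
        event: &Event<'_>,
    ) -> fmt::Result {
        let meta = event.metadata();
        // Level and target prefix, then the event's message fields.
        write!(writer, "[{:<5} {}] ", meta.level(), meta.target())?;
        ctx.field_format().format_fields(writer.by_ref(), event)?;
        writeln!(writer)
    }
}

It would be installed with tracing_subscriber::fmt().event_format(EnvLoggerLike).init().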

structured => using the json formatter

{"timestamp":"2023-07-19T05:08:43.827969Z","level":"INFO","fields":{"message":"Welcome to PgCat! Meow. (Version 1.0.2-alpha3)","log.target":"pgcat","log.module_path":"pgcat","log.file":"src/main.rs","log.line":88},"target":"pgcat"}
{"timestamp":"2023-07-19T05:08:43.842202Z","level":"INFO","fields":{"message":"Running on 0.0.0.0:6432","log.target":"pgcat","log.module_path":"pgcat","log.file":"src/main.rs","log.line":149},"target":"pgcat"}
{"timestamp":"2023-07-19T05:08:43.842238Z","level":"INFO","fields":{"message":"Ban time: 60s","log.target":"pgcat::config","log.module_path":"pgcat::config","log.file":"src/config.rs","log.line":987},"target":"pgcat::config"}
{"timestamp":"2023-07-19T05:08:43.842252Z","level":"INFO","fields":{"message":"Idle client in transaction timeout: 0ms","log.target":"pgcat::config","log.module_path":"pgcat::config","log.file":"src/config.rs","log.line":988},"target":"pgcat::config"}
{"timestamp":"2023-07-19T05:08:43.842266Z","level":"INFO","fields":{"message":"Worker threads: 5","log.target":"pgcat::config","log.module_path":"pgcat::config","log.file":"src/config.rs","log.line":992},"target":"pgcat::config"}

debug => using the pretty formatter

  2023-07-19T05:09:59.427259Z  INFO pgcat: Welcome to PgCat! Meow. (Version 1.0.2-alpha3)
    at src/main.rs:88

  2023-07-19T05:09:59.441273Z  INFO pgcat: Running on 0.0.0.0:6432
    at src/main.rs:149

  2023-07-19T05:09:59.441295Z  INFO pgcat::config: Ban time: 60s
    at src/config.rs:987

  2023-07-19T05:09:59.441302Z  INFO pgcat::config: Idle client in transaction timeout: 0ms
    at src/config.rs:988

  2023-07-19T05:09:59.441308Z  INFO pgcat::config: Worker threads: 5
    at src/config.rs:992

our current log format (for comparison)

[2023-07-19T05:18:51.976008Z INFO  pgcat] Welcome to PgCat! Meow. (Version 1.0.2-alpha3)
[2023-07-19T05:18:51.990326Z INFO  pgcat] Running on 0.0.0.0:6432
[2023-07-19T05:18:51.990348Z INFO  pgcat::config] Ban time: 60s
[2023-07-19T05:18:51.990353Z INFO  pgcat::config] Idle client in transaction timeout: 0ms
[2023-07-19T05:18:51.990357Z INFO  pgcat::config] Worker threads: 5

The initial PR is here: #517

This is pretty awesome, although I'm also quite a bit annoyed by it. Fun story: when this change rolled out, it broke my production infra!

So. The relevant bit here is the warning-level message pgcat emits when it discards a session back to the pool because "SET" was called, along with a few other warning-level logs that happen... a lot.

So. If it's, for example, a PHP-based backend that always calls "SET" to make sure, say, the output for the BYTEA format in Postgres is canonical, and there are a few million calls to those functions every few minutes, we get a lot of log lines. Almost all identical, for a behaviour we don't really care much about. (A discard is cheaper than a TCP+TLS connection; I'm cool with that.)

But with this change the log filtering driven by the RUST_LOG variable changed, so the lines my discard filters used to suppress are now logged. And, well, that ate my log quota.
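For reference, this is the kind of per-target directive I mean; tracing-subscriber's EnvFilter parses mostly the same RUST_LOG syntax as env_logger, so an override like this should still be expressible (the target name is a placeholder, not necessarily the module that emits the discard warning):

use tracing_subscriber::EnvFilter;

// Keep info-level logs globally, but raise the threshold for one noisy
// target to error so its warn-level spam is dropped. Equivalent to
// setting RUST_LOG="info,pgcat::client=error".
fn quiet_filter() -> EnvFilter {
    EnvFilter::new("info,pgcat::client=error")
}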

And that's the story of why logging is bad and you should all go back to spooling logs to an unbuffered line printer instead.

(P.S. How the bleep do I filter out specific spammy warning-level messages now?)

levkk commented

Tell me about it... I go through my Sentry quota in two days and just sit there wondering if I should give them more money or just run my own error reporting tool.