Support structured logging
Is your feature request related to a problem? Please describe.
Currently all information is dumped as a single line of text, using fmt.Sprintf to embed values and variables. This is hard for third-party applications such as Grafana/Loki to parse.
Describe the solution you'd like
Utilize the standard logfmt format, which is easy to read and parse. This is already supported by slog. Even better if it can be configured to emit JSON in case somebody wants that (slog has that option too).
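A minimal sketch of what this could look like with slog from the standard library; the LOG_FORMAT variable is made up for illustration:

```go
package main

import (
	"log/slog"
	"os"
)

func main() {
	// Hypothetical switch between logfmt-style and JSON output;
	// the LOG_FORMAT variable name is just an illustration.
	var handler slog.Handler
	if os.Getenv("LOG_FORMAT") == "json" {
		handler = slog.NewJSONHandler(os.Stdout, nil)
	} else {
		// TextHandler emits logfmt-style key=value pairs.
		handler = slog.NewTextHandler(os.Stdout, nil)
	}
	logger := slog.New(handler)

	// Values become their own keys instead of being embedded via fmt.Sprintf.
	logger.Info("Finished backup", "duration", "2.3s", "size_bytes", 1048576)
	// text: time=... level=INFO msg="Finished backup" duration=2.3s size_bytes=1048576
	// json: {"time":"...","level":"INFO","msg":"Finished backup","duration":"2.3s","size_bytes":1048576}
}
```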
Describe alternatives you've considered
slog seems to be the obvious choice: it's already in use and supports this feature. The only thing that needs to change is the approach to logging.
Additional context
It would be nice to provide some additional context where possible. For example, when using the foreground feature, the id of the cron job could be added to the log context by default.
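Continuing the sketch above, a child logger could stamp every record with that id (the job_id attribute name is hypothetical):

```go
// Derive a child logger that adds the cron job's id to every record.
jobLogger := logger.With(slog.String("job_id", "backup-nightly"))
jobLogger.Info("Starting backup run")
// time=... level=INFO msg="Starting backup run" job_id=backup-nightly
```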
This sounds a bit as if you are trying to recreate something that is already there. During each backup run, statistics are collected and stored in a Stats object:
docker-volume-backup/cmd/backup/stats.go, lines 43 to 55 in 6029225
After being populated, this is currently only made available to notification templates: https://offen.github.io/docker-volume-backup/how-tos/set-up-notifications.html#notification-templates-reference
I'm not sure if it makes sense to start emitting metrics in a different manner now and also keep supporting both. I also have to admit that I like log lines that are written mostly for humans, e.g. as it helps with getting users up to speed when the log message tells you what's going wrong. If you're looking to get such data into Loki or other log aggregation systems, I would propose adding an option that dumps a serialized form (JSON or logfmt) of the Stats object to stdout after the script has run. This could then be ingested by anything that reads logs.
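A rough sketch of what such an option could look like; the DUMP_STATS setting and the Stats fields shown are placeholders, not the project's actual names:

```go
package main

import (
	"encoding/json"
	"log/slog"
	"os"
)

// Hypothetical subset of the real Stats object, for illustration only.
type Stats struct {
	Containers int   `json:"containers"`
	BackupSize int64 `json:"backupSize"`
}

// dumpStats emits one machine-readable JSON line after the run has finished,
// so log shippers can pick it up alongside the human-readable output.
func dumpStats(s Stats) {
	if os.Getenv("DUMP_STATS") != "true" {
		return
	}
	if err := json.NewEncoder(os.Stdout).Encode(s); err != nil {
		slog.Error("failed to serialize stats", "error", err)
	}
}

func main() {
	dumpStats(Stats{Containers: 3, BackupSize: 1 << 20})
	// {"containers":3,"backupSize":1048576}
}
```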
Also related is the idea of making this available for Prometheus: #64
I think it's easier if I detail the setup I have running. To make it easier to monitor my home stack, and especially to track down bugs, I have Grafana and Loki running.
The stack has four components:
- The service being monitored. These all run in Docker containers, writing their logs to stdout, which is then exposed by the Docker daemon.
- Promtail, a monitoring agent that discovers the containers and collects their logs (discovery happens via Docker labels).
- Loki, which collects the logs from the different services and exposes them to Grafana for visualization.
- Grafana, the dashboard used to display this information. Conveniently, the logs can be filtered via labels:
```
{container="grafana"} |= `` | logfmt | level = debug
```
The output of the above filter would be something like:

```
logger=ngalert.sender.router t=2024-02-02T19:13:51.691945136Z level=debug msg="Finish of admin configuration sync"
```
In this setup, it is much easier to have the logs fed into Loki automatically without any need for configuration. I would still keep the logs in the same human-readable format, just add some labels and annotations to them. (The current logs are actually already in logfmt, but everything is inside the "msg" tag.)
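To illustrate with made-up lines (not actual output of this project), the difference would be:

```
# current style: values embedded in the message
time=... level=INFO msg="Created backup of /backup at /archive/backup.tar.gz"

# structured style: values get their own keys
time=... level=INFO msg="Created backup" source=/backup path=/archive/backup.tar.gz
```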
The thing I am worried about in this approach is that each time the logger is called, a decision has to be made about which information to include in a structured way and which not. Obviously, this also happens when writing logs that are meant for humans, but there it's a bit more lenient. When people start consuming logs like this, I would expect requests like "oh, I would like to know about X at position Y, can you add it?" to surface, and that's not really sustainable. With the Stats object, this is well defined, and adding / changing properties has to be done only once instead of having to revisit every log call.
I have zero idea how complicated it would be to make your setup pick up a single line instead of all of them.
My question would therefore be: would it be possible / helpful for you to reuse the Stats object here and append its current state to every log call if requested (i.e. enabled using a setting)? Would that give you what you need?
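If that route were taken, one way to sketch it is a slog.Handler wrapper that appends a snapshot of the stats to every record when enabled; all names here are hypothetical:

```go
package main

import (
	"context"
	"log/slog"
	"os"
)

// statsHandler wraps another slog.Handler and appends the current stats
// to every record. The "stats" key and the callback are made up.
type statsHandler struct {
	inner slog.Handler
	stats func() slog.Value // returns the current Stats as a group value
}

func (h statsHandler) Enabled(ctx context.Context, l slog.Level) bool {
	return h.inner.Enabled(ctx, l)
}

func (h statsHandler) Handle(ctx context.Context, r slog.Record) error {
	r = r.Clone() // avoid mutating a record shared with other handlers
	r.AddAttrs(slog.Attr{Key: "stats", Value: h.stats()})
	return h.inner.Handle(ctx, r)
}

func (h statsHandler) WithAttrs(attrs []slog.Attr) slog.Handler {
	return statsHandler{inner: h.inner.WithAttrs(attrs), stats: h.stats}
}

func (h statsHandler) WithGroup(name string) slog.Handler {
	return statsHandler{inner: h.inner.WithGroup(name), stats: h.stats}
}

func main() {
	logger := slog.New(statsHandler{
		inner: slog.NewTextHandler(os.Stdout, nil),
		stats: func() slog.Value {
			// A real version would read from the live Stats object.
			return slog.GroupValue(slog.Int("containers", 3))
		},
	})
	logger.Info("Copying backup")
	// time=... level=INFO msg="Copying backup" stats.containers=3
}
```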
Also, when you start emitting logs in a structured way, you start creating an interface for other consumers, which means this will need to stay stable. I currently do not consider log output a stable interface and actually do change it from time to time. Stats already has stability guarantees so that users can safely render their templates. This would also be a good reason to reuse what's already there.
I don't think such verbose information is really needed for the logging. I don't want to build metrics out of it, just create potential alerts with some details.
Currently the only thing I would like to see changed is outputting the error in its own tag instead of embedding it inside the msg. Additionally, it could be useful to assign a unique identifier to individual backup runs and, in debug mode, to provide the filename/line number if that makes sense.
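For what it's worth, all three of those are available in slog out of the box; a minimal sketch (the run_id value and attribute name are made up):

```go
package main

import (
	"errors"
	"log/slog"
	"os"
)

func main() {
	// AddSource makes slog record the file and line of each call site.
	handler := slog.NewTextHandler(os.Stderr, &slog.HandlerOptions{AddSource: true})
	// "run_id" is a hypothetical attribute identifying one backup run.
	logger := slog.New(handler).With("run_id", "0f3a9c")

	err := errors.New("connect: connection refused")
	// The error gets its own key instead of being formatted into msg.
	logger.Error("Fatal error running backup", "error", err)
	// time=... level=ERROR source=.../main.go:17 msg="Fatal error running backup"
	//   run_id=0f3a9c error="connect: connection refused"
}
```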
I personally think there can be some leeway in the log interface; I have not seen it being very strict in other applications either. (Logs serve as a secondary line of information after the metrics themselves.)
Ok, I guess adding the error when logging at ERROR level makes sense. From what I remember, this only happens in a single place anyway, which is here:
docker-volume-backup/cmd/backup/main.go, lines 23 to 49 in 6029225
I'd just really like to avoid duplicating data in the logs that is already available in Stats.
There are a few more instances where it applies, but I do not want to introduce statistics or any complex data into the logging.
Yesterday, while running the test suite, I also found that some data already appears in the logs: the storage context is present in the storage label. So I would really just add metadata that can be detected automatically to the logs. What I think will be useful for the new scheduler is the id of the job that printed a given log line.