kubernetes/kubernetes

Use of glog for logging is problematic

obeattie opened this issue ยท 23 comments

FEATURE REQUEST:

/kind feature

What happened:

Kubernetes uses golang/glog extensively for logging. While this is reasonable enough for the Kubernetes binaries, sizeable parts of the codebase are re-used as client libraries (eg. k8s.io/client-go). This presents several problems for users of the libraries:

  • glog registers a bunch of flags in an init function, and offers no programmatic way to configure its behaviour. It can be surprising to users of the k8s libraries that they have to call flag.Parse(), and this is easy to get wrong (eg. coredns/coredns#1597).
  • glog's default behaviour is to log to a file on disk. A user of a library wouldn't typically expect it to write files without explicitly configuring it to do so.
  • Worse, if glog cannot create a file, it calls os.Exit. This can be very harmful, and since it's common to run containerised binaries with a read-only root filesystem, can be easy to trigger.
  • glog doesn't perform any management of the files it writes, so without something like logrotate running (especially problematic in a container) the log files just accumulate.

I think we need to find a way to fix this: the use of glog makes working with the k8s client libraries very difficult. A recent Twitter discussion on this topic shows this is a widely-held concern.

@kubernetes/sig-architecture-feature-requests

@obeattie: Reiterating the mentions to trigger a notification:
@kubernetes/sig-architecture-feature-requests

In response to this:

@kubernetes/sig-architecture-feature-requests

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Personal 2ยข: while replacing glog completely with something else might be the best long-term solution, doing so seems like it would be a long road. Coming to a consensus on a replacement and rewriting all existing calls to glog to use that replacement doesn't seem like it will happen quickly.

The problems caused by this are quite harmful now, so I think it's pragmatic to mitigate those without replacing glog completely. It seems like that would involve something like:

  1. Forking glog
  2. Making it programmatically configurable
  3. Making it log to stdout/stderr by default
  4. Removing its flags, adding those as explicit k8s flags instead, and using them to configure the library

I'd be happy to take a go at this, but it's probably only worthwhile doing that if there's enough agreement that it would be an acceptable thing to do (a PR that changes the glog import path would benefit from merging pretty fast since it would get out of date very quickly.)

@obeattie thanks for enumerating these issue in full, I agree with all of them and we should start doing something about it. I think most developers are partly aware of some of these issues, but I am not sure if anyone else has enumerated these issues as well as you did here.

I agree glog should be fixed, forked or replaced.
Note the glog README says "Feature requests will be ignored.", which points towards the latter two.

Interested in your view @thockin

I was part of the initial Twitter thread - my main concerns are: the init() does too much and cannot be deferred.

A forked/updated glog should ideally not run init() or have an environmental variable that can be set to avoid it running init() until it's desirable - for instance how can you unit test this (you can't)?

The override should ideally not mandate the use of args/flags to configure logging.

To the point @thockin raised by his Go logging interface project (mentioned in the tweet thread) - there should be an interface available so that the underlying test framework/implementation can be abstracted away from the consuming code.

https://github.com/thockin/logr

Related issue: coredns/coredns#1597 (closed)

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

/remove-lifecycle stale

Pinging some people who we've spoken to about this - @bboreham @munnerz - @thockin had some ideas in the hallway at KubeCon CPH about how to approach this as a piecemeal solution gradually replacing glog with an interface implemented by glog for a non-harm change.

Creating a more complete list of requirements and desirable properties would be useful.

2 more problems with glog I'd like to add:

  1. Logs are untestable (by unit tests). There is no good way to get at the log file, or intercept logs.
  2. No way to direct logs to a specific file. As a result, a lot of our system addons log to stderr, and then use a shell to redirect the logs to a file. This is problematic for a whole bunch of reasons.

Forking glog

+1. I think a sensible path forward would be:

  1. Fork glog, address the immediate problems without changing the interface
  2. (simultaneously) collect requirements & review existing log libraries to identify the library we want to migrate to
  3. Reimplement our glog fork as an adapter to the new logging library
  4. Gradually migrate log usage to the new library interface (or an alternative interface we design).
dims commented

long-term-issue (note to self)

dims commented

@tallclair @alexellis @bboreham @munnerz @thockin @bgrant0607 Here's an experiment for forking glog into the k/k repo - #69333

I'd love to see us move to something like https://github.com/go-logr/logr (planned to rename to lager, maybe?) and I am willing to help strategize how.

Starting at leaf packages and working backwards, we should be able to replace the glog calls with methods on the abstract interface implemented in terms of glog. This forces us to think through how to plumb it around into libraries and across interfaces. As we get closer to main(), we can talk about switching the implementation to some other log lib, and callers can break their dependencies on literal glog.

I guess there are 2 different approaches here with the same end goal:

  1. (the approach in this PR) Adopt a new logging backend (library), and then gradually migrate logging calls to the new interface. By forking glog, it means we could flip the fork over to keep it's interface and flip the implementation over to the new library.
  2. (@thockin's proposal) Gradually migrate logging calls to a new interface (still backed by glog), and then flip over the backend.

Some of the problems with our current logging approach are addressed with a new interface, while others are better addressed with a new backend. I guess which approach we take depends somewhat on which set of problems we're prioritizing.

wking commented

I'd love to see us move to something like https://github.com/go-logr/logr (planned to rename to lager, maybe?) and I am willing to help strategize how.

Also in this space is go-log/log, which aims for the simpler printf-style approach from Dave's blog. I'm definitely in favor of having a pluggable interface of some sort for this to enable client-go (and other library) consumers to plug in their own implementation, which may not be glog or a fork of glog).

dims commented

FYI, main k/k repo is now using k8s.io/klog

dcbw commented

klog PR is #70889

dims commented

Please see https://groups.google.com/d/msg/kubernetes-dev/7vnijOMhLS0/1oRiNtigBgAJ

I'd love to see follow ups on what can be done better now that we have klog

/close

@dims: Closing this issue.

In response to this:

Please see https://groups.google.com/d/msg/kubernetes-dev/7vnijOMhLS0/1oRiNtigBgAJ

I'd love to see follow ups on what can be done better now that we have klog

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.