/logging

thoughts on logging (slides use revealjs)

Primary LanguageGo

theme highlightTheme
night
monokai

Logging


Where do logs fit?

Pillars of Observability

  • Metric (coarse-grain)
  • Log (🤷)
  • Trace (fine-grain)
Reference:

--

Reality Check: "pillars" are a marketing scam

It's just emitting data to a tool with a variety of precision

The tool often limits the precision you can include

--

Reality Check: metrics

Metrics are low-fidelity event aggregates

Tells you IF failure; but not WHY failure

--

Reality Check: tracing

Tracing is logging with opinion + tooling

You can DIY or SaaS:

  • AWS X-Ray
  • DataDog APM
  • honeycomb.io

Basics

  1. context
  2. correlation
  3. level

Context

You cannot predict future questions
— be generous

Logs impose no limits on size* / cardinality

... consider data sensitivity; probably don't dump blobs

--

Common Mistakes

--

message overloading

Task finished: CreateRoutes2: duration=3.014

  • hard to parse
  • slow to filter (using like operation)
  • ambiguous unit

--

Context examples

example: ./examples.md

--

Context in practice

good logs are a consequence of good code

--

Example — pipe architecture

queue service > validator service > sender service

Q: who should log?

new problem; how to pass context + correlation?

--

SOLID

controller > queue
controller > validator
controller > sender
controller > log

Q: who should log?

A: The controller

everything else performs a discrete function

--

Inversion of control

// get next
work := queue.Pop()
log = logger.WithField("work_id", item.ID)

// validate
if reason := s.validator.IsValid(work.Body); reason != nil {
    logger.WithField("reason", reason).Info("item invalid")
    return
}

// send
if err := s.sender.Send(work.Body); !err != nil {
    logger.WithError(err).Error("failed to send item")
    return
}

// commit work
work.Delete()

// emit
logger.Info("done")

* assume error handling!

--

Single-responsibility principle

func Sender (i Item) error {

    // serialize
    body, err := json.Marshal(i)
    if err != nil {
        return errors.Wrap(err, "serialization failure")
    }

    // send
    resp, err := http.Post(url, "application/json", bytes.NewBuffer(body))
    if err != nil {
        return errors.Wrap(err, "http failure")
    }
    defer res.Body.Close()

    // check response
    if response.StatusCode != http.StatusOK {
        return fmt.Errorf("backend failure: %s", response.Status)
    }

    return nil
}

dont log AND throw (that's 2 things)

--

Context in practice

Code falls into 2 flavours:

  1. Controller
    level=INFO
  2. Everything else
    level=DEBUG

--

Observer pattern

example: ./go/service/main.go

treat logs as an event emitting dependancy


Correlation

... a cross-component concern

👉 Concensus 👈

correlation_id / request_id / user_id / asset_id / ...

--

Correlation

reaching consensus through tooling

package appcontext

type RequestContext struct { ... }

type ClientContext struct { ... }

type SystemContext struct {
	Application string `json:"application,omitempty"`
	Version     string `json:"version,omitempty"`
	Environment string `json:"environment,omitempty"`
}

func WithSystemContext(ctx context.Context, val SystemContext) context.Context {
	return context.WithValue(ctx, key, val)
}

func GetSystemContext(ctx context.Context) (val SystemContext, ok bool) {
	val, ok = ctx.Value(key).(SystemContext)
	return
}

...

Levels

A broad category which is important to collectively agree on.

--

Common Mistakes

--

non-ERROR

ERROR: client is not authorized

This belongs in the response to the client:
401 Unauthorized

(or maybe an "access log")

--

non-INFO

Uninteresting plumbing

INFO: executed 'SELECT * FROM foo'

INFO: parsed JSON

aka. i was prototyping and accidentally committed it

--

predictions

Predicting the future

INFO: about to handle request

aka. i don't trust my language to trap exceptions

--

Level Definitions

--

fatal

The system cannot continue

FATAL: failed to connect to database

--

error

A transient problem during processing

ERROR: timeout while saving

--

warning

Processing degraded but can continue

WARN: config unset; using default

opinion: use INFO

--

info

System did what you asked it to do

INFO: done

INFO: batch complete

INFO: cache refreshed

--

debug

Low-level supporting steps.

Usually disabled due to poor signal-to-noise ratio.

Danger zone: Take care with sensitive data!


Closing

you'll get it wrong the first time; iterate