Produce enriched errors

Question

Produce enriched errors

jodh-intel opened this issue 7 years ago · 5 comments

jodh-intel commented 7 years ago

Improve the generated error objects.

Answer 1 · 2018-03-13T11:34:48.000Z

This is proving to be difficult: not the new API code, but how to use it.

Rationale for structured errors

They make it easy to establish where an error was created (or first detected).
They include log meta-data at the point where they are generated/detected to provide
further debug information.
They allow the code to be simplified

Rather than making a log call and then returning an error, a single operation
can be used to get the benefit of both operations (consistently).
They are machine parsable.

The API

A slightly modified version of the simple API shown on #657...

package e

func Log(entry *logrus.Entry) error

... works fine:

Description	current call	new call
Handle existing error	`return err`	`return e.Log(virtLog.WithError(err))`
Create static error	`return errrors.New(...)`	`return e.Log(virtLog.WithField("error", "..."))`
Create formatted error	`return fmt.Errorf(...)`	`return e.Log(virtLog.WithField("error", fmt.Sprintf(...))`

That's a little verbose, but bearable (I think).

The complication

The problem is now more about where the calls to e.Log() are placed. There are
hundreds of code locations that return errors, but it doesn't make sense to change all
of them to use e.Log(): if we do that, we'll end up with errors containing multiple sets
of fields. The code handles this fine, but we'll essentially get "too much" detail when
all we actually want is the full stacktrace for the location where the error was first
detected.

In summary:

If we generate the structured errors only at the highest level, the additional fields
(stacktrace and logger fields) won't be very useful.
If we generate the structured errors at all levels (the "search'n'replace" approach to
adding e.Log() calls), we'll end up with overly "large" errors (errors with duplicate
fields). Although the code supports creating a structured error from an existing
structured error, it results in larger and larger error objects which is going to make
the returned error less clear. It could also have a performance impact.

The solution therefore would seem to be to only create structured errors at the lowest
possible layers:

In pseudo-leaf functions

Functions that do not call any functions in "lower" virtcontainers packages).
In the pkg/* packages

Problem: these don't have access to a logger, so the next-best approach is...
At package boundaries

In functions where virtcontainers calls code from other packages.
At plugin boundaries (see #667).

Layers

virtcontainers is structured at the file level, something like the following:

Level	files
high	`api.go`, `pkg/oci.go`
	`pod.go`, `container.go`
	`hypervisor.go`, `agent.go`, `shim.go`, `proxy.go`
medium	`qemu.go`, `agent_.go`, `shim_.go`, `proxy_.go`
	`asset.go`, `bridge.go`, `cni.go`, `cnm.go`, `device.go`, `filesystem.go`, `hook.go`, `mount.go`, `network.go`, `syscall.go`, `utils.go`
low	`pkg//.go` (excluding `pkg/oci.go`)

Application approaches

Although the code can be updated manually to introduce the structured error API call:

Adding the API to a subset of code locations (a subset of functions in a subset of
files) is going to be error-prone.
It will also be error-prone to update the API calls when code when changes are required.

The best approach might be to find/write a tool to identify such code locations and error
if there is no call to e.Log().

What might help is if we split the main package into sub-packages as suggested on #631.