containers/virtcontainers

Produce enriched errors

jodh-intel opened this issue · 5 comments

Improve the generated error objects.

This is proving to be difficult: not the new API code, but how to use it.

Rationale for structured errors

  • They make it easy to establish where an error was created (or first detected).

  • They include log meta-data at the point where they are generated/detected to provide
    further debug information.

  • They allow the code to be simplified

    Rather than making a log call and then returning an error, a single operation
    can be used to get the benefit of both operations (consistently).

  • They are machine parsable.

The API

A slightly modified version of the simple API shown on #657...

package e

func Log(entry *logrus.Entry) error

... works fine:

Description current call new call
Handle existing error return err return e.Log(virtLog.WithError(err))
Create static error return errrors.New(...) return e.Log(virtLog.WithField("error", "..."))
Create formatted error return fmt.Errorf(...) return e.Log(virtLog.WithField("error", fmt.Sprintf(...))

That's a little verbose, but bearable (I think).

The complication

The problem is now more about where the calls to e.Log() are placed. There are
hundreds of code locations that return errors, but it doesn't make sense to change all
of them to use e.Log(): if we do that, we'll end up with errors containing multiple sets
of fields. The code handles this fine, but we'll essentially get "too much" detail when
all we actually want is the full stacktrace for the location where the error was first
detected.

In summary:

  • If we generate the structured errors only at the highest level, the additional fields
    (stacktrace and logger fields) won't be very useful.

  • If we generate the structured errors at all levels (the "search'n'replace" approach to
    adding e.Log() calls), we'll end up with overly "large" errors (errors with duplicate
    fields). Although the code supports creating a structured error from an existing
    structured error, it results in larger and larger error objects which is going to make
    the returned error less clear. It could also have a performance impact.

The solution therefore would seem to be to only create structured errors at the lowest
possible layers:

  • In pseudo-leaf functions

    Functions that do not call any functions in "lower" virtcontainers packages).

  • In the pkg/* packages

    Problem: these don't have access to a logger, so the next-best approach is...

  • At package boundaries

    In functions where virtcontainers calls code from other packages.

  • At plugin boundaries (see #667).

Layers

virtcontainers is structured at the file level, something like the following:

Level files
high api.go, pkg/oci.go
pod.go, container.go
hypervisor.go, agent.go, shim.go, proxy.go
medium qemu*.go, agent_*.go, shim_*.go, proxy_*.go
asset.go, bridge.go, cni.go, cnm.go, device.go, filesystem.go, hook.go, mount.go, network.go, syscall.go, utils.go
low pkg/*/*.go (excluding pkg/oci.go)

Application approaches

Although the code can be updated manually to introduce the structured error API call:

  • Adding the API to a subset of code locations (a subset of functions in a subset of
    files) is going to be error-prone.

  • It will also be error-prone to update the API calls when code when changes are required.

The best approach might be to find/write a tool to identify such code locations and error
if there is no call to e.Log().

What might help is if we split the main package into sub-packages as suggested on #631.

See also: #667.

Any thoughts on this folks?

@jodh-intel could you copy this issue over to github.com/kata-containers/runtime so that it can be discussed with everybody ?

Done. I guess we could close this one now?

Yes let's close it !