Produce enriched errors
jodh-intel opened this issue · 5 comments
Improve the generated error objects.
This is proving to be difficult: not the new API code, but how to use it.
Rationale for structured errors
-
They make it easy to establish where an error was created (or first detected).
-
They include log meta-data at the point where they are generated/detected to provide
further debug information. -
They allow the code to be simplified
Rather than making a log call and then returning an error, a single operation
can be used to get the benefit of both operations (consistently). -
They are machine parsable.
The API
A slightly modified version of the simple API shown on #657...
package e
func Log(entry *logrus.Entry) error
... works fine:
Description | current call | new call |
---|---|---|
Handle existing error | return err |
return e.Log(virtLog.WithError(err)) |
Create static error | return errrors.New(...) |
return e.Log(virtLog.WithField("error", "...")) |
Create formatted error | return fmt.Errorf(...) |
return e.Log(virtLog.WithField("error", fmt.Sprintf(...)) |
That's a little verbose, but bearable (I think).
The complication
The problem is now more about where the calls to e.Log()
are placed. There are
hundreds of code locations that return errors, but it doesn't make sense to change all
of them to use e.Log()
: if we do that, we'll end up with errors containing multiple sets
of fields. The code handles this fine, but we'll essentially get "too much" detail when
all we actually want is the full stacktrace for the location where the error was first
detected.
In summary:
-
If we generate the structured errors only at the highest level, the additional fields
(stacktrace and logger fields) won't be very useful. -
If we generate the structured errors at all levels (the "search'n'replace" approach to
addinge.Log()
calls), we'll end up with overly "large" errors (errors with duplicate
fields). Although the code supports creating a structured error from an existing
structured error, it results in larger and larger error objects which is going to make
the returned error less clear. It could also have a performance impact.
The solution therefore would seem to be to only create structured errors at the lowest
possible layers:
-
In pseudo-leaf functions
Functions that do not call any functions in "lower" virtcontainers packages).
-
In the
pkg/*
packagesProblem: these don't have access to a logger, so the next-best approach is...
-
At package boundaries
In functions where virtcontainers calls code from other packages.
-
At plugin boundaries (see #667).
Layers
virtcontainers is structured at the file level, something like the following:
Level | files |
---|---|
high | api.go , pkg/oci.go |
pod.go , container.go |
|
hypervisor.go , agent.go , shim.go , proxy.go |
|
medium | qemu*.go , agent_*.go , shim_*.go , proxy_*.go |
asset.go , bridge.go , cni.go , cnm.go , device.go , filesystem.go , hook.go , mount.go , network.go , syscall.go , utils.go |
|
low | pkg/*/*.go (excluding pkg/oci.go ) |
Application approaches
Although the code can be updated manually to introduce the structured error API call:
-
Adding the API to a subset of code locations (a subset of functions in a subset of
files) is going to be error-prone. -
It will also be error-prone to update the API calls when code when changes are required.
The best approach might be to find/write a tool to identify such code locations and error
if there is no call to e.Log()
.
What might help is if we split the main
package into sub-packages as suggested on #631.
See also: #667.
Any thoughts on this folks?
@jodh-intel could you copy this issue over to github.com/kata-containers/runtime so that it can be discussed with everybody ?
Done. I guess we could close this one now?
Yes let's close it !