generic metric hooks
Opened this issue ยท 17 comments
Over in slack I asked about how we might go about adding the ability to emit metrics out of go-spiffe in a generic way. We've implemented our own wrapper for go-spiffe that wraps the go-spiffe functions like GetClientCertificate
that end up in the tls.Config
so we can track success and failures from the perspective of each end point.
After chatting about it with the team here at GH we think a implementation similar to the ClientTrace
in net/http might be a reasonable solution. The basic idea is that there would be a struct users could configure with functions that would get executed at different points in the go-spiffe codebase. The example they use in the blog post is adding logging when a DNS response is received by the http client.
Some examples of metrics we think might be good points to execute hooks from:
getTLSCertificate
- we should have access to cert details here and can emit spiffe IDs, etc similar towrapGetCertificateWithMetrics
ParseAndVerify
- we should have access to cert details here and can emit spiffe IDs, etc similar towrapVerifyPeerCertificateWithMetrics
, probably want success/failure counts and timersVerify
- along the lines above โ๏ธ, probably want success/failure counts and timersAdaptMatcher
- emit metrics on authorization success and failure
There are likely many others that might make sense as well.
Anyway, I am curious what folks think and wanted to get some consensus before we dig into implementing anything.
I think that sounds like a reasonable approach! We have lots of different packages. Would each package define its own "ClientTrace"-style struct and associated options that allow you to set it? There will probably be a little design work to ensure consistency of application to the various components.
I'm assume you are just targeting the go-spiffe v2 implementation?
I also wonder if this is sufficient to supplant the logger, which has felt a little invasive and something a few of us debated quite a bit before including.
Would each package define its own "ClientTrace"-style struct and associated options that allow you to set it?
๐ I don' think we know yet but we'll dig in and find out.
I'm assume you are just targeting the go-spiffe v2 implementation?
Yes!
I also wonder if this is sufficient to supplant the logger, which has felt a little invasive and something a few of us debated quite a bit before including.
I think it certainly could replace it.
@azdagron once we agree on an approach over in #150 it's unclear what might happen next. I think we'd be onboard to submit PR(s) for the areas/packages we use since we have a pretty good idea of what we'd want to measure with the metrics but I'm not sure we have the production knowledge to implement hooks in all the right places for everything else. Frankly, I don't think that would cover very many (tlsconfig, spiffeid, parts of workloadapi, etc). A half-implementation is likely worse than no implementation. What do you think?
I'm sure once there is consensus that we can get somebody to work on it that is more familiar with the library. It should be a relatively scoped exercise and as long as we've done the plumbing of the trace structs correctly we don't need to worry about missing a start/end function here or there. Those should be able to be added later if needed.
I apologize. It's been a busy few weeks and this off my radar. I was talking to @evan2645 about this a few weeks back and we discussed prior art in other libraries. I've done some digging without success to see if there are established patterns outside of what is provided by httptrace
that we could align to. I haven't had much luck finding anything however. I think the current approach of providing a trace struct via options seems as good as any. It certainly gives us future proofing flexibility.
@amartinezfayo @evan2645 any concerns marching forward with the current approach?
any concerns marching forward with the current approach?
I don't personally have a specific concern with the current approach. I'm guessing that we may need to make tweaks here and there as people start using this, but hopefully without structural changes.
Yeah, my biggest concern is that we settle with a design that is 1) internally consistent within the library, and 2) does not require breaking changes to expand
I haven't spent enough time looking at this change to endorse it specifically, but so long as we've considered existing patterns etc it sounds fine to me :)
@joewilliams I think next steps is addressing the feedback on the PR and getting that in. I appreciate your patience seeing this through!
We're really close. I took a final pass and found some things that I think need addressing but after those I think we're good to squash and merge. I appreciate your patience with the back-and-forth!
@azdagron I think the next thing we'd want to implement this tracing pattern for are the verify related functions in tlsconfig. Does the align with what you are wanting?
Yeah. I started digging into a PR for that and it brought up some questions around ergonomics that I'm still debating internally...
This is still on my radar but I haven't had cycles to put more thought into it. Just wanted to make sure folks knew it hasn't been forgotten.
@joewilliams, I'm so sorry this has fallen by the wayside.
I think next steps is figuring out at what granularity we want on the verification related tracing hooks. Verification has two basic steps:
- verifying the cryptographic chain of trust from the peer certificate back to the bundle
- authorizing the peer certificate
Is there utility in tracing each of those events from a debuggability or telemetry perspective? I think the answer is yes?
If so, we'd probably be looking for an addition to the interface as follows:
type Trace struct {
...
AuthenticatePeer func(info AuthenticatePeerInfo) (traceCtx interface{})
AuthenticatedPeer func(traceCtx interface{}, info AuthenticatedPeerInfo)
AuthorizePeer func(info AuthorizePeerInfo) (traceCtx interface{})
AuthorizedPeer func(traceCtx interface{}, info AuthorizedPeerInfo)
}
type AuthenticatePeerInfo struct {
PeerCertificate []*x509.Certificate
}
type AuthenticatedPeerInfo struct {
VerifiedChains [][]*x509.Certificate
Err error
}
type AuthorizePeerInfo struct {
ID spiffeid.ID
VerifiedChains [][]*x509.Certificate
}
type AuthorizedPeerInfo struct {
ID spiffeid.ID
Err error
}
How does that feel?