census-instrumentation/opencensus-go

How to pass down tracing information in ServeHTTP to long-running tasks?

fatih opened this issue · 8 comments

fatih commented

Hi,

I have a process that is triggered via a webhook (ServeHTTP) that I wanted to trace. In my main package I use this simple setup:

func main() {
	handler = &ochttp.Handler{
		Propagation: &propagation.HTTPFormat{},
		Handler:     &MyHandler{},
	}

	srv := &http.Server{
		Addr:         "0.0.0.0:" + port,
		WriteTimeout: time.Second * 15,
		ReadTimeout:  time.Second * 15,
		IdleTimeout:  time.Second * 60,
		Handler:      handler,
	}

	srv.ListenAndServe()
}

And in my handler my setup is like this:

func (m *MyHandler) ServeHTTP(rw http.ResponseWriter, r *http.Request) {
	ctx := r.Context()

	// this foo() function takes some time to run and doesn't depend on writing
	// anything to http.ResponseWriter, so we can process it in the background,
	// it takes approximately around 3-15 seconds depending on the incoming
	// information.  I want to debug it via tracing, to see why it takes
	// sometimes 3 seconds, but in some cases 15 seconds :)

	// this doesn't work because ctx is cancelled immediately when I return :(
	// go foo(cxt)

	// I have to pass a new context, but then I lose the tracing information
	// injected from ochttp.Handler (i.e:
	// https://github.com/census-instrumentation/opencensus-go/blob/master/plugin/ochttp/server.go#L137)
	go foo(context.Background())
}

func foo(ctx context.Context) {
	ctx, span := trace.StartSpan(ctx, "foo")
	defer span.End()

	bar(ctx)
}

func bar(ctx context.Context) {
	ctx, span := trace.StartSpan(ctx, "bar")
	defer span.End()
}

The problem I have is, I can't pass down the r.Context(), because it's cancelled immediately when I return from ServeHTTP. According to the doc:

// For incoming server requests, the context is canceled when the
// client's connection closes, the request is canceled (with HTTP/2),
// or when the ServeHTTP method returns.
func (r *Request) Context() context.Context {
	if r.ctx != nil {
		return r.ctx
	}
	return context.Background()
}

I therefore can't associate the incoming request with my long process foo() and lose all the information that is injected via ochttp.Handler, see: https://github.com/census-instrumentation/opencensus-go/blob/master/plugin/ochttp/server.go#L137

I'm able to trace foo() and all the child functions associated with it when I use context.Background() (and therefore don't have any cancellation attached to it), but this means than that I can't associate anymore with the incoming request

I'm fairly new using tracing with context.Context and I'm wondering what the idiomatic way of solving this is? Feel free to close if this is the wrong place.

Thanks

Hello @fatih, great to catch you here, thank you for the question and welcome to the OpenCensus project!

So this problem can be handled in a couple of ways:
a) Using trace.FromContext to retrieve the span from r.Context()

spanA := trace.FromContext(ctx)
// Then use spanA

but that unfortunately adding children that might start after the parent has ended isn't great. Please read onto the next suggestion

b) Because the span is from the HTTP request i.e. the rest of the work will be processed asynchronously and due to Go runtime scheduling (hose goroutines could be scheduled even after the function has returned since there isn't explicit synchronization).
This would be a good usecase for a Span.Link as documented on the website https://opencensus.io/tracing/span/link/#uses-and-advisory and with code samples https://opencensus.io/tracing/span/link/#0

func handle(w http.ResponseWriter, r *http.Request) {
   ctx := r.Context()
   spawningSpan := trace.FromContext(ctx)

   go foo(context.Background(), spawningSpan)
}

func foo(ctx context.Context, spawningSpan *trace.Span) {
    ctx, span := trace.StartSpan(ctx, "foo")
   defer span.End()

   addAsLink(span, spawningSpan)
   bar(ctx)
}

func bar(ctx context.Context) {
    ctx, span := trace.StartSpan(ctx, "bar")
    defer span.End()
}

func addAsLink(subjectSpan, spawningSpan *trace.Span) bool {
   if spawningSpan == nil {
      return false
   }
   subjectSpan.AddLink(trace.Link{
        TraceID: spawningSpan.TraceID,
        SpanID:  spawningSpan.SpanID,
        Type:    trace.LinkTypeParent,
        Attributes: map[string]interface{}{
                "reason": "webhookSpan",
        },
   })
   return true
}

Which will allow you to later correlate traces. Some backends display links or at least allow you to click on a Span.Link but anyways, any asynchronous processing can be alleviated by using Span.AddLink.

I'll also page some more collaborators to voice some more ideas if they'd like @basvanbeek
@bogdandrutu @rakyll

Hope this helps and please feel free to ask more questions and submit any bugs.
Thank you and hope this helps.

fatih commented

Thanks @odeke-em for the great answer.

You said adding children that might start after the parent has ended isn't great and I totally agree with it. What if we create a new parent ourself in ServeHTTP? I also saw this link (golang/go#28279 (comment)), and someone suggested this form:

ctx = trace.NewContext(tag.NewContext(context.Background, tag.FromContext(ctx)), trace.FromContext(ctx))

I believe this would create a new Parent, or not ?

  1. I believe linking looks really useful, let me experiment with it.

So where RPC tracing is often commonly handled by the different tracing backends this is not the case for async / one way RPC / messaging.

As @odeke-em describes above you currently have two options. If your tracing backend (ex. Stackdriver) supports trace links this is a good option. If your backend doesn't (ex. Zipkin) you can go the extract/inject route as you mention.

What it does is basically copy the found tags and current span from the http context to your new context for foo. If you then create a new Span in Foo it will become a child of your http server span.

fatih commented

@basvanbeek I'm using StackDriver.

So do you mean, that

ctx = trace.NewContext(tag.NewContext(context.Background, tag.FromContext(ctx)), trace.FromContext(ctx))

will create a new Parent and then the span in foo() will become a child, without linking?

I'm not against linking btw, I just want to understand what's going on before I blindly go and implementing via linking :) So, linking is superior than creating a new parent and inject the values from the Grandparent ?

@basvanbeek I'm using StackDriver.

So do you mean, that

ctx = trace.NewContext(tag.NewContext(context.Background, tag.FromContext(ctx)), trace.FromContext(ctx))

will create a new Parent and then the span in foo() will become a child, without linking?

Let me rephrase for clarity. A parent is not created, it can become one if a new span refers to it as being its parent. This particular piece of code just copies over the tag values and the already existing span from one context to the other without changing anything with respect to your trace. The ochttp plugin will have created a server side RPC span and placed it into the request context of your handler. This span can be pulled out of context for explicit management by calling trace.FromContext(ctx) so for instance allowing you to add attributes (key/value pairs) to it if you desire. By pulling it out you can also inject it into a new context for further in process propagation.

What may help in understanding OpenCensus spans and its API with respect to context.Context is that we've made context.Context a first class citizen of the API. Most problems in tracing occur when in process span propagation is broken. So by adding context as input and output parameters to our span creation methods it will be much harder to break this propagation as long as you pass along the context object into your method calls.

I'm not against linking btw, I just want to understand what's going on before I blindly go and implementing via linking :) So, linking is superior than creating a new parent and inject the values from the Grandparent ?

Linking is an additional concept allowing for correlating relationships between different traces or within traces where spans do not follow parent-child relationship.

In your particular case I personally see no issue with a parent-child relationship as you expect lifetimes of parent and child to not follow RPC boundaries (child lifetime being within parent's lifetime). Visually speaking if your child will outlive its parent and potentially spawn after parent deceases it will show up nicely in the trace timeline and be quite clear as to what happened.

This does become an issue if your idea of a trace should not outlive root request/response and trace should be request/response scoped critical path only. Then links come into play.

fatih commented

Thanks a lot again @odeke-em and @basvanbeek. I successfully integrated OC into my project by pulling out the span and tags via the FromContext() functions and then passing it to my child functions. Everything works excellent so far.

@fatih can you add a short code snippet of what you did? I am very interesting in this topic.

fatih commented

@bogdandrutu sure thing. I've added this piece of snippet to my ServeHTTP

func (m *MyHandler) ServeHTTP(rw http.ResponseWriter, r *http.Request) {
	ctx := r.Context()


	// propagate tags to new context
	ctx = tag.NewContext(ctx, tag.FromContext(r.Context()))

	// pull span from context for adding github id attribute
	if span := trace.FromContext(r.Context()); span != nil {
		span.AddAttributes(
			trace.StringAttribute("github.delivery_id", github.DeliveryID(r)),
		)
		span.SetName("event")

		// propagate span to new context
		ctx = trace.NewContext(ctx, trace.FromContext(r.Context()))
	}

	go foo(cxt)
}

Note that @basvanbeek helped me. This all works really fine. The reason I'm using span.SetName("event") is because Stackdriver just displays incoming request as Recv./. When I add this they will be listed ad Recv.event.

I'm also using ocsql with success (enabled it for queries) and I can also see them in my timeline.

And this is how it looks like on my side:

trace